This blog post examines how we tackled the challenge of safely adopting generative AI to simplify complex coding tasks and meet demands for faster delivery.
Generative AI offers many opportunities for productivity gains in the software development lifecycle, because our discipline works with programming languages, frameworks, and tools that require understanding and communicating in both natural and programming languages.
By organizing an Uber Tech-wide HackDayz event, we gained valuable insights into the strengths and weaknesses of generative AI in software development, as well as the challenges and opportunities it poses.
Uber teams around the globe used this event to build generative-AI-powered proofs of concept to automate coding, generate tests, improve code quality, and reduce operational load. The event allowed us to identify and prioritize areas where generative AI could potentially be used to measurably boost productivity and spur innovation. To ensure the safety and security of the tools and data used in the Uber Tech-wide HackDayz, we partnered with a number of cross-functional teams, including legal, technical privacy, security, and open source teams. We also created guidelines for the tools and data that were used in the hackathon.
How can generative AI impact developer productivity?
At Uber and in the software engineering industry as a whole, developers face growing challenges: the increasing complexity of software systems, growing demand for faster delivery, and the need to manage diverse tasks such as code development, testing, and maintenance, all while ensuring quality and adhering to best practices.
Recent advances in generative AI and large language models (LLMs) have produced strong capabilities in natural language understanding.
Generative AI offers opportunities to automate and optimize significant aspects of the development process to varying degrees, easing these challenges and boosting developer productivity.
The ability to understand and generate sophisticated text, analyze large datasets, identify patterns, and make predictions makes software development a natural fit for generative AI.
“I think a lot of people obviously want to talk about the sexy kind of new consumer applications. I would tell you that I think that the earliest and most significant effect that AI is going to have on our company is actually going to be as it relates to our developer productivity. Some of the tools that we’re seeing are going to allow our devs to kind of be super devs and to be able to innovate more, build more faster, and that will essentially leverage and accelerate innovation across the platform.”
CEO of Uber
Remarks are from Uber’s Q1’23 earnings call
Generative AI can automate simple tasks
By automating simpler, tedious tasks (generating boilerplate code, fixing linter errors, generating unit tests, etc.), generative AI can help engineers focus on more complex tasks.
Generative AI can improve quality & reliability
Because generative AI models are trained on large amounts of code, they have the potential to provide intelligent suggestions and recommendations informed by existing codebases. By analyzing code semantics, generative AI could improve the reliability of software by identifying potential bugs, vulnerabilities, or performance issues early in the development cycle.
Generative AI has the potential to improve communication
With its natural language comprehension, generative AI also has the potential to improve how requirements are communicated, facilitating collaboration between stakeholders and minimizing misinterpretation.
Generative AI could allow for faster prototyping, which might lead to quicker validation of ideas
Finally, generative AI can generate prototypes and working code snippets from high-level specifications faster than humans can, enabling developers to validate ideas and fail fast without committing the development time and resources that manual trial and error would otherwise require.
Organizing an Uber tech-wide hackathon to unleash innovation
To explore the potential of generative AI, we recently organized the “Uber Tech-wide HackDayz.” The event was distributed across all of Uber’s sites and brought together interdisciplinary teams from various departments (software engineers, TPMs, PMs, data analysts, and QA engineers).
Overall, the Uber Tech-wide HackDayz was a huge success. In total, 713 engineers around the globe participated and submitted 98 impressive working demos across three categories: Product Experience, Developer Productivity, and Business Operations.
The quality and depth of the submitted projects were inspiring and showed the spirit of the company, teaching us ways we can move faster, be creative, take risks, and do more with less.
Applying GenAI to Uber’s Software Development Lifecycle
Based on the demos, we found that all phases of the software development lifecycle have the potential to be improved by generative AI.
Generative AI’s Potential to Improve Developer Productivity
Here are some of the interesting ways generative AI helps, as demonstrated by the prototypes we built during the three-day HackDayz activity:
Design document creation, review, and risk assessment
Creating requirements & design documents
Many operational requirements are common across multiple services and applications, yet the requirements for a new feature may still be incomplete. Generative AI can assist product managers and engineers in capturing complete specifications faster.
Code & test generation
Given the right prompt, generative AI can generate new text, including code. This makes it a good fit for improving code by suggesting new features or fixing bugs. It can also be used to generate code that is more efficient or easier to read.
The diverse set of innovative HackDayz projects in the code and test generation space can be summarized into the following areas:
Explain existing code
Engineers spend a lot of time reading code every day. It can take months or even years to deprecate older codebases because it is difficult to understand legacy code. Questions about context in other codebases take time and effort to answer through support channels. Furthermore, it can take months or years to build deep domain knowledge in some codebases. Generative AI can be used to explain existing codebases and significantly reduce the time it takes for engineers to be effective on unfamiliar code.
Generating UI code
One of the pragmatic applications of generative AI was a tool that allows users to create user interfaces by describing them in natural language. This tool generates a code snippet using Uber’s BaseWeb component library, which can be easily copied and pasted into any Uber web application.
This generated code can also be tested and tailored iteratively within the tool itself, allowing users to make changes and interact with the UI directly. The tool has a memory feature that allows users to build on previous queries and ask for improvements or modifications to the existing code snippet, such as adding or removing components.
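To make the memory feature concrete, here is a minimal Python sketch of one way a tool could carry earlier requests and generated snippets into each new prompt, so a follow-up such as "add a checkbox" modifies the existing UI instead of starting over. The `UISession` class, the prompt wording, and the `llm` callable are hypothetical; the sketch uses a fake model and does not reflect the actual tool's implementation.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical model interface: takes a prompt, returns generated UI code.
LLM = Callable[[str], str]


@dataclass
class UISession:
    """Keeps prior (request, snippet) turns so follow-ups can refine earlier output."""
    llm: LLM
    history: List[Tuple[str, str]] = field(default_factory=list)

    @property
    def current_snippet(self) -> str:
        # The latest snippet is the one the user is iterating on.
        return self.history[-1][1] if self.history else ""

    def build_prompt(self, request: str) -> str:
        # Replay every earlier turn so the model can modify, not restart, the UI.
        lines = ["Generate React code that uses the BaseWeb component library."]
        for past_request, snippet in self.history:
            lines.append(f"User: {past_request}")
            lines.append(f"Assistant:\n{snippet}")
        lines.append(f"User: {request}")
        return "\n".join(lines)

    def ask(self, request: str) -> str:
        snippet = self.llm(self.build_prompt(request))
        self.history.append((request, snippet))
        return snippet


def fake_llm(prompt: str) -> str:
    # Stand-in for a real model: reports how many user turns it was shown.
    return f"<Block /> // turns={prompt.count('User:')}"


session = UISession(fake_llm)
session.ask("A login form with email and password fields")
session.ask("Add a 'Remember me' checkbox")
print(session.current_snippet)  # <Block /> // turns=2
```

Replaying the full history is the simplest design; a real tool would likely need to trim or summarize older turns to stay within the model's context window.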
Automated refactoring & centralized migrations
Manual code refactoring is time-consuming and error-prone, and generative AI can help automate it. This could improve the health of codebases and make it easier to upgrade dependencies or write new code. Open source code and existing tools can be used to achieve this.
Generative AI can help identify migration patterns and automate the refactoring of large codebases. Platform teams spend a lot of time on manual centralized migrations. For example, the Java platform team spent significant effort on migrating the Java framework over the last two years. Automating this process could have saved them a lot of engineering time and allowed them to focus on other business-critical work.
Review & improve existing code
Rule-based approaches for static analysis have been around for a while and are widely used to provide developers with syntactic corrections and suggestions. We see potential for generative AI to take this a step further by recommending improvements and automating the follow-up actions.
Traditional static analysis tools, such as Error Prone, are effective for early detection of bugs in Java code. However, such tools often flag bugs that are difficult to fix, and these bugs can impact software reliability.
Current automatic program repair research focuses on standard benchmarks and neglects evaluation on real production code. One of the hackathon projects introduced a novel approach to fixing static analysis bugs at scale in an industrial context using generative AI.
Writing unit tests is one of the tasks on which developers cut corners the most. The ability to generate test cases with ease would increase overall test effectiveness and coverage, and consequently system quality.
One of the most interesting HackDayz projects is the automatic generation of end-to-end tests for iOS and Android apps. The project team used large language models (LLMs) and mobile app images to automatically generate a test script, eliminating the need to write or update any code. The prototype first extracts the sequence of actions required to navigate between two screens specified by the user. It then inputs the action sequence into the LLM to generate the precise code for running the automated tests. Auto-generating end-to-end tests can help improve testing efficiency, reduce costs, and prevent outages.
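The first step, finding the actions that lead from one screen to another, can be illustrated as a shortest-path search. The minimal sketch below assumes a screen-transition graph has already been extracted (the prototype derived its information from app images; here the graph is hard-coded), uses breadth-first search to find the action sequence, and formats it into a prompt for an LLM. The graph, function names, and prompt text are all hypothetical.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple

# Hypothetical screen-transition graph: screen -> [(action, resulting screen)].
# The prototype derived this from app images; here it is hard-coded.
ScreenGraph = Dict[str, List[Tuple[str, str]]]


def find_action_sequence(graph: ScreenGraph, start: str, goal: str) -> Optional[List[str]]:
    """Breadth-first search for the shortest list of UI actions from start to goal."""
    queue = deque([(start, [])])
    visited = {start}
    while queue:
        screen, actions = queue.popleft()
        if screen == goal:
            return actions
        for action, next_screen in graph.get(screen, []):
            if next_screen not in visited:
                visited.add(next_screen)
                queue.append((next_screen, actions + [action]))
    return None  # goal is unreachable from start


def build_test_prompt(actions: List[str], platform: str) -> str:
    """Turn the action sequence into an LLM prompt asking for a UI test script."""
    steps = "\n".join(f"{i}. {a}" for i, a in enumerate(actions, start=1))
    return (f"Write an end-to-end {platform} UI test that performs these steps "
            f"in order and asserts that the final screen is displayed:\n{steps}")


graph: ScreenGraph = {
    "Home": [("tap 'Where to?'", "Search"), ("tap profile icon", "Account")],
    "Search": [("type destination and select first result", "ConfirmRide")],
    "ConfirmRide": [("tap 'Confirm'", "Matching")],
}
actions = find_action_sequence(graph, "Home", "Matching")
assert actions is not None
print(build_test_prompt(actions, "Android"))
```

Breadth-first search returns the shortest action sequence, which keeps the generated tests short; the LLM is then responsible only for turning known steps into platform-specific test code.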
Another innovative project was a debugging tool that used generative AI to create call stacks that clarify the code execution flow. The tool generates all possible call stacks leading to a selected line of code and provides a graphical visualization of the code map within the IDE to help developers gain a deeper understanding of their codebase. This approach reduces the time spent on debugging, enhances code understanding and maintainability, and leads to faster resolution of issues and bugs.
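As a point of comparison, the underlying question (which call paths reach a given function?) can be answered deterministically on a static call graph. The sketch below uses plain depth-first enumeration rather than generative AI, and the call graph and function names are hypothetical; a real tool would also need to handle dynamic dispatch, callbacks, and very large graphs.

```python
from typing import Dict, FrozenSet, List

# Hypothetical static call graph: caller -> functions it calls directly.
CallGraph = Dict[str, List[str]]


def invert(graph: CallGraph) -> Dict[str, List[str]]:
    """Map each callee to its callers so we can walk backward from a line of code."""
    callers: Dict[str, List[str]] = {}
    for caller, callees in graph.items():
        for callee in callees:
            callers.setdefault(callee, []).append(caller)
    return callers


def call_stacks_to(graph: CallGraph, target: str) -> List[List[str]]:
    """Return every call stack (outermost frame first) that reaches target."""
    callers = invert(graph)
    stacks: List[List[str]] = []

    def walk(fn: str, path: List[str], seen: FrozenSet[str]) -> None:
        # Ignore callers already on this path so recursion cannot loop forever.
        parents = [p for p in callers.get(fn, []) if p not in seen]
        if not parents:
            # No new callers: fn is the outermost frame of this stack.
            stacks.append(path[::-1])
            return
        for parent in parents:
            walk(parent, path + [parent], seen | {parent})

    walk(target, [target], frozenset({target}))
    return stacks


graph: CallGraph = {
    "main": ["handle_request", "run_cron"],
    "handle_request": ["price_trip"],
    "run_cron": ["reprice_all"],
    "reprice_all": ["price_trip"],
    "price_trip": ["apply_surge"],
}
for stack in call_stacks_to(graph, "apply_surge"):
    print(" -> ".join(stack))
```

The number of paths can grow exponentially with graph size, which is one reason a practical tool might prune or rank candidate stacks rather than enumerate all of them.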
Automating code reviews is one of the more impactful areas to apply generative AI. On average, our developers spend ~60 minutes per PR addressing review comments. Several HackDayz projects used generative AI to automatically resolve review comments, saving developers time and effort.
It is critical for engineering velocity to understand the reasons for CI failures and respond to them quickly. CI build logs frequently include infrastructure-specific records as well as user-specific build output and errors, so understanding the actual failure often requires scrolling through the build output and navigating to specific error details. Furthermore, CI failures caused by user errors, such as programming errors in source code, should be quickly exposed to the user, while infrastructure errors are frequently resolved through automatic retries. One of the HackDayz projects used the build logs to automatically categorize these failures and take the appropriate next actions to speed up the process.
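A minimal sketch of the triage-and-route step follows. It swaps in hand-written regular expressions where the HackDayz project used generative AI, so it only illustrates the decision logic: surface user errors to the author, retry infrastructure errors, and escalate anything unrecognized. The patterns, category names, and actions are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import List

# Hypothetical failure signatures standing in for a generative AI classifier.
INFRA_PATTERNS = [
    r"connection (reset|refused|timed out)",
    r"no space left on device",
    r"remote cache unavailable",
]
USER_PATTERNS = [
    r"\.(go|java|kt|swift|py):\d+:",  # compiler error pointing at a source line
    r"\bFAIL\b",                      # failing test target
]


@dataclass
class Triage:
    category: str        # "user", "infrastructure", or "unknown"
    action: str          # hypothetical next step
    evidence: List[str]  # log lines that justified the decision


def matching_lines(lines: List[str], patterns: List[str]) -> List[str]:
    return [ln for ln in lines if any(re.search(p, ln, re.IGNORECASE) for p in patterns)]


def triage(log: str) -> Triage:
    lines = log.splitlines()
    user = matching_lines(lines, USER_PATTERNS)
    if user:
        # User errors are checked first: a retry will not fix broken code.
        return Triage("user", "notify author", user)
    infra = matching_lines(lines, INFRA_PATTERNS)
    if infra:
        # Infrastructure failures are often transient, so retry automatically.
        return Triage("infrastructure", "retry build", infra)
    return Triage("unknown", "escalate to on-call", [])


log = """compiling //src/pricing
src/pricing/surge.go:42:7: undefined: rate
FAIL //src/pricing:surge_test"""
print(triage(log))
```

Returning the matching lines as evidence addresses the scrolling problem described above: the author sees the relevant error lines directly instead of the full build output.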
Currently, it can take over a week to localize content using human translators. We believe generative AI can speed up this process while improving accuracy and efficiency. One hackathon project used generative AI to translate content into multiple languages and then had human agents verify the translations. The project demonstrated that generative AI can help improve the efficiency, accuracy, and speed of localization.
There were several projects aimed at improving collaboration and reducing the support load for our developers.
One of the project teams built a tool that can automate level-1 support queries in Slack channels. They built a Slack bot that can answer support questions and retrieve real-time information from the associated services. Developers interacting with the bot would get instant responses rather than waiting for support from the team members in different time zones.
The lack of meaningful documentation in large codebases can be supplemented by auto-generated documentation, making it easier for engineers to onboard and understand new libraries or projects.
Another impactful project focusing on the knowledge base space had the ambitious goal of revolutionizing software documentation by using generative AI. This project team built a prototype tool to automatically generate up-to-date documentation based on PRs, capturing dependencies, and providing high-level overviews. The approach involves using generative AI to analyze and create summaries for each document, recursively summarizing them at higher levels for the entire project. This tool attempts to streamline documentation, thus empowering developers to focus on impactful work while ensuring accurate documentation that reflects the current codebase.
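The recursive summarization approach can be sketched as a post-order walk over a project tree: summarize each file, then summarize the concatenated summaries of a directory's children, up to the project root. In the sketch below, `summarize` is a hypothetical stand-in for an LLM call, and the tree and word-count summarizer are illustrative only.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical summarizer interface; in practice this would wrap an LLM call.
Summarizer = Callable[[str], str]


@dataclass
class Node:
    name: str
    text: str = ""  # file contents; only leaves carry text
    children: List["Node"] = field(default_factory=list)


def summarize_tree(node: Node, summarize: Summarizer,
                   out: Dict[str, str], prefix: str = "") -> str:
    """Post-order walk: summarize files, then summarize each level's child summaries."""
    path = f"{prefix}/{node.name}" if prefix else node.name
    if node.children:
        child_summaries = [summarize_tree(c, summarize, out, path) for c in node.children]
        summary = summarize("\n".join(child_summaries))
    else:
        summary = summarize(node.text)
    out[path] = summary  # keep every level so docs can link project -> package -> file
    return summary


def word_count(text: str) -> str:
    # Toy stand-in for an LLM so the example runs offline.
    return f"{len(text.split())} words"


project = Node("pricing", children=[
    Node("surge.go", text="func Surge() float64"),
    Node("docs", children=[Node("README.md", text="Pricing service docs")]),
])
summaries: Dict[str, str] = {}
summarize_tree(project, word_count, summaries)
for path, summary in summaries.items():
    print(path, "->", summary)
```

Because each level only sees its children's summaries, the input to every call stays small even for large projects; the trade-off is that details lost at a lower level cannot reappear higher up.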
Risks when using Generative AI
Quality of the output generated
In our experience, LLMs may generate buggy code and spread error-prone code patterns. LLMs also need a lot of data to learn how to use language correctly. If this data is biased or flawed, LLMs can replicate these problems and give wrong or unfair suggestions. This means that the quality and accuracy of the training data are very important, as mistakes can be repeated and cause further issues. For example, LLMs used to help with coding can carry over security problems from the training data. So, it’s crucial to be cautious with the data used for training LLMs.
Explainability of the decisions
It is important to be able to understand and explain the reasoning behind the AI-generated output, especially when these outputs are used in business-critical settings. Not being able to trace the AI-generated code to its source can create trust issues.
Integrating generative AI into our development processes
As we thought about addressing some of the challenges discussed in the previous section, we decided to build a single API gateway designed to provide guardrails, assist with PII redaction, detect hallucinations, enforce rate limits, balance load, and capture audit logs.
Uber Generative AI API Gateway is the unified gateway for LLM access at Uber. It provides one standard API for external and internal LLMs. Uber Generative AI API Gateway provides assistance with PII redaction, safety and Uber policy guardrails, hallucination detection, and other common platform functionalities that are essential for safely and efficiently using generative AI at Uber.
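The sketch below shows, under simplifying assumptions, how a single gateway entry point can combine several of these concerns: per-caller token-bucket rate limiting, regex-based PII redaction before a prompt leaves the gateway, routing to a named model backend, and an audit log entry for every request. It is not the Uber Generative AI API Gateway's actual API; all names are hypothetical, and production PII redaction requires far more than two regular expressions.

```python
import re
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# Simplified PII patterns (hypothetical); real redaction needs much more than regexes.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}


def redact(text: str) -> str:
    """Replace each PII match with a placeholder such as <EMAIL>."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"<{label}>", text)
    return text


@dataclass
class TokenBucket:
    capacity: float        # maximum burst size
    refill_per_sec: float  # sustained request rate
    tokens: float = field(init=False)
    updated: Optional[float] = field(default=None, init=False)

    def __post_init__(self) -> None:
        self.tokens = self.capacity  # start full

    def allow(self, now: float) -> bool:
        if self.updated is not None:
            elapsed = now - self.updated
            self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_sec)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


@dataclass
class Gateway:
    backends: Dict[str, Callable[[str], str]]  # model name -> hypothetical client call
    rate_per_sec: float = 1.0
    burst: float = 5.0
    buckets: Dict[str, TokenBucket] = field(default_factory=dict)
    audit_log: List[dict] = field(default_factory=list)

    def complete(self, caller: str, model: str, prompt: str,
                 now: Optional[float] = None) -> str:
        now = time.monotonic() if now is None else now
        if caller not in self.buckets:
            self.buckets[caller] = TokenBucket(self.burst, self.rate_per_sec)
        if not self.buckets[caller].allow(now):
            self.audit_log.append({"caller": caller, "model": model, "status": "rate_limited"})
            raise RuntimeError(f"rate limit exceeded for {caller}")
        safe_prompt = redact(prompt)  # redact before any backend sees the prompt
        response = self.backends[model](safe_prompt)
        self.audit_log.append({"caller": caller, "model": model,
                               "status": "ok", "prompt": safe_prompt})
        return response


gateway = Gateway(backends={"echo": lambda p: p}, rate_per_sec=0.0, burst=2)
print(gateway.complete("svc-a", "echo",
                       "Email jane.doe@example.com or call +1 415 555 0100", now=0.0))
```

Redaction happens before routing, so in this sketch no backend, internal or external, ever receives the raw prompt, and rejected requests are still recorded in the audit log.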
By organizing the Uber Tech-wide HackDayz, we gave our engineers hands-on experience and training with generative AI. Uber employees collaborated with their peers across time zones and disciplines, and the event encouraged them to think creatively and explore ground-breaking ideas for how generative AI could create business impact at Uber. As a result, HackDayz produced a large number of high-quality, practical ideas that we’ll explore and invest in further this year.
We look forward to applying our learnings and insights in ways that may boost developer productivity, create new customer experiences, and improve Uber’s business in new ways.
Ali-Reza Adl-Tabatabai leads the Developer Platform at Uber. His team builds the frameworks and tools that empower Uber developers to Build with Heart. His vision is for Uber to lead the industry in how developers build, deploy, and manage high-quality software productively and at scale.
Serdar Badem is the Product Manager Lead for the Developer Platform at Uber. His focus is to identify and measure developers’ pain points and to prioritize and accelerate key projects that improve developer effectiveness across Uber. He drives several initiatives across the platform to reduce friction and improve developer velocity.
Anshu Chadha leads the Service Platform and Developer Infrastructure organization which has the mission to enable Uber developers to create, build, and land high quality services consistently and without frustration.
Adam Huda is the Sr. Engineering Manager of the Mobile Foundations and Developer Experience team. His mission is to empower Uber’s mobile developer community with frameworks and tools to quickly iterate on features and ship reliable apps at scale.
Brandon Lico is the Engineering Manager of the Golang Monorepo and Developer Experience team. His mission is to build frustration-free tooling for Go developers to develop, debug, and test software in the Monorepo environment.