In 2014, Uber began expanding ever more rapidly. Our platform grew from about 60 cities to 100 in the spring, and then to 200 in the fall. Meanwhile, our fastest-growing cities were among our oldest.
As the number of platform engineers grew, so did the disorganization of our code deployments. Each team used its own custom shell scripts to shepherd new versions of its microservices into production and monitored them manually with service-specific tools. When host upgrades went awry, engineers had to roll back tediously, one machine at a time. With more and more engineers working on Uber services, this manual labor couldn't scale, and it sometimes prolonged outages.
How did we learn to deploy consistently every day? We built Micro Deploy (μDeploy for short), our in-house deployment system that builds, upgrades, and rolls back services at Uber.
The Daily Deployment Process
Uber engineers use Micro Deploy once their code is production-ready: that is, once it has been reviewed and accepted, passes all unit tests, and is merged into the repository. First, the engineer selects a service to upgrade in the μDeploy interface. To start an upgrade workflow, they select a deployment and specify a version of the source code in the Git repository.
Behind the scenes, μDeploy builds the service as needed, distributes the build, talks to the relevant masters in the right data centers, and has each agent update the service on the hosts marked for deployment. Throughout the process, the μDeploy user interface shows the state of the rollout until the workflow completes, at which point the engineer can move on to another task.
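The flow just described can be sketched in a few lines of Python. This is a minimal, hypothetical model, not μDeploy's code: the `Agent` and `Master` classes, the `deploy` function, and the `service@ref` build naming are illustrative assumptions, and the build and replication steps are reduced to progress messages.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set

# Hypothetical model of the deployment flow; not μDeploy's real interfaces.

@dataclass
class Agent:
    host: str
    version: str = "v0"

    def upgrade(self, build_id: str) -> None:
        # A real agent would install the build and restart the service.
        self.version = build_id


@dataclass
class Master:
    datacenter: str
    agents: Dict[str, Agent]

    def upgrade_hosts(self, hosts: List[str], build_id: str) -> None:
        # Instruct the agent on each host marked for deployment.
        for host in hosts:
            self.agents[host].upgrade(build_id)


def deploy(service: str, git_ref: str, masters: List[Master],
           targets: Dict[str, List[str]], built: Set[str],
           report: Callable[[str], None]) -> str:
    build_id = f"{service}@{git_ref}"
    # 1. Build only if this source version has not been built before.
    if build_id not in built:
        built.add(build_id)  # stand-in for invoking the build system
        report(f"built {build_id}")
    # 2. Distribute the build to each data center.
    for m in masters:
        report(f"replicated {build_id} to {m.datacenter}")
    # 3. Each master has its agents upgrade the marked hosts.
    for m in masters:
        m.upgrade_hosts(targets.get(m.datacenter, []), build_id)
        report(f"upgraded {m.datacenter}")
    # 4. Signal completion so the UI can show the final state.
    report("workflow complete")
    return build_id
```

Because `built` remembers which source versions were already built, redeploying the same ref skips straight to distribution, mirroring the "builds the service as needed" step.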
In this way, μDeploy builds and rolls out most services within a few minutes. This is also how quickly an engineer can have an impact.
The time between an engineer writing code and that code going live in Uber's production systems is incredibly short. Uber's growth has not slowed since we rolled out the initial version of μDeploy. Each week in 2016, thousands of engineers push several thousand service builds to prod, and μDeploy rejects 10% of them after monitoring, rolling them back to the previous version. This means that some part of the Uber system starts upgrading every single minute during working hours. Since a rollout typically takes more than a minute, the system is always on its way to a new version.
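The claim that the system is always mid-upgrade follows from simple arithmetic. The sketch below applies Little's law (average work in progress equals arrival rate times average duration). The weekly build count, the 40-hour working week, and the five-minute rollout time are illustrative assumptions standing in for "several thousand" and "a few minutes", and the sketch assumes every build starts during working hours.

```python
# Back-of-the-envelope check using Little's law:
#   average rollouts in flight = start rate x average rollout duration.
# All three inputs are illustrative assumptions, not Uber figures.
BUILDS_PER_WEEK = 3000                  # assumed stand-in for "several thousand"
WORKING_MINUTES_PER_WEEK = 5 * 8 * 60   # assumed 40-hour working week
MEAN_ROLLOUT_MINUTES = 5                # assumed; the text says "a few minutes"

# Rollouts started per working minute.
arrival_rate = BUILDS_PER_WEEK / WORKING_MINUTES_PER_WEEK
# Average number of rollouts in progress at any moment.
concurrent = arrival_rate * MEAN_ROLLOUT_MINUTES

print(f"{arrival_rate:.2f} rollouts start per minute")
print(f"{concurrent:.1f} rollouts in progress on average")
```

With these inputs, a little over one rollout starts per minute and roughly six are in flight at once. More generally, any start rate above one per minute combined with rollouts longer than a minute keeps more than one upgrade in progress on average.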
Our Mission: Deploy with Confidence
Micro Deploy itself consists of many microservices, most of which deploy with μDeploy:
- A web application UI and a CLI let engineers choose how to interact with μDeploy.
- μDeploy agents run on each machine in our data centers. The agent installs and reconfigures services when instructed by its μDeploy master. It also reports the machine's status back to the master, with a full overview of each service.
- μDeploy masters control how the μDeploy agents behave on all the machines in a data center. Each data center has at least one master.
- The μDeploy aggregator interfaces with the master in each data center to manage deployments across all data centers.
- A system we call uBuild builds services on a single cluster of uBuild machines before a rollout and then distributes the builds to all data centers.
- μDeploy replicators copy final builds within and between data centers.
- μDeploy orchestrators manage rollout workflows in a distributed and fault-tolerant manner.
- μDeploy placement selects the set of host machines on which to deploy a service.
- A system we call uConfig allows service configuration changes to roll out in the same way as service upgrades.
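The reporting chain among agents, masters, and the aggregator can be modeled as successive summaries. In this hypothetical sketch (the `AgentReport` type and both functions are assumptions, not μDeploy's data model), each agent reports the version of every service on its host, a master turns its data center's reports into a per-version host count, and the aggregator collects those counts across data centers.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical model of agent -> master -> aggregator status reporting.

@dataclass
class AgentReport:
    host: str
    services: Dict[str, str]  # service name -> version running on this host


def master_summary(reports: List[AgentReport], service: str) -> Counter:
    """Count the hosts in one data center by the version of `service` they run."""
    return Counter(r.services[service] for r in reports if service in r.services)


def aggregate(per_dc: Dict[str, List[AgentReport]], service: str) -> Dict[str, Counter]:
    """Aggregator view: one version histogram per data center."""
    return {dc: master_summary(reports, service) for dc, reports in per_dc.items()}
```

A view like this makes a half-finished rollout easy to spot: a data center whose histogram still contains the old version has hosts left to upgrade.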
What Features Are Important in a Deployment System?
A combination of features makes Micro Deploy a complete build and deployment management system for us. These are the features we consider important in a deployment system for infrastructure like Uber's:
Zero downtime for upgrades. Micro Deploy's global gradual rollout system deploys the same version of the software to multiple data centers with different roles and configurations. Fully automated deployments enable any engineer to roll out a change to their own services globally. We can be on our own, together.
Early, automated error detection. Micro Deploy integrates monitoring systems that detect anomalies early, so humans don't have to watch for significant degradation in I/O performance, uncaught exceptions, HTTP error codes, or problems with request throughput and server load. μDeploy uses this monitoring data to ensure that the system remains stable while a new version rolls out.
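One common way to get zero-downtime upgrades, sketched here as an assumption rather than μDeploy's actual planner, is a rolling update: data centers are upgraded in a chosen order, and within each one no more than `max_unavailable` hosts are out of service at a time. The function name and its parameters are hypothetical.

```python
from typing import Dict, List


def rollout_plan(hosts_by_dc: Dict[str, List[str]], dc_order: List[str],
                 max_unavailable: int) -> List[List[str]]:
    """Split a global rollout into sequential batches of hosts.

    Data centers are upgraded one after another in `dc_order`. Within each
    data center at most `max_unavailable` hosts are taken out of service at
    a time, so the remaining hosts keep serving traffic.
    """
    if max_unavailable < 1:
        raise ValueError("max_unavailable must be at least 1")
    batches: List[List[str]] = []
    for dc in dc_order:
        hosts = hosts_by_dc.get(dc, [])
        # Slice this data center's hosts into chunks of max_unavailable.
        for i in range(0, len(hosts), max_unavailable):
            batches.append(hosts[i:i + max_unavailable])
    return batches
```

For example, with hosts `a`, `b`, `c` in `dc1`, host `d` in `dc2`, and `max_unavailable=2`, the plan is `[["a", "b"], ["c"], ["d"]]`.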
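Automated detection amounts to comparing a new version's health signals against those of the version it replaces. This minimal sketch uses fixed ratio thresholds; the `Metrics` fields, the default thresholds, and the comparison rule are all assumptions, and a real anomaly detector would likely be more robust than a fixed ratio.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Metrics:
    # Hypothetical health signals for one version of a service.
    error_rate: float          # fraction of requests returning HTTP errors
    exceptions_per_min: float  # uncaught exceptions
    p99_io_latency_ms: float   # I/O performance
    cpu_load: float            # server load, 0.0 to 1.0
    requests_per_sec: float    # request throughput


def _worse(new: float, old: float, max_ratio: float) -> bool:
    # Degraded if the metric grew by more than max_ratio; the tiny floor
    # means any nonzero value counts as growth from a zero baseline.
    return new > max(old * max_ratio, 1e-6)


def anomalies(baseline: Metrics, candidate: Metrics,
              max_ratio: float = 1.5, min_throughput_ratio: float = 0.8) -> List[str]:
    """Return reasons why `candidate` looks worse than `baseline`."""
    found: List[str] = []
    if _worse(candidate.error_rate, baseline.error_rate, max_ratio):
        found.append("HTTP error rate increased")
    if _worse(candidate.exceptions_per_min, baseline.exceptions_per_min, max_ratio):
        found.append("uncaught exceptions increased")
    if _worse(candidate.p99_io_latency_ms, baseline.p99_io_latency_ms, max_ratio):
        found.append("I/O latency degraded")
    if _worse(candidate.cpu_load, baseline.cpu_load, max_ratio):
        found.append("server load increased")
    if candidate.requests_per_sec < baseline.requests_per_sec * min_throughput_ratio:
        found.append("request throughput dropped")
    return found
```

An empty list means the candidate looks no worse than the baseline; any entry is a reason to stop the rollout.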
Outage prevention. Micro Deploy uses the monitoring data to stop a rollout and roll back to a stable version when it detects an anomaly. We occasionally see false positives, but better safe than sorry. Rollback is automatic and often occurs long before all hosts have the new version. Ideally, the rollback happens in a canary area, where the batch of machines is small enough to keep any failure from having external impact. We have to contain faulty builds early, before a five-minute test of the latest change turns into a feature-length exercise in damage control.
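Stopping and rolling back can be layered on top of a health check. In this hypothetical sketch, the first batch serves as the canary, `set_version` stands in for instructing an agent, and `healthy` stands in for whatever monitoring check decides whether the upgraded hosts look normal. The function and both callbacks are assumptions, not μDeploy's interfaces.

```python
from typing import Callable, List


def rollout_with_rollback(batches: List[List[str]], new: str, old: str,
                          set_version: Callable[[str, str], None],
                          healthy: Callable[[List[str]], bool]) -> bool:
    """Upgrade batch by batch; the first batch acts as the canary.

    After each batch, `healthy` inspects every host upgraded so far. On the
    first failure, those hosts are returned to `old` and the rollout stops.
    Returns True only if every batch was upgraded successfully.
    """
    upgraded: List[str] = []
    for batch in batches:
        for host in batch:
            set_version(host, new)
            upgraded.append(host)
        if not healthy(upgraded):
            # Automatic rollback: hosts outside `upgraded` never saw `new`.
            for host in reversed(upgraded):
                set_version(host, old)
            return False
    return True
```

Keeping the canary batch small bounds how many hosts ever run a bad build: if the canary fails, only its hosts are rolled back and later batches are never touched.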
Reliable rollouts. Micro Deploy’s highly configurable workflow engine orchestrates the various phases of upgrades. As a distributed system, μDeploy can survive the unexpected shutdown of any host or rack (including the hosts running the workflow) during these upgrades.
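Surviving the loss of the host that runs a workflow requires keeping the workflow's progress somewhere else. The minimal sketch below assumes a shared durable store, modeled here as a JSON file, and phases that are safe to re-run, since a crash between finishing a phase and recording it would repeat that phase. A replacement worker reads the checkpoint and resumes after the last completed phase. `run_workflow` and the checkpoint format are hypothetical.

```python
import json
from pathlib import Path
from typing import Callable, List, Tuple


def run_workflow(phases: List[Tuple[str, Callable[[], None]]], checkpoint: Path) -> List[str]:
    """Run named phases in order, durably recording each completed phase.

    If the worker dies mid-workflow, another worker can call run_workflow
    with the same checkpoint and continue instead of starting over.
    """
    done: List[str] = json.loads(checkpoint.read_text()) if checkpoint.exists() else []
    for name, action in phases:
        if name in done:
            continue  # completed before the previous worker stopped
        action()
        done.append(name)
        # Write a temporary file, then rename it over the checkpoint. On POSIX
        # the rename is atomic, so a crash never leaves a half-written file.
        tmp = checkpoint.with_suffix(".tmp")
        tmp.write_text(json.dumps(done))
        tmp.replace(checkpoint)
    return done
```

If a worker fails during a canary phase, a second call with the same checkpoint skips the already recorded build phase and runs only the canary.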
Ease of use. Micro Deploy's web-based application exposes all these features in a rich user interface. Any engineer can open μDeploy in a browser and deploy their services to production instantly.
REST API for deeper integration. Micro Deploy’s REST API enables third-party tools to integrate with its features.
From Mission to Commission
We designed Micro Deploy to eliminate unnecessary deployment processes and to give engineers confidence that each rollout will go correctly. When one doesn't, the system catches the occasional buggy upgrade quickly, with minimal consequence to production. In this way, a stray mistake just means the system is doing its job. Like many other major engineering initiatives at Uber, μDeploy was conceived, implemented in its initial form, and rolled out into production in several fun-filled months.
After two months in development, we onboarded Uber's first services to Micro Deploy, and 50% of all services were using μDeploy within its first five months of production. That's productive!
As of mid-2016, Uber’s back end is an ever-changing, massively distributed system spread across multiple data centers. Our engineers are now spread across a dozen offices in several countries and continents. Ninety-nine percent of all Uber software ships with μDeploy; that’s an A+! Micro Deploy gives our engineers everywhere speed, autonomy, and end-to-end ownership. Engineers write code, review it, test it, and put it into production the same day.
By going small, Micro Deploy has greatly impacted our engineering, and we are excited to continue to add improvements as we learn about how other distributed technology companies manage their builds. Builders, keep building!
Mathias Schwarz is a software engineer in Uber’s Aarhus engineering office, and wrote this article with Mrina Natarajan on the technical writing team.
Mathias Schwarz is a Sr. Staff Engineer working on the Up team as one of its founding members. Mathias works in the stateless management space and drives deployment safety efforts at Uber. In addition to his role, he also works with local universities helping students connect with Uber to intern and acquire their first experience in the industry. He holds a Ph.D. in Computer Science from Aarhus University.
Posted by Mathias Schwarz