Stay up to date with the latest from Uber Engineering

Open Sourcing Peloton, Uber’s Unified Resource Scheduler

March 8, 2019 / Global

Share
Facebook
X social
Linkedin
Envelope
First introduced by Uber in November 2018, Peloton, a unified resource scheduler, manages resources across distinct workloads, combining separate compute clusters. Peloton is designed for web-scale companies like Uber with millions of containers and tens of thousands of nodes. Peloton features advanced resource management capabilities such as elastic resource sharing, hierarchical max-min fairness, resource overcommits, and workload preemption. As a cloud-agnostic system, Peloton can be run in on-premise data centers or in the cloud.
At Uber, Peloton is a critical piece of infrastructure powering our compute clusters. It is currently running many kinds of batch workloads in production, and we are planning to migrate stateless services workloads to it as well.
Today, Uber is excited to announce that we are open sourcing Peloton. By allowing others in the cluster management community to leverage unified schedulers and workload co-location, Peloton will open the door for more efficient resource utilization and management across the community. Moreover, open sourcing Peloton will enable greater industry collaboration and open up the software to feedback and contributions from industry engineers, independent developers, and academics across the world. 
Benefits of using Peloton
To our knowledge, there is no other open source scheduler which combines all types of workloads for web-scale companies like Uber. Prior to Peloton, each workload at Uber had its own cluster, resulting in many inefficiencies. With Peloton, we can colocate mixed workloads in shared clusters for better resource utilization.
As shown by the Google Borg paper, co-locating diverse workloads on shared clusters is key to improving cluster utilization and reducing overall cluster cost. Below, we outline some examples of how co-locating mixed workloads will drive utilization in our compute clusters, as well as helps us more accurately plan cluster provisioning:
Resource overcommitment and job preemption are key to improving cluster resource utilization. However, it is very expensive to preempt online jobs, such as stateless or stateful services that are often latency sensitive. Hence, preventing preemption of these latency-sensitive jobs requires us to co-locate batch jobs that are low-priority and preemptible on the same cluster, enabling us to better utilize overcommitted resources.
As Uber services move towards an active-active architecture, we will have capacity reserved for disaster recovery (DR) in each data center. That DR capacity can be used for batch jobs until data center failover occurs. Also, sharing clusters with mixed workloads means we no longer need to buy extra DR capacity for online and batch workloads separately.
Uber’s online workloads spike during big events like Halloween or New Year’s Eve. We need to plan capacity for these high-traffic events well in advance, requiring us to buy hardware separately for online and batch jobs. During the rest of the year, this extra hardware is underutilized, leading to extra, and unnecessary, technical costs. By co-locating both workloads on the same cluster, we can lend capacity from batch workloads to online workloads for those spikes without buying extra hardware.
Different workloads have resource profiles that are often complementary to each other. For example, stateful services or batch jobs might be disk IO intensive but stateless services often use little disk IO. Given these profiles, it makes more sense to co-locate stateful services with batch jobs on the same cluster.
Realizing that these scenarios would enable us to achieve greater operational efficiency, improve capacity planning, and optimize resource sharing, it was evident that we needed to co-locate different workloads together on one single, shared compute platform. A unified resource scheduler will enable us to manage all kinds of workloads to use our resources as efficiently as possible both in private data centers and the cloud.
Peloton will support Uber’s workloads with a single, shared platform, balancing resource usage by elastically sharing resources, and helping teams better plan for future capacity needs. Learn more about these benefits by reading our recent article on Peloton. 
Features in the current release
Uber has been running Peloton in production for more than a year and it’s scaling and running very well. Below are some of the feature highlights.
Elastic Resource Sharing: Support hierarchical resource pools to elastically share resources among different teams.
Resource Overcommit and Task Preemption: Improve cluster utilization by scheduling workloads using slack resources and preempting best effort workloads.
Optimized for Big Data Workloads: Support advanced Apache Spark features such as dynamic resource allocation.
Optimized for Machine Learning: Support GPU and Gang scheduling for TensorFlow and Horovod. Manage thousands of GPUs in production
Protobuf/gRPC-based API: Support most of the language bindings such as Golang, Java, Python, and Node.js.
Co-scheduling Mixed Workloads: Support mixed workloads such as batch, stateless, and stateful jobs in a single cluster.
High Scalability: Scale to millions of containers and tens of thousands of nodes as shown in our benchmark tests in our recent Kubecon Talk.
Uber’s Peloton team is also working on stateless service support, coming soon. Please visit the Next Steps section of our article on Peloton for more details.
Get started
We hope you try out Peloton for yourself! Learn more by reading our recent article on Peloton and our documentation or joining our Slack channel with any questions about the software.

Min Cai

Min Cai is a Distinguished Engineer at Uber working on the AI/ML platform (Michelangelo). He also led many infra projects such as cluster management (Mesos and Peloton), microservice platform (uDeploy), all-active datacenters, etc. He received his Ph.D. degree in Computer Science from Univ. of Southern California. He has published over 20 journal and conference papers, and holds 6 US patents.