A Docker registry’s primary purpose is to store and distribute Docker images. This may seem like a relatively trivial task, but with a large-scale compute cluster like Uber’s, it can easily turn into a scaling bottleneck. In computing environments with multiple regions and hybrid cloud systems, image distribution becomes even more challenging.
To solve performance issues with our legacy Docker registry stack, Uber’s Cluster Management team developed Kraken, an open source, peer-to-peer (P2P) Docker registry. Docker containers are a foundational building block of Uber’s infrastructure (we even built our own open source Docker image builder, Makisu!), but as the number and size of our compute clusters grew, a simple Docker registry setup with sharding and caches couldn’t keep up with the throughput required to distribute Docker images efficiently.
Designed with a focus on scalability and availability, Kraken handles Docker image management, replication, and distribution in a hybrid cloud environment. Because its storage back end is pluggable, Kraken can also be added to an existing Docker registry setup as the distribution layer.
While developing Kraken, we explored multiple design choices before settling on a P2P architecture. Kraken uses a P2P protocol tailored to a data center network environment and improves microservice lifecycle management at the enterprise level.
Kraken supports pluggable storage options: instead of managing data blobs itself, it plugs into reliable blob storage such as S3, HDFS, or another registry. The storage interface is simple, and new options are easy to add.
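To illustrate what a pluggable storage interface of this kind can look like, here is a minimal sketch in Python. It is not Kraken's actual interface: the names `BlobBackend`, `InMemoryBackend`, `register_backend`, and `make_backend` are hypothetical, and the in-memory store merely stands in for a real back end such as S3 or HDFS.

```python
from abc import ABC, abstractmethod

# Hypothetical registry mapping a configuration name to a backend class.
BACKENDS = {}


def register_backend(name):
    """Class decorator that makes a backend selectable by name."""
    def decorator(cls):
        BACKENDS[name] = cls
        return cls
    return decorator


class BlobBackend(ABC):
    """Assumed minimal storage interface (illustrative, not Kraken's API)."""

    @abstractmethod
    def upload(self, name: str, data: bytes) -> None:
        """Store a blob under the given name."""

    @abstractmethod
    def download(self, name: str) -> bytes:
        """Return the blob stored under the given name."""

    @abstractmethod
    def exists(self, name: str) -> bool:
        """Report whether a blob with the given name is stored."""


@register_backend("memory")
class InMemoryBackend(BlobBackend):
    """Stand-in for a real blob store; keeps blobs in a dict."""

    def __init__(self):
        self._blobs = {}

    def upload(self, name, data):
        self._blobs[name] = data

    def download(self, name):
        return self._blobs[name]

    def exists(self, name):
        return name in self._blobs


def make_backend(name) -> BlobBackend:
    """Instantiate the backend registered under the given name."""
    try:
        return BACKENDS[name]()
    except KeyError:
        raise ValueError(f"unknown backend: {name}") from None
```

In this sketch, adding a new storage option means writing one class with three methods and registering it under a name. The distribution layer only ever calls `upload`, `download`, and `exists`.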
Kraken’s data distribution layer is built on top of existing, well-established technologies. Furthermore, Kraken is self-healing, easy to maintain, and supports lossless and rule-based asynchronous replication between clusters.
Massive performance improvements
Kraken was first deployed at Uber in early 2018, and since then, the performance issues we experienced with our legacy Docker registry stack have been resolved.
Our busiest Kraken cluster in production distributes more than 1 million blobs per day, 100,000 of which are 1GB or larger. Additionally, at peak production load, Kraken distributes 20,000 blobs of 100MB to 1GB each in under 30 seconds.
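For a rough sense of scale, the peak figure can be turned into an aggregate data rate. The sketch below is a back-of-the-envelope calculation, not a measurement. It assumes decimal units (1MB = 10^6 bytes), that all 20,000 blobs move within the same 30-second window, and that each blob is counted once regardless of how many hosts receive it. The function name `aggregate_throughput` is ours.

```python
MB = 10**6  # decimal megabyte (assumption)
GB = 10**9  # decimal gigabyte (assumption)


def aggregate_throughput(blob_count, blob_size_bytes, seconds):
    """Total bytes moved per second if every blob has the given size."""
    return blob_count * blob_size_bytes / seconds


# Bounds from the stated blob size range of 100MB to 1GB.
low = aggregate_throughput(20_000, 100 * MB, 30)
high = aggregate_throughput(20_000, 1 * GB, 30)

print(f"all blobs at 100MB: {low / GB:.1f} GB/s")
print(f"all blobs at 1GB:   {high / GB:.1f} GB/s")
```

Under these assumptions, the aggregate rate falls between roughly 67 GB/s and 667 GB/s. Because the distribution finishes in under 30 seconds, each figure is a lower bound for its blob size.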
Kraken scales to at least 8,000 hosts per cluster and distributes Docker images at more than 50 percent of each host’s maximum download speed. In fact, with Kraken, cluster and image size do not have a significant impact on download speed.
Open sourcing Kraken
Since its internal launch, Kraken has been used to manage and distribute all Docker images at Uber.
By making the tool available to the broader open source community, we hope to inspire discussions on engineering and design best practices for building an adaptable and reliable infrastructure with Docker.
If building scalable cluster management applications interests you, consider applying for a role on our team!