Modernizing Artifact Storage at Uber
Staff Engineer
Sr. Software Engineer
Introduction
At Uber, every software build relies on a large set of dependencies and produces artifacts that must be reliably stored and distributed to downstream consumers. Most development happens in large monorepos across Go, Java, web, Android, iOS, and Python, complemented by thousands of microrepos that power services, libraries, and shared infrastructure. Regardless of repository shape or size, builds routinely depend on hundreds or thousands of artifacts.
To support this ecosystem, Uber has relied for over a decade on a centralized platform that sits on the critical path of our build systems. The platform handles two core workflows: resolving dependencies during builds and storing the resulting artifacts and build outputs. It was originally a third-party platform deployed in Uber’s on-prem data centers, which gave us deep integration with internal systems and full operational control.
Figure 1 shows the high-level architecture of the on-prem deployed platform.
Figure 1: Legacy architecture.
Clients routed requests through GeoDNS, which directed traffic to the nearest data center. Each cluster first attempted to resolve an artifact locally. If the artifact wasn’t found, the cluster queried the peer data center, fetched the artifact, and stored it locally. Writes triggered asynchronous push-based replication between the two clusters, while a cron-based replication job served as a catch-all for replication gaps or failures. In the event of a data center outage or complete cluster loss, GeoDNS was also responsible for failing over traffic to the remaining healthy cluster.
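The legacy read path can be sketched as a local lookup with a one-hop fallback to the peer data center. This is a minimal in-memory illustration, not the platform’s code: the `Cluster` class and its `resolve` method are hypothetical names, GeoDNS routing and both replication mechanisms are omitted, and a dict stands in for on-disk storage.

```python
from typing import Dict, Optional


class Cluster:
    """Hypothetical in-memory model of one legacy on-prem cluster."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.store: Dict[str, bytes] = {}  # stands in for the cluster's local disks
        self.peer: Optional["Cluster"] = None

    def resolve(self, path: str) -> Optional[bytes]:
        # 1. Try to serve the artifact from local storage.
        data = self.store.get(path)
        if data is not None:
            return data
        # 2. On a local miss, check only the peer's local storage (not the peer's
        #    own fallback), so a missing artifact never bounces between clusters.
        if self.peer is not None:
            data = self.peer.store.get(path)
            if data is not None:
                # 3. Keep a local copy so subsequent reads are served locally.
                self.store[path] = data
        return data


dc1, dc2 = Cluster("dc1"), Cluster("dc2")
dc1.peer, dc2.peer = dc2, dc1
dc2.store["libs/guava.jar"] = b"jar-bytes"
print(dc1.resolve("libs/guava.jar"))  # fetched from dc2, then stored in dc1
```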
This architecture served Uber well for many years. However, as Uber continued to grow in scale and artifact volume, we ran into several challenges.
Challenges of the Legacy Architecture
Disk Space
Storage on each node was limited, and running out of disk space could prevent the system from writing new artifacts. Even when other nodes in the cluster had available space, the system could fail to meet the replication factor. Disks under this pressure also suffered from slow I/O, which slowed reads.
Recovering from this often required an engineer to manually rebalance storage across nodes to free up space. This process could take hours or even days and, in some cases, render the system effectively inoperable. It was also a high-risk operation that could cause further damage, including complete data loss. To mitigate the impact while recovery was in progress, traffic was failed over to the other cluster at the DNS layer until the affected cluster stabilized.
Inconsistent Replication
Replication happened asynchronously on writes. The node handling the write persisted the artifact locally and then triggered background replication to other nodes to meet the configured replication factor. These replication attempts could fail silently, with no reliable real-time observability, allowing an artifact to exist on only a single node.
To reduce this risk and improve observability, we deployed a cron job that detected under-replicated artifacts, proactively copied them across the cluster, and alerted when replication failed to complete.
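The detection and repair loop of such a cron job might look roughly like the sketch below. It is a minimal, in-memory illustration under stated assumptions: `placements` is a hypothetical map from artifact path to the set of nodes holding it, adding a node to that set stands in for an actual copy, and whatever `repair` returns is what would trigger an alert.

```python
from typing import Dict, List, Sequence, Set


def find_under_replicated(placements: Dict[str, Set[str]], replication_factor: int) -> List[str]:
    """Return artifact paths held by fewer nodes than the replication factor."""
    return sorted(
        path for path, holders in placements.items() if len(holders) < replication_factor
    )


def repair(placements: Dict[str, Set[str]], nodes: Sequence[str], replication_factor: int) -> List[str]:
    """Copy under-replicated artifacts to additional nodes.

    Returns the paths that still fall short, which the real job would alert on.
    """
    still_short: List[str] = []
    for path in find_under_replicated(placements, replication_factor):
        holders = placements[path]
        for node in nodes:
            if len(holders) >= replication_factor:
                break
            if node not in holders:
                # Placeholder for the actual node-to-node copy.
                holders.add(node)
        if len(holders) < replication_factor:
            still_short.append(path)
    return still_short
```

For example, an artifact that exists on only one node of a three-node cluster with a replication factor of 3 is reported by `find_under_replicated` and then copied to the two remaining nodes by `repair`.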
Hardware Failures
The nodes used local disks rather than attached volumes, which meant a hardware failure required replacing the node and evacuating all of its data. Evacuation was a manual process that involved running scripts to backfill data onto a new node while keeping the cluster operational. This required careful throttling to avoid overwhelming disk I/O, network bandwidth, and node resources, and moving hundreds of terabytes of data was inherently slow.
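The careful throttling described above can be illustrated with a token bucket that caps how many bytes per second a backfill may move. This is a simplified sketch, not the scripts we actually ran: `TokenBucket`, `backfill`, and the `copy_artifact` callback are hypothetical, only network bandwidth is budgeted (disk I/O and node resources are ignored), and the clock and sleep functions are injectable so the behavior can be checked without waiting.

```python
import time
from typing import Callable, Iterable, List, Tuple


class TokenBucket:
    """Token bucket that budgets bytes per second for a backfill."""

    def __init__(self, rate: float, burst: float,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.rate = rate          # sustained bytes per second
        self.capacity = burst     # largest amount that can be spent at once
        self.tokens = burst
        self.clock = clock
        self.sleep = sleep
        self.last = clock()

    def acquire(self, n: float) -> None:
        """Block until n bytes of budget are available, then spend them."""
        if n > self.capacity:
            raise ValueError("request exceeds burst size; split it into chunks")
        while True:
            now = self.clock()
            # Refill in proportion to elapsed time, capped at the burst size.
            self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
            self.last = now
            if self.tokens >= n:
                self.tokens -= n
                return
            self.sleep((n - self.tokens) / self.rate)


def backfill(artifacts: Iterable[Tuple[str, int]], bucket: TokenBucket,
             copy_artifact: Callable[[str], None]) -> List[str]:
    """Copy (path, size) pairs to a replacement node without exceeding the byte budget."""
    copied: List[str] = []
    for path, size in artifacts:
        remaining = size
        while remaining > 0:
            chunk = min(remaining, bucket.capacity)
            bucket.acquire(chunk)  # simplification: budget is reserved before copying
            remaining -= chunk
        copy_artifact(path)
        copied.append(path)
    return copied
```

Even in this toy form, the arithmetic shows why evacuations were slow: at a fixed byte budget, total backfill time grows linearly with the data on the failed node, and hundreds of terabytes translate into a long exposure window.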
Large backfills increased operational risk. Long-running transfers were more exposed to partial failures, retries, and performance degradation across the cluster, and any misstep could amplify the impact. In some cases, evacuation wasn’t possible at all, and if replication hadn’t been completed, this could result in permanent data loss. The combination of scale and manual intervention resulted in a high-risk operational model with an unacceptably large blast radius.
Software Upgrades
Software upgrades were a particularly risky and involved process. Upgrading the platform required large, multi-terabyte database schema migrations, coordinated configuration changes across nodes, and numerous manual steps. To reduce risk, we took the cluster out of service by failing over its traffic at the DNS layer.
Even with the traffic drained, upgrades left little room for error. Long-running migrations stressed databases and disks, limited rollback options, and failures could leave the cluster in an inconsistent state. In the worst case, a failed upgrade could result in the loss of the entire cluster along with its metadata and stored artifacts.
Architecture
After evaluating multiple options, we decided to adopt a managed software-as-a-service (SaaS) artifact platform. Our decision was driven by the following architectural considerations:
- The managed service is available across multiple cloud regions, providing built-in regional isolation and high availability.
- Asynchronous cross-region replication keeps artifacts synchronized between the managed environments.
- Artifact storage is backed by cloud-native blob stores rather than local disks, removing capacity constraints and significantly reducing the risk associated with node or disk failures.
- Operating the platform as a managed service shifts responsibility for upgrades, security patches, and operational maintenance to the platform provider. Uber no longer needs to perform high-risk upgrades or node replacements, work that demands deep platform-specific expertise.
However, moving to a managed platform isn’t a magic fix and comes with its own challenges. Unlike our on-prem deployment, consuming artifacts from a cloud-hosted managed platform incurs data egress costs. At Uber’s scale, we download more than 5 petabytes of artifacts every month, and this volume continues to grow with the number of builds and services. We therefore needed a solution that reduced repeated data transfers while preserving the SaaS platform as the source of truth.
Introducing a Proxy Layer
To address this, we introduced an internal proxy layer in front of the SaaS origin. Rather than acting as a traditional cache, the proxy functions as a validation layer. Every client request is forwarded to the SaaS origin, but conditional requests (with the If-None-Match and If-Modified-Since headers) are used to avoid redownloading artifact bytes when the content hasn’t changed.
The proxy maintains an artifact metadata store (MySQL) which records the artifact URL, artifact checksum, and last-modified timestamps. This allows us to reduce egress costs while ensuring correctness is always validated against the authoritative source.
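A row in the metadata store and the conditional headers derived from it might look like the following. This is a minimal sketch: the `ArtifactMetadata` fields mirror the columns named above, but the class, the `conditional_headers` helper, the example URL, and the assumption that the origin’s ETag equals the stored checksum are all hypothetical.

```python
from dataclasses import dataclass
from email.utils import formatdate
from typing import Dict, Optional


@dataclass
class ArtifactMetadata:
    """One hypothetical row of the proxy's MySQL metadata store."""
    url: str
    checksum: str         # assumed to match the ETag the origin returns
    last_modified: float  # origin Last-Modified value as a Unix timestamp


def conditional_headers(meta: Optional[ArtifactMetadata]) -> Dict[str, str]:
    """Build validation headers for the request forwarded to the SaaS origin."""
    if meta is None:
        # Never seen before: send a plain GET and expect the full artifact.
        return {}
    return {
        # ETag values are quoted strings in HTTP (RFC 9110).
        "If-None-Match": f'"{meta.checksum}"',
        # HTTP-date format, which is always expressed in GMT.
        "If-Modified-Since": formatdate(meta.last_modified, usegmt=True),
    }


row = ArtifactMetadata("https://artifacts.example.com/libs/guava.jar", "3f2a9c", 0.0)
print(conditional_headers(row))
```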
We considered using a generic HTTP caching proxy such as NGINX, but traditional cache semantics based on TTLs and best-effort eviction don’t align well with artifact correctness requirements at Uber’s scale. Artifact delivery requires per-request validation, strong consistency guarantees, and graceful passthrough on failures, which would require substantial customization on top of a generic proxy. This led us to build a purpose-designed validation proxy instead.
How It Works
All artifact requests are now routed through the validation proxy. For each request, the proxy sends a request to the SaaS origin. If the proxy has previously seen the artifact, it includes conditional headers such as If-None-Match and If-Modified-Since.
A 304 Not Modified response indicates the artifact hasn’t changed, allowing the proxy to serve bytes from its cache. A 200 OK response indicates a new or updated artifact. The proxy streams the bytes directly to the client and updates its cache asynchronously to avoid adding latency to the client response. In all cases, the SaaS platform remains the source of truth, with correctness enforced through per-request validation rather than time-based cache expiration.
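The per-request decision can be sketched as follows. This is an illustrative model, not the production proxy: `ValidationProxy`, `OriginResponse`, and the injectable `origin` callable are hypothetical, the cache update happens inline rather than asynchronously, and header names are matched case-sensitively for brevity (real HTTP header names are case-insensitive).

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple


@dataclass
class OriginResponse:
    status: int
    headers: Dict[str, str] = field(default_factory=dict)
    body: bytes = b""


# Hypothetical stand-in for an HTTP client call to the SaaS origin.
Origin = Callable[[str, Dict[str, str]], OriginResponse]


class ValidationProxy:
    """Minimal sketch of per-request validation against the origin."""

    def __init__(self, origin: Origin) -> None:
        self.origin = origin
        self.validators: Dict[str, Dict[str, str]] = {}  # url -> ETag / Last-Modified
        self.blobs: Dict[str, bytes] = {}                 # url -> cached artifact bytes

    def get(self, url: str) -> Tuple[int, bytes, str]:
        headers: Dict[str, str] = {}
        seen = self.validators.get(url)
        if seen is not None and url in self.blobs:
            if "ETag" in seen:
                headers["If-None-Match"] = seen["ETag"]
            if "Last-Modified" in seen:
                headers["If-Modified-Since"] = seen["Last-Modified"]
        # Every request is validated against the origin; there is no TTL.
        resp = self.origin(url, headers)
        if resp.status == 304 and url in self.blobs:
            # Unchanged: serve cached bytes, no artifact bytes cross the internet.
            return 200, self.blobs[url], "cache"
        if resp.status == 200:
            # New or updated artifact: return it and refresh the cache.
            self.blobs[url] = resp.body
            self.validators[url] = {
                k: v for k, v in resp.headers.items() if k in ("ETag", "Last-Modified")
            }
            return 200, resp.body, "origin"
        # Anything else (404, 5xx, ...) is passed through unchanged.
        return resp.status, resp.body, "origin"
```

Because cached bytes are reused only when the origin answers 304, a stale local copy is never served, which is the property that distinguishes this design from TTL-based caching.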
Figure 2 shows the proxy-based architecture with the SaaS platform. The SaaS components shown are conceptual representations.
Figure 2: Proxy architecture with SaaS.
Results and Impact
The proxy significantly reduces egress costs while maintaining comparable latency and reliability for artifact downloads, and enables improved observability across the entire artifact delivery path.
- Egress reduction: At Uber’s scale, artifacts are downloaded repeatedly across builds and environments. By avoiding re-downloading unchanged artifacts, the proxy reduces data egress from the SaaS platform by more than 99%.
- Reliability: Overall reliability, measured at the proxy layer across all artifact requests, now reaches 99.99%.
- Sustainable cost growth: As build volume grows month-over-month, proxy-based validation ensures that cost increases are driven primarily by new or updated artifacts rather than repeated downloads of the same binaries. In practice, this reduces overall artifact-related egress costs by nearly 90%.
- Latencies: For unchanged artifacts, bytes are served directly from an internal cache, eliminating internet data transfer and reducing download time. Each request still performs a lightweight validation check with the SaaS; however, this call is small (metadata-only) and does not significantly impact latency. As a result, end-to-end latency remains comparable to the legacy on-prem architecture.
- Observability: The proxy provides full visibility into every request sent to the SaaS platform, including request failures, cache behavior, and artifact-serving paths. It also makes delays or gaps in regional replication easy to detect, enabling Uber teams to identify issues quickly and respond efficiently.
Failure Modes and Mitigations
As the proxy sits on the critical path, it’s designed to fail safely and preserve correctness by treating the SaaS platform as the source of truth at all times.
- Proxy dependency failures (cache or MySQL): If internal dependencies such as the cache store or the MySQL metadata store become unavailable or degraded, operators can manually switch the proxy to passthrough mode. In this mode, requests are forwarded directly to the SaaS origin without relying on local state, so builds continue to function correctly, albeit with higher latency and egress costs.
- Proxy process or host failure: The proxy is deployed in an active-active configuration across multiple nodes and data centers. Load balancing transparently handles individual instance failures, with no client impact.
- Uber data center or SaaS region unavailability: In the event of an Uber data center outage or a SaaS region failure, traffic is failed over via GeoDNS to the other healthy data center or cloud region.
- Stale or inconsistent cache state: The proxy never serves artifacts without validating them against the origin. Conditional requests ensure cached artifacts are only used when the SaaS explicitly confirms they’re still current.
- Partial or failed downloads: Artifact downloads are streamed and verified before being committed to cache. Failed or incomplete transfers are discarded to prevent corrupted artifacts from being served.
- SaaS regional replication delays or failures: Replication lag or inconsistencies between SaaS regions surface through proxy-level observability. Requests are validated against the serving region, and delays can be quickly detected and alerted on to limit their impact.
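Verifying a download before committing it to the cache, as described for partial or failed downloads, can be sketched as stream to a temporary file, hash, then atomically rename. This assumes a filesystem-backed cache and SHA-256 checksums, both assumptions for illustration, and `commit_if_valid` is a hypothetical helper.

```python
import hashlib
import os
import tempfile
from typing import Iterable


def commit_if_valid(chunks: Iterable[bytes], expected_sha256: str, dest_path: str) -> bool:
    """Stream chunks to a temp file; commit to dest_path only if the digest matches."""
    dest_dir = os.path.dirname(dest_path) or "."
    # Write next to the destination so the final rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=dest_dir, prefix=".partial-")
    digest = hashlib.sha256()
    try:
        with os.fdopen(fd, "wb") as tmp:
            for chunk in chunks:  # a failure mid-stream leaves the cache untouched
                digest.update(chunk)
                tmp.write(chunk)
            tmp.flush()
            os.fsync(tmp.fileno())
        if digest.hexdigest() != expected_sha256:
            os.unlink(tmp_path)  # truncated or corrupted: discard, never serve
            return False
        os.replace(tmp_path, dest_path)  # atomic on POSIX within one filesystem
        return True
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

Readers of the cache only ever see either no file or a complete, verified file, because the destination path is created solely by the final rename.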
This design ensures that failures in the proxy or its dependencies degrade gracefully to higher cost or latency, rather than compromising correctness or availability.
Next Steps
As we’ve exercised the system under real-world workloads, a few patterns have emerged that are shaping our next set of improvements:
- Range request support in the cache layer: We’ve learned that certain build tools and retry behaviors make extensive use of partial downloads. Where ranged (206 Partial Content) responses aren’t cached, this can result in full re-downloads of large artifacts. Supporting Range requests end-to-end would better align the cache with these access patterns and reduce unnecessary egress and latency.
- Large-object handling: We’ve observed that a very small fraction of artifacts (larger than 8 GB) can disproportionately contribute to egress when accessed repeatedly. This is guiding exploration of more tailored caching strategies for large objects.
- Node-local hot cache: Early usage patterns suggest that introducing a small node-local cache in front of the shared cache could help absorb bursty access and reduce tail latency, while still keeping the SaaS origin as the source of truth.
- Burst behavior under peak load: We’ve seen signs of thundering-herd behavior around hot artifacts during peak build traffic. This is motivating work on request coalescing and improved backpressure to better smooth these spikes.
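Request coalescing is commonly implemented along the lines of Go’s `golang.org/x/sync/singleflight` package: the first request for a key performs the fetch, and concurrent requests for the same key wait for and share its result. The sketch below illustrates that technique in Python with hypothetical names (`Coalescer`, `do`); it is not a description of the implementation we plan to build.

```python
import threading
from typing import Any, Callable, Dict, Optional


class _Call:
    """State shared by all callers waiting on one in-flight fetch."""

    def __init__(self) -> None:
        self.done = threading.Event()
        self.result: Any = None
        self.error: Optional[BaseException] = None


class Coalescer:
    """Collapse concurrent requests for the same key into one origin fetch."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._inflight: Dict[str, _Call] = {}

    def do(self, key: str, fetch: Callable[[], Any]) -> Any:
        with self._lock:
            call = self._inflight.get(key)
            is_leader = call is None
            if is_leader:
                call = _Call()
                self._inflight[key] = call
        if is_leader:
            try:
                call.result = fetch()
            except BaseException as exc:  # share the failure with every waiter
                call.error = exc
            finally:
                # Remove the entry first so later requests start a fresh fetch.
                with self._lock:
                    del self._inflight[key]
                call.done.set()
        else:
            call.done.wait()  # followers block until the leader finishes
        if call.error is not None:
            raise call.error
        return call.result
```

Once the leader finishes, its entry is removed and the next request starts a fresh fetch, so coalescing only shares results among requests that overlap in time, which is exactly the thundering-herd window around a hot artifact.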
These areas reflect learnings from initial production usage and represent opportunities to further optimize efficiency and latency as the system evolves.
Conclusion
By moving to a managed SaaS artifact platform and introducing a lightweight proxy, we significantly reduced operational risk, improved observability, and made artifact delivery more cost-efficient at scale. The proxy allows us to preserve the SaaS platform as the source of truth while keeping the overall operational and maintenance overhead light. Overall, these decisions have given Uber infrastructure that can scale sustainably as our codebase, build volume, and engineering footprint continue to grow.
Acknowledgments
Cover Photo Attribution: “Circle Packing Artifacts” by blprnt_van is licensed under CC BY 2.0.
Amazon Web Services®, AWS®, Amazon S3®, and the Powered by AWS logo are trademarks of Amazon.com, Inc. or its affiliates.
Google Cloud Storage™ is a trademark of Google LLC and this blog post is not endorsed by or affiliated with Google in any way.
Preetam Dwivedi
Staff Engineer
Preetam Dwivedi is a Staff Engineer on Uber’s Developer Platform team, leading code hosting, review, merge queues, and artifact management. He specializes in distributed systems and scalable developer infrastructure, driving cloud migrations and modernizing engineering workflows and integrations.
Manjari Akella
Sr. Software Engineer
Manjari Akella is a Sr. Software Engineer on Uber’s Developer Platform team. Since joining in 2018, she has focused on improving developer productivity and modernizing workflows. Her key work includes the Go Monorepo, Go library improvements, artifact management, and merge queues.