In November 2021 we decided to evaluate arm64 for Uber. Most of our services are written in either Go or Java, but our build systems only supported compiling to x86_64. Today, thanks to open source collaboration, Uber has a system-independent (hermetic) build toolchain that seamlessly supports multiple architectures. We used this toolchain to bootstrap our arm64 hosts. This post tells the story of how we went about it: our early thinking, problems, some achievements, and next steps.
We started in November 2021 with an infrastructure that was exclusively Linux/x86_64. As of January 2023, we have:
- A C++ toolchain for both server architectures (x86_64 and arm64) that we use for production code, powered by zig cc
- Several Core Infrastructure services running on arm64 hardware, demonstrating its viability for future expansion
Let’s get into how we achieved this.
Before we dive in, let’s start with acknowledgements: bazel-zig-cc, which we cloned and which became the foundation for our cross-compiler tooling, was originally created by Adam Bouhenguel. Our special thanks go to Adam for creating and publishing bazel-zig-cc; his idea and work helped make the concept of using Zig with Bazel a reality.
All major cloud providers are investing heavily in arm64. This, combined with anecdata of plausible platform benefits (power consumption, price, compute performance) compared to the venerable x86_64, makes it feel worthwhile to seriously consider making arm64 a part of our fleet.
So we set out to try and see for ourselves. The first goal could be phrased as the following:
Run a large-footprint application on arm64 and measure the possible cost savings
A key priority was to minimize the amount of work necessary to run and benchmark a service that consumes many cores. We identified two very different possible approaches:
- Hack together basic arm64 support in a parallel zone, or in an isolated cluster in an existing zone, and run the tests there (experimental quality)
- Make all of the core infrastructure understand that there is more than one architecture, then spawn the arm64 host just like any other SKU and test the application
The first option seemed like the right thing to do when considering our priority around minimizing investment. After all, why invest time and money into something that has a non-trivial chance of being abandoned? We considered running a “parallel zone,” which would have arm64 capacity, but otherwise be decoupled from production (and have much looser quality requirements, allowing us to move fast).
A bit later, another important reason for arm64 was added: if we can run our workloads on arm64, we can diversify our capacity, putting us in a better position when acquiring capacity. The need for capacity diversification was an early signal to abandon the “quick experimental” route and instead spend more time enabling full support for arm64. Thus the mission statement became (and still remains today):
Reduce Uber’s compute costs, increase capacity diversity, and modernize our platform by deploying some production applications on arm64
Since we had originally set out with a prototyping mindset and the project had now turned π¹, a tenet emerged to guide us:
No hacks, all in main (i.e., no long-term branches or out-of-tree patches)
Now that we knew that arm64 needed first-class support in our core infrastructure, the project naturally split itself into two pieces:
- The very first task is to compile arm64 binaries out of the Go Monorepo, which hosts nearly all of our infrastructure code
- Everything that builds, stores, downloads and executes code (build hosts, artifact stores, and schedulers) needs to be changed to understand that there are two architectures
How can we compile arm64 binaries? By building natively on arm64 hosts or by cross-compiling, of course. To help guide our decision, let’s understand the differences and requirements for native and cross-compiling.
¹ “Turned 180 degrees” is a common idiom. My team, being efficiency geeks (we also configure sysctl knobs for the Linux kernel!), tends to use the shorter equivalent “turned π”, because, of course, radians are implied.
Basics of Native and Cross-Compiling
First, let’s get some terms out of the way:
- A binary is a program in machine code compiled from source.
- A toolchain is a set of tools that are needed to compile source code to a binary. This usually includes a preprocessor, compiler, linker, and others.
- A hermetic toolchain is a toolchain that, given the same input, always produces the same output regardless of the environment. In this context “hermetic” means that it does not use files from the host (which is “leaky”) and contains everything it needs to compile a file.
- A host is the machine that is compiling the binary.
- A target is the machine that will run the binary.
- In native compilation the host and target are the same platform (i.e., operating system, processor architecture, and shared libraries are the same).
- In cross compilation the host and target are different platforms (e.g., compiling for x86_64 Linux from macOS arm64 (M1)). Sometimes the target machine may be unable to compile the code, but can run it. For example, a watch can run compiled code, but cannot run a compiler, so we can use a cross-compiler to compile the program for the watch.
- A sysroot is an archive of the target’s file system, e.g., target-specific headers, shared libraries, and static libraries. Cross-compiler toolchains often need one; this is discussed below.
- AArch64, aarch64, and arm64 (used interchangeably) are names for the 64-bit Arm processor architecture.
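To make the native-versus-cross distinction concrete, here is a minimal C sketch that models a platform as the three properties named in the definitions above: operating system, processor architecture, and C library. The `struct platform` type, the `is_native` function, and example strings such as `gnu.2.28` are hypothetical illustrations, not part of any toolchain’s API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of a platform: OS, processor architecture, and
 * C library (NULL when not applicable, e.g. for macOS). */
struct platform {
    const char *os;   /* e.g. "linux", "macos" */
    const char *arch; /* e.g. "x86_64", "aarch64" */
    const char *libc; /* e.g. "gnu.2.28", "musl", or NULL */
};

/* String equality that treats two NULLs as equal. */
static bool same_str(const char *a, const char *b) {
    if (a == NULL || b == NULL)
        return a == b;
    return strcmp(a, b) == 0;
}

/* Native compilation: host and target are the same platform. Any
 * difference in OS, architecture, or C library means cross-compiling. */
bool is_native(const struct platform *host, const struct platform *target) {
    assert(host != NULL && target != NULL);
    return same_str(host->os, target->os) &&
           same_str(host->arch, target->arch) &&
           same_str(host->libc, target->libc);
}
```

Under this model, compiling on Linux x86_64 for Linux aarch64 counts as cross-compilation even though the operating system matches.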
The diagram below shows how to turn a source file main.c into an executable by compiling natively (left) and cross-compiling (right).
Native compilation requires less effort and configuration to get started, because this is the default mode for most compiler toolchains. On the surface, we could have spawned a few arm64 virtual machines from a cloud provider and bootstrapped our tools from there. However, all our servers use the same base image, including the build fleet. The base image contains many internal tools that are compiled from our Go Monorepo. Therefore, we had a chicken-and-egg problem: how can we compile the tools for our first arm64 build host?
Example: Cross-Compiling with GCC and Clang
Let’s compile a C file on an x86_64 Linux host, targeting Linux aarch64:
Note that GCC invokes a target-specific executable (aarch64-linux-gnu-gcc), whereas Clang accepts the target as a command-line argument (-target <…>):
Cross-compiling a C source file with both GCC and Clang seems easy on the surface. But what is behind it?
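One way to confirm that a cross-compiled binary carries the target’s identity rather than the host’s is to ask the preprocessor. The sketch below assumes GCC or Clang, both of which predefine `__x86_64__`/`__aarch64__` and `__linux__`/`__APPLE__` according to the target they compile for; the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Architecture this translation unit was compiled FOR (the target).
 * GCC and Clang set these macros from the target, not from the host. */
const char *target_arch(void) {
#if defined(__x86_64__)
    return "x86_64";
#elif defined(__aarch64__)
    return "aarch64";
#else
    return "unknown";
#endif
}

/* Operating system this translation unit was compiled for. */
const char *target_os(void) {
#if defined(__linux__)
    return "linux";
#elif defined(__APPLE__)
    return "macos";
#else
    return "unknown";
#endif
}

/* Writes "<arch>-<os>" (e.g. "aarch64-linux") into buf. Returns what
 * snprintf returns: the untruncated length, or a negative value on error. */
int target_description(char *buf, size_t len) {
    assert(buf != NULL && len > 0);
    return snprintf(buf, len, "%s-%s", target_arch(), target_os());
}
```

If a small program that prints `target_description` is built on an x86_64 host with aarch64-linux-gnu-gcc, or with Clang given an aarch64 Linux target (for example `-target aarch64-linux-gnu`, assumed here) plus the cross headers and libraries discussed below, it should print `aarch64-linux` when run on an arm64 machine; it will not run directly on the x86_64 build host.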
LLVM-Based C/C++ Toolchain
Which files did “clang” use to build the final executable? Let’s strace(1) it:
These are the files it touched:
- (not displayed) tools: the C compiler (Clang) and the linker (ld)
- Headers in /usr/aarch64-linux-gnu/include/. These are usually GNU C library headers. Some programs use public headers of the Linux kernel, but not in this example. Headers are target-specific.
- Compiled, target-architecture specific libraries:
- Dynamic linker /usr/aarch64-linux-gnu/lib/ld-linux-aarch64.so.1
- C library, shared object: /usr/aarch64-linux-gnu/lib/libc.so.6
- Program loaders: *crt*.o
- Less interesting libraries: libgcc and libc_nonshared
Now that we know what is used for the cross-compiler, we can split the dependencies into two categories:
- Host-specific tools (compiler, linker, and other programs that are used regardless of the target)
- Target-specific libraries and headers that are needed to assemble the final program for the architecture
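The split can be expressed as a tiny classifier over the paths from the strace output. This is a hypothetical sketch that assumes the layout seen above, where the aarch64 sysroot lives under `/usr/aarch64-linux-gnu/`; real toolchains configure the sysroot location rather than hard-coding it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum dep_kind {
    DEP_HOST,   /* compiler, linker, and other tools used for every target */
    DEP_TARGET, /* headers and libraries for one specific target */
};

/* Hypothetical classifier: anything under the target's sysroot prefix is
 * target-specific; everything else is treated as a host tool. */
enum dep_kind classify_dep(const char *path, const char *sysroot_prefix) {
    assert(path != NULL && sysroot_prefix != NULL);
    size_t n = strlen(sysroot_prefix);
    return strncmp(path, sysroot_prefix, n) == 0 ? DEP_TARGET : DEP_HOST;
}
```

Pointing the prefix at another target’s sysroot is the only change needed per target; the host tools stay the same, which is what makes this split into two kinds of archives possible.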
Uber needs to support the following targets:
At the time of writing, neither GCC nor LLVM can cross-compile³ macOS binaries. Therefore we maintain a dedicated build fleet for macOS targets. Cross-compiling to macOS is highly desirable because it would homogenize our build fleet, but we are not there yet.
Here are the host platforms that we support today:
- Linux x86_64: build fleet, Devpods and developer laptops
- macOS x86_64: older generation macOS developer laptops
- macOS aarch64 (Apple Silicon): newer generation macOS developer laptops
Here is a picture of host toolchains, sysroots, and their relationships where every host toolchain (left) can use any of the target-specific sysroots (right):
To support these host and target platforms, we need to maintain 8 archives: 3 toolchains (LLVM compiled for each host platform) and 5 sysroots (one per target platform). A typical LLVM toolchain takes 500-700 MB compressed, and a typical sysroot takes 100-150 MB compressed. That adds up to ~1.5 GB of archives to download and extract before compiling code, on top of all the other tools. To put this into perspective, the Go 1.20 toolchain for Linux x86_64 is 95 MB compressed, and is the biggest required download to start compiling code.
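The size estimate is simple arithmetic. The sketch below assumes that a single host downloads one LLVM toolchain (its own) plus all five sysroots; under that assumption, the upper end of the quoted ranges, 700 + 5 × 150 = 1450 MB, appears to be where the ~1.5 GB figure comes from. The function and constant names are illustrative.

```c
#include <assert.h>

/* Compressed archive sizes in MB, from the ranges quoted in the text. */
enum {
    LLVM_TOOLCHAIN_MB_MIN = 500,
    LLVM_TOOLCHAIN_MB_MAX = 700,
    SYSROOT_MB_MIN = 100,
    SYSROOT_MB_MAX = 150,
    TARGET_COUNT = 5,
};

/* Assumed download for one host: its own toolchain plus one sysroot
 * per target platform. */
int host_download_mb(int toolchain_mb, int sysroot_mb, int targets) {
    assert(toolchain_mb >= 0 && sysroot_mb >= 0 && targets >= 0);
    return toolchain_mb + targets * sysroot_mb;
}
```

That gives 1000 to 1450 MB per host before any other tools, roughly 10 to 15 times the 95 MB Go toolchain.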
For completeness’ sake, let’s look at GCC for a moment. As you may remember from earlier, the GCC cross-compiler is a target-specific executable (aarch64-linux-gnu-gcc). That means a full toolchain is needed for every host+target pair. Therefore, if we were to use a GCC-based toolchain, we would need to maintain 3*5=15 separate toolchains. If we add a new host platform (e.g., Linux aarch64, which is realistic in the short term) and two target platforms (Linux glibc 2.36 for x86_64 and aarch64, respectively), then the number of maintained archives jumps to 4*7=28!
At the time we were shopping for a hermetic Bazel toolchain, we evaluated both GCC- and LLVM-based toolchains. LLVM was slightly favored due to the linear growth of required archives (as opposed to quadratic growth for GCC). But can we do better?
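The two growth rates can be written down directly: an LLVM setup needs one toolchain per host plus one sysroot per target, while GCC needs a full cross toolchain for every host+target pair. A minimal sketch reproducing the counts above (function names are illustrative):

```c
#include <assert.h>

/* LLVM: one toolchain per host plus one sysroot per target (linear). */
int llvm_archives(int hosts, int targets) {
    assert(hosts >= 0 && targets >= 0);
    return hosts + targets;
}

/* GCC: one cross toolchain for every host+target pair (quadratic). */
int gcc_toolchains(int hosts, int targets) {
    assert(hosts >= 0 && targets >= 0);
    return hosts * targets;
}
```

With four hosts and seven targets, the LLVM approach would need 11 archives against GCC’s 28.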
² If we wanted to, we could drop glibc 2.31, because we can run binaries compiled with glibc 2.28 on a glibc 2.31 machine (but not the opposite).
³ For the compiler nerds: technically this statement is not correct. Clang and lld do support macOS; however, due to various issues such as bugs and missing files, cross-compilation does not work in practice.
Zig takes a different approach: it uses the same toolchain for all supported targets. Using the previous example:
Which files does it use? If we strace the above execution, we will see that only files from the Zig SDK (and in /tmp⁴ for intermediate artifacts) were referenced. Nothing touched the host system, which means that Zig is fully self-contained.
How can Zig do that, while Clang cannot? What is the main difference between plain Clang and Zig? Zig needs all the same dependencies as Clang. Let’s inspect and discuss them here:
- Tools: the C compiler (Clang) and the linker (lld).
- Those are statically linked into the Zig binary, and for macOS Zig implements its own linker.
- Headers in /usr/aarch64-linux-gnu/…
- Zig bundles the headers of multiple glibc versions, musl libc, the Linux kernel, and a few other libraries, and includes them automatically.
- Compiled, target-architecture specific libraries: dynamic linker, glibc (multiple versions), program loaders.
- Zig compiles all of those on the fly, behind-the-scenes, depending on the desired target.
- Less interesting libraries: libgcc and libc_nonshared.
- Zig re-implements functions in those libraries.
As a result of this, Zig can compile to all supported targets with a single toolchain. So to support our 3 host and 5 target platforms, all we need is 3 Zig tarballs, downloaded from ziglang.org/download:
Andrew Kelley, the creator of Zig, explains what Zig adds on top of Clang in more detail in his blog post. The prospect of requiring just a single toolchain for the host, regardless of how many target platforms we wish to support, was tempting.
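Because Zig bundles several glibc versions, the glibc a binary links against becomes a build-time choice (zig cc accepts targets such as `x86_64-linux-gnu.2.28`). Which version to pick follows glibc’s backward compatibility, as footnote ² notes: a binary built against glibc 2.28 runs on a 2.31 machine, but not the opposite. A minimal sketch of that rule of thumb; `glibc_can_run` and `running_glibc` are hypothetical helpers, and the rule is a simplification (strictly, what matters are the versioned symbols a binary actually uses).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Parses a "MAJOR.MINOR" glibc version such as "2.28". */
static bool parse_glibc_version(const char *s, int *major, int *minor) {
    assert(major != NULL && minor != NULL);
    return s != NULL && sscanf(s, "%d.%d", major, minor) == 2;
}

/* Rule of thumb: a binary built against glibc `built_with` runs on a machine
 * whose glibc is `runtime` if runtime >= built_with. Malformed input: false. */
bool glibc_can_run(const char *built_with, const char *runtime) {
    int bmaj, bmin, rmaj, rmin;
    if (!parse_glibc_version(built_with, &bmaj, &bmin) ||
        !parse_glibc_version(runtime, &rmaj, &rmin))
        return false;
    return rmaj > bmaj || (rmaj == bmaj && rmin >= bmin);
}

#if defined(__GLIBC__)
#include <gnu/libc-version.h>

/* glibc version of the machine this process is currently running on. */
const char *running_glibc(void) {
    return gnu_get_libc_version();
}
#endif
```

On a glibc system, `glibc_can_run("2.28", running_glibc())` approximates whether a binary targeting glibc 2.28 will load on the current machine.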
Let’s try to do something no other toolchain can do out of the box: cross-compile and link a macOS executable on a Linux machine:
Even though at the end of 2021 Zig was a very novel, unproven technology, the prospect of a single tarball per host platform and the ability to cross-compile to macOS won the team over to Zig. We collaborated with Zig and started integrating zig cc into our Go Monorepo.
⁴ Technically in Zig’s cache directories. In our Go Monorepo that is /tmp.
Bazel and Zig
Having a C++ toolchain (in this case, the Zig SDK) is not enough for Bazel: it also needs some glue code, a toolchain configuration. In February 2022, initial support for zig cc was added to our Go Monorepo behind a configuration flag:
bazel build --config=hermetic-cc <…>
Initially, everything was broken. Most tests could not be built, let alone executed, because C dependencies in our Go code were not hermetic. We started a slow uphill climb of fixing all of these problems. By September 2022 all tests were passing. Since January 2023, the Zig toolchain compiles all of the C and C++ code in Uber’s Go Monorepo for Linux targets.
As Uber has been running binaries produced by Zig since April 2022, we have quite a lot of trust in it. The glue between Bazel and Zig was originally in Adam Bouhenguel’s repository bazel-zig-cc, which was later cloned and further developed by Motiejus Jakštys in his personal repository, which was ultimately moved to github.com/uber/hermetic_cc_toolchain.
The collaboration with the Zig Software Foundation allowed us to ask for solutions important to us. Zig folks helped us find and fix issues in both Go (example) and Zig (example). Since the relationship went well in 2021, Uber extended the collaboration for 2023 and 2024. All the work that the Zig Software Foundation does is open source (so far, contributed upstream to Zig or Go), benefiting the wider community. Since the Foundation is a nonprofit, the value of the collaboration will become public when its books for 2023Q1 are released.
arm64 Progress After the Toolchain
Once the toolchain was mature enough to compile to arm64, we started cementing arm64 support internally. For example:
- When a developer defines a Docker Image in Go Monorepo (using rules_docker, an equivalent of a Dockerfile, but in Bazel), CI will compile the dependent code for both x86_64 and arm64, and will not allow it to land if it does not compile.
- We compiled and published all the Debian packages in Go Monorepo for arm64, even though we did not need most of them yet. As with Docker Images, CI ensures that they compile for both arm64 and x86_64. It is currently impossible to declare a new Debian package in our Go Monorepo that does not compile for arm64.
Once we were able to compile programs for arm64, we started adapting all the systems that store, download, and execute native binaries. Right now, we have:
- arm64 hosts in our dev zones bootstrapped just like all other x86_64 hosts
- A couple of Core Infrastructure services (e.g., our in-house container scheduler and dominator) running on arm64 hosts
- Stamina to keep expanding arm64 usage and support.
Our plans for 2023 include:
- Kubernetes support for arm64
- Run a customer-facing production service on arm64 hosts on Kubernetes
Does Uber use Zig-the-Language?
Yes and no. For example, at a high level, the launcher in hermetic_cc_toolchain is written by us in Zig, and the runtime library (compiler-rt) embedded in nearly every executable is written in Zig. To sum up, the majority of our Go services have a little bit of Zig in them, and are compiled with a toolchain written in Zig.
That said, we have not yet introduced any production applications written in Zig into our codebase (where the toolchain is fully set up), because only a few people in the company know the language at the moment.
As of January 16, 2023, all C/C++ code in production services built from our Go Monorepo is compiled using zig c++ via hermetic_cc_toolchain. Since Zig is now a critical component of our Go Monorepo, maintenance of hermetic_cc_toolchain will be funded both financially (as of writing, via the collaboration with the Zig Software Foundation through the end of 2024) and with Uber employee hours.
While we can run our core infrastructure on arm64 hardware, we are not yet ready to run applications that serve customer traffic. Our next step is to experiment with customer-facing applications on arm64, so that we can measure its performance and decide future direction.
If you like working on the build systems or porting code to different architectures, we are hiring. We will present our work in the upcoming Software You Can Love conference in Vancouver, BC, on June 8, 2023. If you are there and want to chat, either about Uber or about this work, feel free to talk to us in person at the conference.
We’d finally like to thank Loris Cro, Abhinav Gupta, and Gediminas Šimaitis, both for the illustrations and for feedback on the initial drafts of this post.
Header Image Attribution: The “Zero on Mt AArch64” image is covered by a BY-NC-SA 4.0 license and is credited to Joy Machs and Motiejus Jakštys.
Motiejus Jakštys is a Staff Engineer at Uber based in Vilnius. His professional interests are systems programming and cartography. Motiejus is passionate about writing software that consumes as few resources as necessary.
Laurynas Lubys is a Senior Engineer at Uber based in Vilnius. He is interested in building correct software: this includes rigorous testing, functional programming, reproducible build systems and thinking very hard about security. You might catch him debating pros and cons of programming languages and approaches.
Neringa Lukoševičiūtė is a Software Engineer on the Uber Infrastructure team based in Seattle. She primarily works on the arm64 project and is enthusiastic about integrating arm64-related technology into Uber’s infrastructure.