Shadower is a load testing tool that allows us to provide load testing as a service to any microservice at Uber.
Shadower started as a command-line application that read a local file to load test a local application. At the time, Maps PEs were heavily investing in Java GC tuning. We needed Shadower to support request mirroring, to make sure two different applications received about the same load, including different types of load (testing multiple endpoints). In short, we were starting the same application twice: once with the production configuration and once with the new GC tuning. How is this different from testing on staging? There are a few differences:
- Velocity: Testing on staging means pushing code to a branch, generating a build, and deploying it. This can take ~10 minutes compared to seconds locally, because locally we don’t need to generate a new build for configuration changes.
- Complexity: Our current infrastructure requires extra steps to run multiple builds concurrently in the same environment; those steps exist to reduce inconsistencies.
- Reliability: The current tools didn’t provide request mirroring, so we were at the mercy of the load-balancing algorithm and of how well randomized the requests used for the load test were.
- Ease of use: The current tools needed developers to write code to be able to onboard a new service/endpoint.
Shadower stayed a command-line application for about a year, until we built Ballast, which required a reliable and easy-to-manage load test generator. Ballast is in charge of tracking resource utilization and deciding the right load for any load test; Shadower focuses only on executing those load tests.
The changes required to turn Shadower into a service were:
- Creating a coordination layer to schedule load tests requests
- Onboarding to our time series databases to emit metrics of each load test
- Quota management–we needed a way to limit the number and size of load tests
- Creating a storage layer to save the files with all the requests
- Base64 binary encoder/decoder. We store all the payloads as base64 binary data:
  - To reduce the synchronization cost of proto/thrift files
  - To reduce CPU usage of all the proto decoding
  - To allow easy modification by developers
  - For more flexibility: things like adding headers if the endpoint matches a regex, modifying default timeouts, etc., none of which were considered in the command-line version
Shadower uses a coordinator/worker architecture. The coordinator is in charge of querying all the load tests in the scheduled state and finding a suitable worker to run them.
The coordinator is the public-facing component and uses leader election. Any action is asynchronously replicated to every peer in the cluster. Most of the time it sends differential updates, but whenever it finds a new peer in the cluster it sends a full update to make sure every peer has the same view of the system. A version tag on each relevant object keeps track of changes.
From any coordinator you can:
Schedule a Load Test
Scheduling is done asynchronously. Any coordinator can receive the request and initially sets the state to “scheduled”; the master then polls every second for load tests in this state and tries to execute them if there are workers with enough quota.
Update a Load Test
Any coordinator can receive the request, but it always forwards the request to the master so that the master can check if the current worker running the load test has quota available.
Stop a Load Test
Any coordinator can receive the request. Then the coordinator changes the state to “stopped.” Finally, when the worker sends the next heartbeat, the coordinator finds out that the load test has been stopped and notifies the worker.
List All Load Tests/Query a Load Test
The endpoint by default returns all the current load tests, but it provides the following filters:
- Search by status
- Search by UUID
The worker is in charge of executing the load tests. It has the following features:
It uses heartbeats to transfer its state to the coordinator, which means the coordinator can have a slightly stale view: it is an eventually consistent system. Suppose the coordinator has a load test to schedule and decides to use worker X to run it. By the time worker X receives the request, there are 3 possible scenarios:
- The quota is still the same, so there are no changes in the coordinator.
- The quota is different, but it still had enough quota to run it. The coordinator receives a successful response with the updated quota.
- The quota is different and it is no longer enough to serve the request. In this scenario the master puts the request back in the queue and waits for workers to finish their currently running load tests, freeing up quota.
Read Payloads from Storage
We have a blob storage from where we can download previously uploaded files with all the requests.
Additionally, it can replay payloads a specific number of times, for cases where you need each payload to be executed exactly once (e.g., stateful services).
If you provide multiple hosts, it can send the same request to every host. As we mentioned in the first section of this document, this is useful to ensure we are sending the same load to every host in the load test.
Each load test accepts a tag that the workers use to emit metrics, which is how we differentiate between load tests. If the tag is not provided, metric emission is disabled.
It is a command-line tool that provides encoding/decoding for all the encodings we use at Uber (JSON, Thrift, Protocol Buffers). It allows users to easily modify any raw payload that we sniff using eBPF on production hosts.
Accelerate Cache Loading
Imagine that your cache gets invalidated every few days or weeks. One option is to wait for requests to arrive and cache them then, but those initial requests take a latency hit. Instead, we can have Shadower load the popular requests ahead of time, so when the real requests arrive we can dispatch them quickly.
Due to the complexity of Uber’s architecture, a common unified shadow library solution to forwarding traffic to staging for migrations/experiments validation has a few problems:
- Service constraints: What if the service owner only wants to forward a very specific set of requests? We could provide some kind of filtering, but then we would end up with a very complex filtering system running in production.
- Performance impact: In order to forward those requests, the service must process them, which translates to more CPU and memory.
Validate Rate Limiters and Load Shedding
As reliability engineers we must make sure that each and every service has the proper rate limit. There are 2 possible scenarios we want to avoid:
- Rate limit is too low, which translates to wasted resources and potential errors in the caller service.
- Rate limit is too high, which puts the service at risk of going completely down.
Detect Performance Regressions
By automating load tests to run on every deployment (in an environment with no noise from production traffic), we can easily detect significant changes in memory, CPU, latency, etc.
Around 40% of our services are Java-based. For those not familiar with it, Java uses JIT compilation, which means that the initial code is slower than the code that runs after a few seconds. To improve this, a developer can load test the service for a few seconds before it receives production traffic, warming up the JIT.
The journey for Shadower is still not complete, even as it continues to benefit teams across Uber in its current form. We are continuously evolving it and adding features that improve its flexibility, scalability, and ease of use. Shadower is meant primarily for stateless services, but we plan to add support for stateful services in the near future. Keep an eye out for future Shadower blogs!