
Software engineers Matt Schallert, Katie Tezapsidis, Carissa Blossom, and Tom Croucher discuss what it takes to prepare our systems for holidays and other high-traffic events.
While most spend New Year’s Eve watching the ball drop or celebrating with friends (festive party hat in tow), Uber Engineering has historically treated the holiday like a final exam. With users worldwide relying on Uber for safe and reliable transportation to and from their celebrations, New Year’s Eve often marks our highest traffic event of the year, with Halloween as a close second. Our ability to handle user traffic on both holidays is the culmination of many months of planning, load testing, and future-proofing our systems.

At Uber’s current scale, however, major events like Halloween and New Year’s Eve are anything but special. With lessons learned from holidays past and an ever-growing suite of on-call tools at the ready, our networks are available, reliable, and elastic enough to handle high traffic loads year-round.
On the back end, several teams are responsible for maintaining the reliability of our networks, but two stand out: Site Reliability Engineering and Observability Engineering.
- Site Reliability Engineering (SRE) partners with Uber’s development teams to improve product reliability while growing and operating infrastructure at scale.
- Observability Engineering is dedicated to providing metrics, tracing, and alerting for all the other engineering teams at Uber.
Located in offices across the globe, these teams work to ensure continuity and scalability across Uber’s services by creating and maintaining applications as well as triaging and fixing issues in real time.
Now that we know who is responsible for helping you make it to your New Year’s Eve soiree safely and on time (sequins intact), what does it take to prepare for holidays and other large-scale events? Below, members of our Site Reliability and Observability Engineering teams discuss how we keep our networks extensible and reliable in the face of peak traffic.
Matt Schallert, Software Engineer, Observability Engineering

What is your role on the Observability Engineering team?
Matt Schallert (MS): I work on building and operating Uber’s metrics infrastructure to give engineers visibility into high-resolution metrics across billions of unique time series. Teams at Uber use these metrics to measure everything happening with a service and the infrastructure it runs on. These metrics drive real-time alerts and anomaly detection, and they are how we track the health of services, hardware, and infrastructure.
In the lead-up to large-scale events, engineers need to measure more things than normal, and we have to scale to that demand. Our goal is to allow every engineer to measure, tune, and mitigate so that we always provide the best user experience possible, even during times of peak traffic.
Why are Halloween and New Year’s Eve such important holidays for Uber?
MS: These are important nights for Uber because many people rely on us to have a safe, efficient, and less stressful way of getting around. Particularly in the United States, Halloween helps us test our services and infrastructure for the traffic increase we see globally during New Year’s Eve. In this way, Halloween and New Year’s Eve planning are deliberately and strategically tied together. Since Halloween precedes New Year’s Eve by a few months, it gives us time to apply our learnings from October to get ready for what is historically our biggest night of the year.
How does Uber load test in preparation for high traffic events?
MS: The Site Reliability Engineering team runs large-scale event drills to simulate the platform running at our predicted trip volume for the event. This simulated trip volume exercises every dependent service and flow of the user experience from requesting a ride to billing after the end of a trip. If a service begins to degrade during the drill, we pause the exercise to address the issue.
It should come as no surprise that our team’s highest priority during these tests is to fix any bottlenecks before the next drill so that we can continue to iterate and strengthen our network. The frequency of these drills increases as we get closer to the event, until we’re eventually running drills multiple times a day.
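As a rough illustration of the drill loop described above, the Python sketch below pushes simulated traffic toward a predicted volume and pauses as soon as any service degrades. It is not Uber's tooling: the function and class names, the service names, the capacity figures, the 1% error-rate threshold, and the stepwise ramp are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class DrillResult:
    completed: bool      # True if the drill reached the target volume
    peak_rps: int        # highest request rate that ran without degradation
    degraded: List[str]  # services that breached the threshold, if any


def run_drill(
    target_rps: int,
    steps: int,
    error_rate: Callable[[str, int], float],
    services: List[str],
    max_error_rate: float = 0.01,  # assumed threshold for "degraded"
) -> DrillResult:
    """Ramp simulated traffic toward target_rps, pausing on degradation."""
    peak = 0
    for step in range(1, steps + 1):
        rps = target_rps * step // steps
        # Check every dependent service at the current simulated load.
        degraded = [s for s in services if error_rate(s, rps) > max_error_rate]
        if degraded:
            # Pause the drill so the bottleneck can be fixed before the next run.
            return DrillResult(False, peak, degraded)
        peak = rps
    return DrillResult(True, peak, [])


# Hypothetical capacities (requests per second) standing in for real services.
CAPACITY = {"dispatch": 12_000, "billing": 8_000}


def fake_error_rate(service: str, rps: int) -> float:
    """Toy model: no errors up to capacity, 5% errors beyond it."""
    return 0.0 if rps <= CAPACITY[service] else 0.05


result = run_drill(10_000, 5, fake_error_rate, list(CAPACITY))
print(result)
```

In this toy run the drill pauses at the last step because the hypothetical billing service is over capacity. Raising that capacity and rerunning mirrors the fix-then-repeat cycle between drills.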