Behind your ability to ‘tap a button and get a ride’ on the Uber platform, a complex ecosystem of microservices works together to deliver a seamless user experience. With over 3,000 microservices running at any given time, visibility into Uber’s distributed architecture is critical to ensuring that these services run as smoothly as possible.
When Yuri Shkuro joined Uber Engineering’s New York City office in 2015 as an engineer on the Observability team, the need for this visibility couldn’t have been clearer. While Uber used traditional monitoring tools for metrics and logging, we needed an additional layer of visibility that could keep pace with the tremendous growth of our distributed architecture.
In 2015, Yuri and his team built Jaeger, our distributed tracing system, to find problematic microservice interactions in our architecture. Inspired by existing tools such as Dapper and Zipkin, Jaeger gives us greater visibility into and across microservices, allowing us to quickly track down and troubleshoot bugs that keep our systems from doing their jobs.
Uber open sourced Jaeger in early 2017, giving developers in the wider community the benefits of its end-to-end distributed tracing and encouraging contributions that would extend the project to a broader set of use cases. Jaeger’s popularity earned it support from RedHat, Google, and several other companies. In fact, after Uber open sourced Jaeger, RedHat announced that it would stop developing its own distributed tracing project and focus its efforts on collaborating on Jaeger. Then, in September 2017, the Cloud Native Computing Foundation (CNCF) accepted Jaeger as its 12th hosted project, helping grow the project’s community and integrate it more deeply with other open source solutions for cloud native architectures. Drawing on these experiences, Yuri wrote Mastering Distributed Tracing, a book based on his time building, deploying, and operating Jaeger.
We sat down with Yuri to discuss his journey at Uber, his experience developing Jaeger, and how to grow an open source community from scratch:
How did you first get interested in open source?
While I was pursuing my PhD in computer science, my focus was on the intersection of neural networks and genetic algorithms. At that time, there was already an open source movement in my field, and some of the tools and algorithms we used were open source. But I didn’t actually start developing open source software until I joined Uber.
Before joining Uber, I was working at an investment bank. All of our code was closed source, so I didn’t really have an opportunity to get involved in the community.
What is different about writing open source code versus closed source code?
For me, it’s not so much about the development process as it is about team culture. In terms of code quality, I don’t think there is much difference between closed source and open source code. In closed source code, it’s sometimes easier to take dependencies on internal systems, because that is how you plan to deploy it. You don’t have to ask yourself, “Oh, what extensibility do we need to build for people who want to deploy it in a slightly different way?” In that regard, you have to write open source code in a more flexible, extensible, and thorough way than you would in a proprietary setting.
What encouraged you to join Uber?
After working for a bank for 15 years, I was ready for something completely different. I was looking for a tech company and projects that would give me the opportunity to work on complex engineering problems, and Uber offered both. When I joined Uber in 2015, our New York City engineering office had only about 10 people. Having previously worked on product-oriented, end-user-facing software, I deliberately joined Uber’s Infrastructure team because I was interested in the challenges lower in the stack, especially at Internet scale. Three years later, I still find something new and exciting to tackle every day.
How did you come to develop and then open source Jaeger?
When we started working on Jaeger, we built the data collection path internally as new code and used parts of the open source tracing tool Zipkin for the front end. To do that, I had to make contributions to Zipkin, working on features we needed at Uber. Contributing to Zipkin was my first foray into open source.
In parallel, I was working on the OpenTracing project, an open source collaboration among many companies. Our team began developing client libraries that implemented the OpenTracing API and reported data in Zipkin format. We decided to open source those right away because they extended the Zipkin ecosystem with OpenTracing support.
While our internal data collection components were written in Go (and fully supported the OpenTracing specification), the Zipkin query and UI components we used were written in Java and could not understand some of the OpenTracing features. This split created unnecessary friction for the Jaeger project at Uber; Java wasn’t even one of the officially supported languages internally at the time. We decided to cut the cord, remove the dependency on Zipkin, and implement the missing UI functionality ourselves. With this change, Jaeger became independent from Zipkin, offering end-to-end integration, from client libraries in multiple languages to the backend and a shiny new UI, with OpenTracing support throughout the whole stack. This was work we were proud to show to the rest of the industry, so we went ahead and open sourced all of its components.
How did Uber initially leverage Jaeger?
When I joined Uber, tracing was new to me; I was basically learning as we went. As we started developing Jaeger, I looked at the industry and realized that tracing as a discipline hadn’t progressed much since Google published its paper on Dapper in 2010. We built Jaeger for the “classic” tracing use case, where engineers pick a single trace and investigate it for performance or correctness problems. However, once we deployed it across many systems at Uber, it turned out that this is not how most developers want to use tracing. The classic use case still has its place, but it uses only a tiny fraction of the rich data our tracing infrastructure collects. We started doing a lot more data mining on large tracing datasets, which has proven far more useful than the use cases we typically read about in blogs or hear about in conference talks.
What surprised you about the community that’s grown around Jaeger?
I can’t really claim surprise because this is my first open source project, so I have nothing to compare it against. But I think the development of this community has been pretty great.
We were fortunate that we got on the Go bandwagon with Jaeger. Because Go compiles to self-contained binaries, Jaeger’s components are very easy to run, which is one of the differentiators between Jaeger and other tracing systems. Being written in Go also attracted a large number of people in the tracing community, and Jaeger has since been adopted by the Cloud Native Computing Foundation (CNCF), which hosts many projects written in Go.
When we first wrote about Jaeger on our Engineering Blog, the article drew the attention of other companies, including RedHat, which is now one of the project’s top collaborators. RedHat’s developers had a lot of experience with open source, so we were lucky to get a second company, and a great partner, supporting our project. RedHat has contributed a lot of code, including all of the support for running on Kubernetes.
Since being released, Jaeger has received over 6,000 stars on GitHub and over 13,000 commits in 2018 alone. Why do you think Jaeger has been so successful?
In my opinion, Jaeger’s success boils down to three factors. First, we built Jaeger support for storage systems beyond the ones we use at Uber. For instance, at Uber we use Cassandra for storage, but we also built support for Elasticsearch, and that turned out to be far more popular. Internally, it didn’t work for us, but our community is really happy with it.
Second, we made it very easy to get started with Jaeger. We package it as a container with a single executable that users can run locally on their machine, and it immediately starts collecting and displaying all the traces. We have also given a number of tutorials at conferences like Velocity and KubeCon.
The third factor was our commitment to documentation. Documentation is very important for the success of an open source project, and very often, internal projects are not thoroughly documented. We really amped up our documentation to make Jaeger ready for the public.
Why do you think the Cloud Native Computing Foundation (CNCF) chose to host Jaeger?
RedHat encouraged us to apply to the CNCF. It was on my mind, but I didn’t feel a sense of urgency; I figured we would apply when the time came. For RedHat, though, the urgency came from wanting to invest more deeply in Jaeger and make it part of the open source bundle that RedHat provides to its customers, and it would have been difficult for RedHat to make that commitment if the project were controlled by a single company.
OpenTracing was already a CNCF project and Jaeger was the only open source system in the industry at the time that was fully committed to OpenTracing. Both OpenTracing and Jaeger are important for building a distributed—often cloud-based—microservice architecture, so CNCF adopting these projects made sense.
How did you feel when the CNCF chose Jaeger?
It felt pretty good to have Jaeger’s gopher logo displayed on the CNCF website in the same short list of projects as Kubernetes. Our team was very proud.
How has Jaeger’s CNCF adoption affected its growth?
Being in the CNCF helped us promote the Jaeger project. While Jaeger had a dedicated site and we had published a blog article about it when it was released, joining the CNCF gave us the support and exposure we needed to really grow the Jaeger community. Now there are Jaeger tracks at KubeCon and other events, and project contributors from Uber and RedHat present on Jaeger at conferences worldwide. Our partnership with the CNCF has really enhanced the project’s visibility.
Aside from publicity, membership in the CNCF also forced us to make other useful changes, such as switching to the enterprise-friendly Apache 2.0 license, adopting a code of conduct, formalizing project governance and the selection of maintainers, and implementing core infrastructure best practices.
What do you think makes Uber’s open source program unique?
For some companies, open source is a core function of the business. While Uber uses and builds open source technologies, we measure business success by how well our technologies contribute to our ability to deliver safe and reliable transportation. I think at any company where open source isn’t the end product, it can be difficult to balance the two. When you factor in the scale and fast growth of our ridesharing and Uber Eats services, it makes sense that we first and foremost build software for our own needs, and then open source it if it seems useful for others. Open source work isn’t always accounted for in the quarterly goals most teams set, which can be a challenge.
That being said, open source is a big priority for a lot of teams at Uber, such as our Observability and Data Visualization teams. Given the demands of our business—data-driven engineering at scale—these are two domains where we’ve really been able to make distinct contributions to the open source community.
In addition to being the lead on Jaeger, how are you involved with Uber’s open source program?
I am a member of Uber’s Open Source Engineering Committee, one of the groups responsible for determining which projects we open source. The two things we look for in an open source project are quality and commitments from the team to support it.
Over the past nine months, we revamped a lot of our guidelines to emphasize the importance of commitment. We’re even thinking about automating ownership tracking so that it’s handled through software. We want to make sure that if someone transfers to another part of the company, someone in their management chain has to decide the fate of their open source project rather than letting it fall through the cracks.
What do you like most about being a part of Uber Open Source?
What I like most is that it opens up many more ways to learn about your project and problem domain than keeping it closed source does. If you just read blogs or books, you don’t necessarily get to explore the challenges that many people face every day. We have a lot of people coming to us with fairly unique problems and questions about both OpenTracing and Jaeger, things that might never occur to you otherwise, and suddenly they become very interesting use cases that we might want to work on.
What advice would you give to teams looking to open source their code?
Launching an open source project is like starting a new business. If you want it to be successful, you not only have to invest time and support in it, but also decide whether you’re willing to put in the time to promote it at conferences and through blogging. There are so many great projects hidden on GitHub; just because you open source something doesn’t mean it’s going to be used.
Another piece of advice is to foster a community that values respect and listening. It’s important to build software that is useful for organizations besides your own, and the only way to do that is through collaboration.