Making a Business Statement with Uber Engineering

January 26, 2016 / Global

Supporting business travelers is more than just getting them from point A to point B. Corporate clients need expense reporting, integration with travel management systems, centralized payment methods, and the ability to give the right sets of people the right options to get around. The Uber for Business (U4B) team has created many tools that make managing corporate travel on Uber much easier. But until recently, most of these features were for travel admins, and few features in the app itself were built for the employees taking the rides. That changed this past fall, when we launched a feature allowing users to take Uber trips under different profiles.
 
Anyone can now create a business profile, have their ride receipts forwarded to their company email, and opt in to detailed weekly and monthly reports of their Uber travel, making it much easier to manage their expenses.
How did this happen? Now that we’re well post-launch, here’s the story of how we engineered U4B’s latest set of rider-centric features.
Generating Travel Reports
Individually fetching and rendering PDFs of people’s trips is simple enough. Our challenge was to do this at Uber scale and speed, where services can grow 10x every few months. To meet this demand, we knew we would have to create a new service to automate the job of rendering and serving PDFs. We had two use cases to start with:
Building weekly and monthly travel reports for individual business travelers.
Building monthly summary statements for company admins of all their employees’ travel.
More than fifty thousand companies are enrolled in Uber for Business. When we planned this functionality, we had to account not just for the administrators, who receive summary statements, but also for their employees, who receive travel reports. Given Uber’s growth rate, we knew this could mean generating millions of PDFs at once.
A Simple Way: Client Calls
We can design a service by thinking of the dialogue that takes place between the client and the server. A simple way to write a statement service would be to have the client call in whenever it needs a new PDF. Under this system, every Monday, when U4B needs more weekly travel reports, the dialogue looks like this:
     Client: Make me a weekly statement for user uuid1 that meets the 
             following criteria:
             – Trips for this user riding under the business profile.
             – Trips for this user that took place last week.
    Service: Okay, I have the information. I’ve fetched all the trips. 
             Here is your PDF back.
     Client: Thanks! Now make me a weekly statement for user uuid2 …
This is a natural way to start, but it has performance and traffic issues that prevent it from meeting Uber’s scaling demands. Specifically:
Latency. Fetching the data and rendering a PDF takes several seconds, so hundreds of thousands of requests add up to days of compute time. Of course, many machines can process these simultaneously, but even a large parallelized batch job could take hours for all threads to complete. Since RPCs won’t stay open for hours, the job will fail and retry, spamming our system with duplicate work.
DDoS. Our dialogue example makes it look like U4B builds statements serially, one after the other, but in reality it would request them all in parallel. So every Monday U4B would hit the statement service with hundreds of thousands of internal infrastructure requests (from one service to another in our service-oriented architecture) within a short timespan. As that volume grows toward tens of millions of simultaneous requests, it would overwhelm the statement service and crash it. We could provision enough servers to handle the Monday peak, or use dynamic allocation. However, this would add more service dependencies without solving the root problem: responding to an unpredictable, bursty load.
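To put rough numbers on the latency problem, the arithmetic below assumes 3 seconds per statement, 300,000 statements, and 100 parallel workers. These inputs are illustrative assumptions, not measurements from the statement service.

```python
def estimate(seconds_per_pdf: float, statements: int, workers: int) -> tuple[float, float]:
    """Return (total compute in days, wall-clock hours if split evenly across workers)."""
    total_seconds = seconds_per_pdf * statements
    compute_days = total_seconds / 86_400              # 86,400 seconds per day
    wall_clock_hours = total_seconds / workers / 3_600
    return compute_days, wall_clock_hours


# Illustrative inputs (assumed, not measured): 3 s per PDF, 300,000 PDFs, 100 workers.
days, hours = estimate(seconds_per_pdf=3, statements=300_000, workers=100)
print(f"{days:.1f} days of compute; {hours:.1f} hours on 100 workers")
# prints: 10.4 days of compute; 2.5 hours on 100 workers
```

Even the parallel figure of 2.5 hours is far longer than an RPC can reasonably stay open, which is what turns the client-driven design into a cycle of failures and retries.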
An Alternative Approach: Inversion
There is another simple strategy for dealing with this: if we are worried about too many clients calling in at once, don’t let them call in at all. Have the service initiate the dialogue, and make the clients purely reactive. Then clients can never DDoS the central service with requests. With this inversion, the dialogue becomes:
    Service: I’m making weekly statements for last week. Do any of your 
             users want any?
     Client: Yes, I want statements for users uuid1, uuid2, …
    Service: Okay, what does user uuid1 need in its statement for last 
             week?
     Client: All this information …
     …
    Service: I have finished uuid1’s statement, and here it is.
     Client: Thanks!
With this design, our system would no longer have a case of the Mondays when the service gets clobbered with requests. Instead, the service calls out to each client in an orderly fashion and collects the information it needs, at a rate it can handle. Since the statement service itself now controls all RPCs, it will never take itself down by trying to do too much at once. But now the problem is how to make the clients respond to the service’s requests and act out this dialogue we’ve scripted.
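Here is a minimal sketch of the inverted dialogue. It assumes a synchronous client object with three hypothetical methods (`users_for_period`, `statement_data`, `receive_statement`) that mirror the service's three lines. A thread pool stands in for whatever throttling the real service used: its size caps how many requests are in flight, so the service works at a rate it chooses. `FakeClient` is an in-memory stand-in for a real Thrift client.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from datetime import date


def render_pdf(user_uuid: str, data: dict) -> bytes:
    # Placeholder renderer: a real service would lay out an actual PDF here.
    return f"{user_uuid}: {data}".encode()


def run_statements(client, start: date, end: date, max_workers: int) -> int:
    """Drive the dialogue from the service side, never building more than
    max_workers statements at the same time."""
    # "Do any of your users want statements?"
    users = client.users_for_period(start, end)

    def build(user_uuid: str) -> None:
        # "What does this user need in the statement?"
        data = client.statement_data(user_uuid, start, end)
        # "I have finished this user's statement, and here it is."
        client.receive_statement(user_uuid, render_pdf(user_uuid, data))

    # The pool size is the throttle: at most max_workers calls are in flight.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # list() waits for every build and re-raises a worker's exception if one failed.
        list(pool.map(build, users))
    return len(users)


class FakeClient:
    """In-memory stand-in for a real Thrift client; records peak concurrency."""

    def __init__(self, users: list[str]):
        self.users = users
        self.delivered: dict[str, bytes] = {}
        self.peak = 0
        self._in_flight = 0
        self._lock = threading.Lock()

    def users_for_period(self, start, end):
        return list(self.users)

    def statement_data(self, user_uuid, start, end):
        with self._lock:
            self._in_flight += 1
            self.peak = max(self.peak, self._in_flight)
        time.sleep(0.01)  # simulate fetching trips
        with self._lock:
            self._in_flight -= 1
        return {"trips": 3}

    def receive_statement(self, user_uuid, pdf):
        with self._lock:
            self.delivered[user_uuid] = pdf


client = FakeClient([f"uuid{i}" for i in range(1, 21)])
sent = run_statements(client, date(2016, 1, 18), date(2016, 1, 24), max_workers=5)
print(sent, len(client.delivered), client.peak <= 5)  # 20 20 True
```

Because the service owns the pool size, load on the client stays bounded no matter how many users it returns. A production version would also need retries, which this sketch omits.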
A Thrifty Solution
Since all services at Uber speak Thrift, we use TChannel, our RPC protocol with lightweight Thrift integration, for service discovery and routing. A service announces itself to the network, is identified by its name (e.g., “ledger” or “territories”), and can run any number of Thrift services.
We took advantage of this by defining a Thrift service contract that all clients of the statement service must support. Any client that can respond to just three RPCs can get statements generated for it.
Each RPC is an adaptation of one of the statement service’s lines in the dialogue above: asking which users want statements, asking what a given user’s statement needs to contain, and handing over the finished statement.
Normally, standing up a new Thrift service is a substantial task, but Uber’s aggressive shift to SOA had already solved that problem: with TChannel, it takes just one line of code for a client to add another Thrift service. Also, these RPCs are all simple requests for information, so they can be implemented statelessly and without side effects. Service owners can implement StatementService without fear of corrupting their own data or processes.
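A three-RPC contract of this kind can be sketched as a Python abstract base class. This is a hypothetical rendering, not the actual Thrift IDL. The method names, argument types, and the `Trip` and `BusinessTripsClient` classes are assumptions for illustration. The example implementation answers both questions from an immutable tuple of trips, so calling them never changes the client's own data.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Trip:
    user_uuid: str
    day: date
    profile: str       # "business" or "personal"
    fare_cents: int


class StatementClient(ABC):
    """Hypothetical three-RPC contract: one method per line the service speaks."""

    @abstractmethod
    def users_for_period(self, start: date, end: date) -> list[str]:
        """'Do any of your users want statements for this period?'"""

    @abstractmethod
    def statement_data(self, user_uuid: str, start: date, end: date) -> dict:
        """'What does this user need in the statement?'"""

    @abstractmethod
    def receive_statement(self, user_uuid: str, pdf: bytes) -> None:
        """'I have finished this user's statement, and here it is.'"""


class BusinessTripsClient(StatementClient):
    """Answers the two questions from an immutable tuple of trips, so they are
    stateless and side-effect free; delivery only hands the PDF to an outbox."""

    def __init__(self, trips: tuple[Trip, ...], outbox: dict):
        self._trips = trips
        self._outbox = outbox

    def _business_trips(self, start: date, end: date) -> list[Trip]:
        return [t for t in self._trips
                if t.profile == "business" and start <= t.day <= end]

    def users_for_period(self, start, end):
        return sorted({t.user_uuid for t in self._business_trips(start, end)})

    def statement_data(self, user_uuid, start, end):
        trips = [t for t in self._business_trips(start, end) if t.user_uuid == user_uuid]
        return {"trips": len(trips), "total_cents": sum(t.fare_cents for t in trips)}

    def receive_statement(self, user_uuid, pdf):
        self._outbox[user_uuid] = pdf


trips = (
    Trip("uuid1", date(2016, 1, 18), "business", 1250),
    Trip("uuid1", date(2016, 1, 19), "personal", 800),
    Trip("uuid2", date(2016, 1, 20), "business", 2100),
    Trip("uuid1", date(2016, 1, 21), "business", 975),
)
outbox = {}
client = BusinessTripsClient(trips, outbox)
week = (date(2016, 1, 18), date(2016, 1, 24))
print(client.users_for_period(*week))         # ['uuid1', 'uuid2']
print(client.statement_data("uuid1", *week))  # {'trips': 2, 'total_cents': 2225}
```

The personal trip on January 19 is filtered out, which matches the "trips under the business profile" criterion from the first dialogue.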
The icing on the cake is using uConfig, Uber’s dynamic configuration solution (addressed in its own upcoming article). It allows you to put your configuration values (constants, whitelists, tunable parameters, etc.) in a separate service and change them in production without having to restart the service that uses them.
We use uConfig to store the whitelist of client service names, which means standing up a new client service requires no code changes in the statement service. If the UberEATS team wants to send you a statement with pictures of all the food you ordered in the last week, they just need to implement the StatementClient service, make sure it’s running, and then add their service name, “uber-eats-service”, to the statement service’s client whitelist.
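A minimal sketch of a runtime whitelist follows. A hypothetical `ConfigSource` that re-reads a JSON file on every lookup stands in for uConfig, whose API is not shown here. The `statement_clients` key, the `u4b-service` name, and the `registry` dict (standing in for TChannel's name-based routing) are also assumptions. The point it illustrates is that the whitelist is data read at run time, so onboarding a new client is a config edit rather than a code change.

```python
import json
import tempfile
from pathlib import Path


class ConfigSource:
    """Hypothetical stand-in for a dynamic config service such as uConfig.

    Values are read at lookup time, so edits made in production take effect
    on the next lookup without restarting the statement service.
    """

    def __init__(self, path: Path):
        self.path = path

    def get(self, key: str, default=None):
        return json.loads(self.path.read_text()).get(key, default)


def clients_to_poll(config: ConfigSource, registry: dict) -> list:
    """Return the clients whose names are whitelisted, in whitelist order.

    `registry` maps service names to client handles; whitelisted names with
    no handle (e.g. a service that isn't running) are skipped.
    """
    names = config.get("statement_clients", [])
    return [registry[name] for name in names if name in registry]


with tempfile.TemporaryDirectory() as tmp:
    cfg_path = Path(tmp) / "statement-service.json"
    cfg_path.write_text(json.dumps({"statement_clients": ["u4b-service"]}))
    config = ConfigSource(cfg_path)
    registry = {"u4b-service": "U4B client", "uber-eats-service": "UberEATS client"}

    first = clients_to_poll(config, registry)
    print(first)   # ['U4B client']

    # Onboarding UberEATS is a config edit; the code above is unchanged.
    cfg_path.write_text(json.dumps({"statement_clients": ["u4b-service", "uber-eats-service"]}))
    second = clients_to_poll(config, registry)
    print(second)  # ['U4B client', 'UberEATS client']
```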
Where We’re Headed
The statement service design is an example of how even seemingly simple services have to be built defensively to survive Uber’s scale and speed of growth. It also shows how Uber’s SOA principles have put all the tools in place to make implementing those designs easy. 
Normally, implementing a service contract design would be impractical because every client service could run a different language and framework, each needing special handling. But with Uber dedicated to Thrift and TChannel, this design became a natural application of their principles. And the best part is that the key components of our solution are all open source.