
Introduction
Data is crucial for our products. Data analytics helps us provide a frictionless experience to the people who use our services. It also enables our engineers, product managers, data analysts, and data scientists to make informed decisions. The impact of data analysis can be seen on every screen of our app: what is displayed on the home screen, the order in which products are shown, which messages are relevant to show users, what is stopping users from taking rides or signing up, and so on.
With such a large user base and wide range of features, providing support across all geographic regions is a complicated problem. Furthermore, our app keeps expanding with new products, which requires the underlying technology to be flexible enough to evolve and support them.
Data is the primary tool enabling this. The following article will focus on rider data in particular: how we collect and process it, and how that has informed concrete improvements to the Rider app.
Rider Data
Rider data comprises all of a rider’s interactions with the Uber Rider app. It accounts for billions of events from Uber’s online systems every day, which are in turn converted into hundreds of Apache Hive™ tables that serve different use cases and power the Rider app.
These are some of the top problem areas that can benefit from rider data analytics:
- Increasing funnel conversion
- Increasing user engagement
- Personalization
- User communication
Online Data Collection
Mobile Event Logging
Rider data has multiple sources, but a primary one is how users interact with the app. User interactions are captured through event logging on mobile. The logging architecture is designed around the following key principles:
- Standardization of logs
- Consistency across platforms (iOS, Android, Web)
- Respecting user privacy settings
- Optimizing network usage
- Reliability without degrading the user experience
Standardizing Logs
It’s important to have a standardized logging process, since hundreds of engineers add or edit events. Logs captured on the client are either platformized (e.g., user interactions with UI elements, impressions, etc.) or added manually by developers.
A default set of metadata, such as location, app version, device, and screen name, is standardized and extracted into a common payload that is sent with every event. This common payload can be essential for formulating metrics on the back end.
Furthermore, to ensure that all events are consistent across platforms and carry standardized metadata, we have defined Thrift structs that event models must implement to define their payloads. Each Thrift schema includes an enum representing the event ID on different platforms, a payload struct defining all the data to be associated with the event at the time of registration, and finally the event type.
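To make the idea of a common payload concrete, here is a minimal Python sketch, assuming a simple dictionary envelope. The class `CommonPayload`, its field names, and the helper `build_event` are hypothetical, chosen from the examples in the text; the real client uses Thrift-based structures that the source does not show.

```python
from dataclasses import dataclass, asdict
from typing import Any, Dict


# Hypothetical shared metadata sent with every event. Field names are
# illustrative, taken from the examples in the text.
@dataclass(frozen=True)
class CommonPayload:
    latitude: float
    longitude: float
    app_version: str
    device: str
    screen_name: str


def build_event(event_name: str, event_payload: Dict[str, Any],
                common: CommonPayload) -> Dict[str, Any]:
    """Wrap an event-specific payload in an envelope that also carries the
    shared metadata, so every event exposes the same base fields."""
    return {
        "event": event_name,
        "common": asdict(common),        # identical shape for every event
        "payload": dict(event_payload),  # event-specific fields only
    }
```

Because the shared fields live in one place, backend metrics can rely on them being present for every event regardless of which team added the event.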
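The three parts of each schema (event ID enum, payload struct, event type) can be mirrored in Python for illustration. This is a sketch, not Uber’s Thrift definitions: `EventId`, `EventType`, the payload classes, `SCHEMA`, and `make_event` are hypothetical names, it assumes one enum value identifies the same event on every platform, and the runtime check in `make_event` stands in for the guarantees a generated Thrift struct would provide.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Tuple, Type


class EventType(Enum):
    TAP = 1
    IMPRESSION = 2
    CUSTOM = 3


class EventId(Enum):
    # Hypothetical IDs; assumed to be shared by iOS, Android, and Web.
    RIDE_REQUEST_TAP = 1
    HOME_IMPRESSION = 2


@dataclass(frozen=True)
class RideRequestTapPayload:
    product_id: str


@dataclass(frozen=True)
class HomeImpressionPayload:
    tiles_shown: int


# Hypothetical registry: each event ID maps to its payload struct and type.
SCHEMA: Dict[EventId, Tuple[Type[object], EventType]] = {
    EventId.RIDE_REQUEST_TAP: (RideRequestTapPayload, EventType.TAP),
    EventId.HOME_IMPRESSION: (HomeImpressionPayload, EventType.IMPRESSION),
}


@dataclass(frozen=True)
class Event:
    event_id: EventId
    payload: object
    event_type: EventType


def make_event(event_id: EventId, payload: object) -> Event:
    """Build an event, rejecting payloads that do not match the schema."""
    payload_cls, event_type = SCHEMA[event_id]
    if not isinstance(payload, payload_cls):
        raise TypeError(
            f"{event_id.name} expects {payload_cls.__name__}, "
            f"got {type(payload).__name__}")
    return Event(event_id, payload, event_type)
```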

Publishing Logs
These logs are piped into Unified Reporter, a framework within the client that ingests all the messages the client produces. Unified Reporter stores messages in a queue, aggregates them, and sends them over the wire in batches every few seconds to the backend Event Processor.
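The queue-and-batch behavior can be sketched as follows. This is a hypothetical Python illustration, not Unified Reporter itself: `BatchingReporter`, its parameters, and the size-based flush trigger are assumptions. A real client would presumably call `maybe_flush` from a timer and handle retries and persistence, all of which are omitted here.

```python
import time
from collections import deque
from typing import Any, Callable, Deque, List


class BatchingReporter:
    """Queue messages and send them in batches (hypothetical sketch).

    A batch is sent when `flush_interval_s` has elapsed since the last flush
    or when `max_batch_size` messages are queued, whichever comes first.
    """

    def __init__(self, send: Callable[[List[Any]], None],
                 flush_interval_s: float = 5.0, max_batch_size: int = 100,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._send = send              # transport to the backend (assumed)
        self._interval = flush_interval_s
        self._max = max_batch_size
        self._clock = clock            # injectable for testing
        self._queue: Deque[Any] = deque()
        self._last_flush = clock()

    def report(self, message: Any) -> None:
        """Enqueue a message, flushing if a trigger condition is met."""
        self._queue.append(message)
        self.maybe_flush()

    def maybe_flush(self) -> None:
        """Flush if the interval has elapsed or the batch is full."""
        due = self._clock() - self._last_flush >= self._interval
        if self._queue and (due or len(self._queue) >= self._max):
            self.flush()

    def flush(self) -> None:
        """Send everything currently queued as a single batch."""
        batch = list(self._queue)
        self._queue.clear()
        self._last_flush = self._clock()
        if batch:
            self._send(batch)
```

Flushing on either a time interval or a size cap bounds both delivery latency and the size of each request, which is one plausible way to serve the “optimizing network usage” principle listed earlier.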

Events keep growing and changing; hundreds of types of events are processed today. Other problems of growing severity include platformizing logs across operating systems (Android and iOS), discoverability, and maintaining a good signal-to-noise ratio. The Event Manager portal is where metadata about these events is managed and the appropriate sinks for the events are chosen.
Based on this metadata, the Event Processor that receives the events decides how they should be processed and propagated further. The Event Processor also gates events: it does not propagate an event downstream unless the metadata and mapping for that event are available. This gating helps improve the signal-to-noise ratio.
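A minimal sketch of the gating and routing logic, assuming the Event Manager metadata is available to the processor as an in-memory dictionary keyed by event name. `EventMetadata`, `EventProcessor`, and any sink names are hypothetical; the source does not describe the processor’s interfaces.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass
class EventMetadata:
    """Hypothetical metadata entry, as might be managed in a portal."""
    owner: str
    sinks: List[str]  # names of downstream destinations for this event


class EventProcessor:
    def __init__(self, registry: Dict[str, EventMetadata],
                 sinks: Dict[str, Callable[[Dict[str, Any]], None]]) -> None:
        self._registry = registry
        self._sinks = sinks
        self.dropped: List[str] = []  # names of gated (dropped) events

    def process(self, event: Dict[str, Any]) -> bool:
        """Route an event to its sinks, or drop it if metadata or mapping
        is missing. Returns True if the event was propagated."""
        name = event.get("name", "<unnamed>")
        meta = self._registry.get(name)
        targets = [self._sinks.get(s) for s in meta.sinks] if meta else []
        if not targets or any(t is None for t in targets):
            self.dropped.append(name)  # gate: no metadata or unknown sink
            return False
        for target in targets:
            target(event)
        return True
```

Dropping unmapped events at ingestion, rather than filtering them later, keeps events without registered metadata out of downstream tables.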
Backend Event Logging
Along with user interactions, it is important to capture what is shown to the user in the app. This is done by logging data from the service layer on the back end. Backend logging captures additional metadata that is either unavailable on mobile or too much for a mobile phone to handle. Every backend call, whether triggered by mobile or by other systems, logs its data. Each logged record has a ‘join’ key that ties it to the corresponding mobile interaction. This design also makes efficient use of mobile bandwidth, since the heavier metadata does not need to be sent from the phone.
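Tying backend records to mobile interactions through the join key amounts to an equi-join. The following Python sketch, assuming both sides are dictionaries with a hypothetical `join_key` field, shows the idea as an in-memory hash join; at production scale this would presumably run as a join over Hive tables rather than in application code.

```python
from typing import Any, Dict, Iterable, List


def join_on_key(mobile_events: Iterable[Dict[str, Any]],
                backend_records: Iterable[Dict[str, Any]],
                key: str = "join_key") -> List[Dict[str, Any]]:
    """Pair each mobile interaction with the backend records sharing its key.

    Inner join: mobile events without a matching backend record are skipped.
    If several backend records share a key, each yields its own row.
    """
    # Build phase: index backend records by join key.
    by_key: Dict[Any, List[Dict[str, Any]]] = {}
    for record in backend_records:
        by_key.setdefault(record[key], []).append(record)

    # Probe phase: look up each mobile event's key.
    joined: List[Dict[str, Any]] = []
    for event in mobile_events:
        for record in by_key.get(event[key], []):
            # Keep both sides namespaced so field names cannot collide.
            joined.append({key: event[key], "mobile": event, "backend": record})
    return joined
```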