Uber has made a public commitment to phase out carbon emissions in the United States, Canada, and Europe by 2030, and worldwide by 2040. We publish periodic updates on our progress in our Climate Assessment and Performance Report, which shows both how far we’ve come and how far we have yet to go.
Underlying this report is a large effort to prepare the data presented within, everything from identifying vehicle fuel type across markets, to carbon emissions per vehicle-mile and ultimately to passenger-mile traveled. In this blog post, we’ll take a closer look at the complexities of identifying “green” vehicles onboarded to Uber, and our solutions for managing those data.
What are Green Vehicles?
“Green” is an alluringly simple word that we encounter every day as consumers, guiding us towards environmentally conscious choices. At Uber, “green” has a specific meaning when it comes to vehicle identification, which we use for both our carbon emissions calculations and vehicle eligibility for products such as Uber Green and Uber Comfort Electric. Establishing a clear definition gives us internal clarity on our progress to reduce carbon emissions and allows us to stand firmly behind our lower-emission product offerings.
There is a wide variety of vehicle fuel types, from the internal combustion engine and battery-electric vehicles dominating modern news cycles, to more exotic fuel systems such as compressed natural gas and liquefied petroleum gas (LPG) aftermarket conversions. If you look hard enough, you may even find wood-burning vehicles, though we have yet to find one serving rides on Uber.
For the majority of our use cases, “green” vehicles on Uber include primarily battery-electric vehicles (BEVs) and hybrids (including plug-in hybrids). Hydrogen fuel cell vehicles are also zero-tailpipe-emission vehicles, though they represent a very small slice of vehicles active on our platform.
How are Green Vehicles Used Within Uber’s Products?
Uber offers a variety of ride options across our thousands of markets. As the vehicle mix, local regulations, and vehicle infrastructure vary across those markets, so do the products we offer. In some markets, if you request Uber Green, you’ll get a ride in either a hybrid or an electric vehicle. In others, Uber Green is 100% electric. Uber Comfort Electric promises a zero-emission ride in style, and it’s growing rapidly in the US and Canada.
Why is Identifying Green Vehicles Challenging?
While the need to accurately identify vehicle fuel types is clear, each vehicle’s fuel type itself is, unfortunately, anything but. A closer look at the challenges of accurate identification follows.
Vehicle owners are required to input information about their vehicle when onboarding onto Uber. This typically entails uploading photos of registration documents or similar. This process varies across markets, however, ranging from manually intensive to heavily automated, and errors may appear in the process. There is also risk of fraudulent vehicle information, entered in order to obtain vehicle-specific incentives such as the Zero Emissions incentive.
In addition to errors, missing information adds to the challenge. The most accurate vehicle data comes via vehicle identification numbers (VINs). However, registration documents in some jurisdictions do not include VINs, or do not contain enough information to determine whether a vehicle is zero- or low-emission. Uber’s onboarding processes vary significantly by region, and also across our Mobility and Delivery businesses. Vehicle form factor also matters: for example, two-wheelers such as bicycles and e-scooters do not even have VINs.
Vehicle Data: Sourcing and Standards
We start with a make-model-year (MMY) based approach. This gives us the broadest coverage across manufacturers (OEMs) and geographic regions. However, there are a number of MMYs for which different trims exist across electric, hybrid, and internal combustion engines (ICE), such as Kia Niro and Hyundai Ioniq. Also, make-models can evolve, adding or removing trims over time. Therefore, we use other approaches as needed, including sourcing vehicle data per VIN.
When we are able to identify VINs, we turn to third-party vendors to source vehicle data from the VIN. These vendors provide the service of collecting and cleaning data from OEMs and government databases, making the data available to customers in a standardized format. While this makes sourcing vehicle data simpler for customers, there is of course an associated dollar cost. Also, vendors typically focus their coverage on specific geographic regions, and vendor and data quality varies across regions, requiring a patchwork of vendors for the global coverage Uber requires.
An additional challenge is that OEMs implement the VIN spec differently (see #6 here for more details), and are sometimes even internally inconsistent across regions (for instance, OEM “A” uses VINs differently in Germany, Spain, and Australia). VIN data vendors try to normalize as best they can but this ultimately makes the entire process more complex, including for end users like Uber.
Scale and Timeliness
If the above challenges are not daunting enough, imagine solving them at Uber scale. With millions of vehicles registered on the platform, and hundreds of thousands more onboarding every month, no manual solution can suffice. Also, despite how they may appear in the real world, in Uber’s databases vehicles are not static entities–their data are subject to change as more docs are uploaded or processed, associations with drivers or fleets change, and as data issues are identified and corrected.
Resiliency at scale becomes key to a working solution–“green” identification must respond to changes in a timely fashion. Drivers of eligible vehicles expect to be able to take trips for Green or Comfort Electric immediately after they onboard, regardless of the internals of our data processing. In some markets, incentives are paid to drivers of low-emission vehicles, attaching a real dollar cost in driver earnings for which we bear direct responsibility.
All of the difficulties described above illustrate the need for a responsive, flexible, and modular architecture that can operate at scale. In the rest of the article, we’ll take a closer look at how we have solved for these challenges.
Data Sources to Identify Engine and Fuel Type
Numerous methods are available to identify the engine and fuel type of a vehicle, and we picked the following three to start with and plugged them into the architecture:
- Registration document engine info transcription
- VIN (Vehicle Identification Number) lookup
- Make/Model/Year (MMY) inference
The sources are processed in order of pre-defined precedence. If one of the sources is unavailable or restricted in a region, then it’s skipped and identification falls back to remaining sources. Documents get the highest priority, followed by VINs and MMY. Each of the data sources is described in greater detail below.
A VIN (Vehicle Identification Number) is a unique 17-character alphanumeric code that is assigned to every vehicle. It serves as a unique identifier for the vehicle and provides information about its make, model, and features. The 17 characters of the VIN contain information about the vehicle’s manufacturer, model, body type, engine type, transmission, and other specifications.
VIN decoders are online tools that can be used to decode and provide detailed information about a vehicle’s history and specifications using its unique VIN. To be able to do the lookup and get the engine and fuel type, we established partnerships with multiple VIN decoders across the globe.
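As a rough illustration of what a decoder starts from, the sketch below splits a VIN into the three sections defined by ISO 3779 and verifies the position-9 check digit. The function names are hypothetical, and the check digit is only mandatory for vehicles built for the North American market, so a failed check elsewhere does not necessarily mean the VIN is wrong.

```python
import re

# ISO 3779 layout: positions 1-3 are the World Manufacturer Identifier (WMI),
# 4-9 the Vehicle Descriptor Section (VDS), 10-17 the Vehicle Identifier
# Section (VIS). The letters I, O and Q are never used in a VIN.
VIN_PATTERN = re.compile(r"[A-HJ-NPR-Z0-9]{17}")

# Letter values and position weights used by the check-digit calculation.
TRANSLITERATION = dict(zip("ABCDEFGH", range(1, 9)))
TRANSLITERATION.update(zip("JKLMN", range(1, 6)))
TRANSLITERATION.update({"P": 7, "R": 9})
TRANSLITERATION.update(zip("STUVWXYZ", range(2, 10)))
WEIGHTS = [8, 7, 6, 5, 4, 3, 2, 10, 0, 9, 8, 7, 6, 5, 4, 3, 2]


def split_vin(vin: str) -> dict:
    """Split a VIN into its WMI, VDS and VIS sections (hypothetical helper)."""
    vin = vin.strip().upper()
    if not VIN_PATTERN.fullmatch(vin):
        raise ValueError(f"not a well-formed 17-character VIN: {vin!r}")
    return {"wmi": vin[:3], "vds": vin[3:9], "vis": vin[9:]}


def has_valid_check_digit(vin: str) -> bool:
    """Return True if position 9 matches the weighted-sum check digit."""
    vin = vin.strip().upper()
    if not VIN_PATTERN.fullmatch(vin):
        return False
    total = sum(
        (int(ch) if ch.isdigit() else TRANSLITERATION[ch]) * weight
        for ch, weight in zip(vin, WEIGHTS)
    )
    remainder = total % 11
    expected = "X" if remainder == 10 else str(remainder)
    return vin[8] == expected
```

Engine and fuel type are not encoded in a fixed position across manufacturers, which is why the actual attribute lookup is left to vendors rather than attempted locally.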
Vehicle registration documents in certain markets (e.g. France) list the engine information on the document. This presents a consistent way of identifying low-emission vehicles in these markets. During transcription of the document at the time of vehicle onboarding, engine information can be parsed by the agents and made available downstream for engine identification.
To keep the document configuration easy to extend, we use fx Value Groups. The idea is to have each country’s parser live in a separate file or package and export them all into the same value group. This enables us to add more document configs in the future with minimal effort.
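The per-country parser idea can be sketched in Python with a simple registry. This is an analogue of the pattern only, not the fx API: the decorator, the `energy_code` field, and the code-to-fuel mapping are all assumptions made for illustration.

```python
from typing import Callable, Dict, Optional

# A parser takes the transcribed document fields and returns a fuel type,
# or None when the document does not say.
Parser = Callable[[Dict[str, str]], Optional[str]]

# Shared registry that plays the role of the "value group": every country
# module contributes one parser to it.
DOCUMENT_PARSERS: Dict[str, Parser] = {}


def document_parser(country_code: str) -> Callable[[Parser], Parser]:
    """Decorator that registers a parser for one country (hypothetical)."""
    def register(fn: Parser) -> Parser:
        DOCUMENT_PARSERS[country_code] = fn
        return fn
    return register


# In a real layout this would live in its own file or package per country.
@document_parser("FR")
def parse_fr(fields: Dict[str, str]) -> Optional[str]:
    # Illustrative mapping only; the field name and codes are assumptions.
    codes = {"EL": "electric", "ES": "gasoline", "GO": "diesel"}
    return codes.get(fields.get("energy_code", "").strip().upper())


def fuel_type_from_document(country_code: str, fields: Dict[str, str]) -> Optional[str]:
    """Dispatch to the country's parser; None if the country has no parser."""
    parser = DOCUMENT_PARSERS.get(country_code)
    return parser(fields) if parser else None
```

Adding a new country then means adding one decorated function, with no change to the dispatch code.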
Make Model Year Classifier
All Teslas are electric. All Toyota Prius are hybrid. To capture such combinations, we use a rule engine based on Make, Model, and Year (MMY) to classify a vehicle model’s engine and fuel type. The only required fields in a rule definition are Make and Fuel Type. Granular rules (with Model and Model Year defined) take precedence over generic ones. First, a match on all three fields is attempted; if none is found, the criteria are relaxed to make and model only, and finally to make alone. Allowing null values for Start Year and End Year lets us define rules such as “All Toyota Prius till 2019,” “All Toyota Prius from 2022 onwards,” or “All Toyota Prius between 2019 and 2022.”
All rules also have an associated location scope, defined by a pair of values: Scope and ID. Scope can be one of “Global,” “Mega Region,” “Country,” “City” and the structure is described in the table below.
| Location scope | ID (Example) | Description |
|---|---|---|
| Mega Region | EMEA | Region shorthand ID |
Combining the MMY rule with location scopes gives us the following definition structure:
| Field | Type | Example |
|---|---|---|
| Model | String or NULL | Model Y |
| Start Year | Integer or NULL | 2016 |
| End Year | Integer or NULL | 2022 |
| Location | String or NULL | USA |
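A minimal sketch of how such rules could be matched is shown below. The `Rule` fields follow the definition structure described here, but the class, the function names, and the "Acme" rules are hypothetical. Using the location scope as a tiebreaker (narrower scope wins) is also an assumption, since the text does not say how scopes interact with rule precedence.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Assumption: a narrower location scope outranks a broader one.
SCOPE_RANK = {"Global": 0, "Mega Region": 1, "Country": 2, "City": 3}


@dataclass(frozen=True)
class Rule:
    make: str                         # required
    fuel_type: str                    # required
    model: Optional[str] = None       # None matches any model
    start_year: Optional[int] = None  # None means no lower bound
    end_year: Optional[int] = None    # None means no upper bound
    scope: str = "Global"
    scope_id: Optional[str] = None


def _applies(rule: Rule, make: str, model: str, year: int, location: Dict[str, str]) -> bool:
    if rule.make.lower() != make.lower():
        return False
    if rule.model is not None and rule.model.lower() != model.lower():
        return False
    if rule.start_year is not None and year < rule.start_year:
        return False
    if rule.end_year is not None and year > rule.end_year:
        return False
    if rule.scope != "Global" and location.get(rule.scope) != rule.scope_id:
        return False
    return True


def _specificity(rule: Rule) -> tuple:
    has_year = rule.start_year is not None or rule.end_year is not None
    # Model-level rules beat make-only rules, year bounds refine further.
    return (rule.model is not None, has_year, SCOPE_RANK[rule.scope])


def classify(rules: List[Rule], make: str, model: str, year: int,
             location: Dict[str, str]) -> Optional[str]:
    """Return the fuel type of the most specific applicable rule, if any."""
    candidates = [r for r in rules if _applies(r, make, model, year, location)]
    if not candidates:
        return None
    return max(candidates, key=_specificity).fuel_type


RULES = [
    Rule("Tesla", "electric"),
    Rule("Toyota", "hybrid", model="Prius"),
    # Hypothetical make, used only to show year bounds and location scope.
    Rule("Acme", "gasoline"),
    Rule("Acme", "electric", model="Volt X", start_year=2020),
    Rule("Acme", "hybrid", model="Volt X", end_year=2019, scope="Country", scope_id="USA"),
]
```

Here `location` is keyed by scope name, for example `{"Mega Region": "EMEA", "Country": "USA"}`, so a rule scoped to a country only applies when the vehicle's country matches its ID.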
The diagram below presents an overview of the various system components that comprise the Green Vehicle Identification (GVID) architecture.
Instantaneous Identification Using Event-Based Architecture
The main intent is to identify a vehicle’s engine type as soon as the vehicle is onboarded, and again whenever its information (VIN, make, model, etc.) is updated. Vehicle onboarding takes place in multiple steps, and information about the vehicle is added during these stages from various sources, including documents such as registration and insurance. When the vehicle is first created, engine identification may be indeterminate due to limited information. As more information becomes available when vehicle documents are processed, we listen for the corresponding events and kick off the identification process as soon as we receive them, so that identification is effectively instantaneous.
Irrespective of the event that kicked off identification, the core logic loops through available sources and picks the first available match. The priority order of the sources is configurable per region. Currently the global priority order is as follows, but we may update it in the future based on business needs.
Document Lookup (top priority) > VIN Lookup > Make Model Year Lookup (fallback)
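The first-match loop over prioritized sources might look roughly like the sketch below. The source functions, the vehicle dictionary keys, and the `EXAMPLE_REGION` override are placeholders invented for illustration; real sources would call the document parser, the VIN lookup, and the MMY classifier.

```python
from typing import Callable, Dict, List, Optional, Tuple

Vehicle = Dict[str, Optional[str]]
Source = Callable[[Vehicle], Optional[str]]


# Placeholder sources: each returns a fuel type or None if it cannot decide.
def from_document(vehicle: Vehicle) -> Optional[str]:
    return vehicle.get("document_fuel_type")


def from_vin(vehicle: Vehicle) -> Optional[str]:
    return vehicle.get("vin_fuel_type")


def from_mmy(vehicle: Vehicle) -> Optional[str]:
    return vehicle.get("mmy_fuel_type")


SOURCES: Dict[str, Source] = {"document": from_document, "vin": from_vin, "mmy": from_mmy}

# Global default order, plus a hypothetical region where documents are skipped.
DEFAULT_ORDER: List[str] = ["document", "vin", "mmy"]
REGION_ORDER: Dict[str, List[str]] = {"EXAMPLE_REGION": ["vin", "mmy"]}


def identify(vehicle: Vehicle, region: Optional[str] = None) -> Optional[Tuple[str, str]]:
    """Return (source name, fuel type) from the first source that answers."""
    for name in REGION_ORDER.get(region, DEFAULT_ORDER):
        fuel_type = SOURCES[name](vehicle)
        if fuel_type is not None:
            return name, fuel_type
    return None  # indeterminate until more information arrives
```

Returning the source name along with the result keeps it visible which source produced a given classification, which helps when a later event supplies a higher-priority answer.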
The identification logic is also decoupled from Operations workflows (Flow Automation) that add appropriate labels to vehicles. This is so we can enable Operations teams regionally to enroll vehicles to multiple products and offerings without any engineering intervention.
Since the system processes multiple event types (vehicle events, document change events, etc.), several such events can be triggered simultaneously. We aggregate and de-duplicate events using Cadence Signals to reduce the processing cost. Further, to ensure all events are processed successfully and to make the system resilient to downtime of downstream services, we retry with exponential backoff using Cadence Workflow Retries.
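The two ideas in this paragraph, collapsing a burst of events and retrying with exponential backoff, can be illustrated in plain Python. This sketch does not use the Cadence API; the function names, delay values, and event shape are assumptions chosen for the example.

```python
import time
from typing import Callable, Iterable, List, Tuple, TypeVar

T = TypeVar("T")


def dedupe_events(events: Iterable[Tuple[str, str]]) -> List[str]:
    """Collapse (vehicle_id, event_type) pairs into unique vehicle IDs.

    Identification only needs to run once per vehicle, whichever event
    triggered it. Order of first appearance is preserved.
    """
    return list(dict.fromkeys(vehicle_id for vehicle_id, _ in events))


def backoff_delays(initial: float = 1.0, factor: float = 2.0,
                   max_delay: float = 60.0, attempts: int = 5) -> List[float]:
    """Delays that grow geometrically and are capped at max_delay."""
    return [min(initial * factor ** i, max_delay) for i in range(attempts)]


def call_with_retries(fn: Callable[[], T], delays: List[float],
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn, sleeping for the next delay after each failure."""
    for delay in delays:
        try:
            return fn()
        except Exception:
            sleep(delay)
    return fn()  # final attempt; any exception propagates to the caller
```

Passing `sleep` as a parameter keeps the retry loop testable without actually waiting. In a workflow engine the equivalent behavior would normally come from a retry policy rather than hand-written loops.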
Centralized VIN Store for Housing Classification Results
VIN Store is an internal service that captures and standardizes VIN responses from various vendors that Uber partners with. It follows an Adapter design pattern to make it easy to onboard new vendors and capture additional attributes from existing vendors. It also houses logic to impute some data (e.g., emissions per mile) using existing attributes.
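To make the Adapter pattern concrete, here is a small sketch in which two vendors return differently shaped payloads and each adapter maps its payload to one standard record. The vendor names, payload shapes, and field names are invented for illustration and do not describe any real vendor.

```python
from dataclasses import dataclass
from typing import Any, Dict, Protocol


@dataclass(frozen=True)
class VehicleSpec:
    """Standardized record stored regardless of which vendor answered."""
    vin: str
    make: str
    model: str
    year: int
    fuel_type: str


class VendorAdapter(Protocol):
    def to_spec(self, vin: str, raw: Dict[str, Any]) -> VehicleSpec: ...


class VendorAAdapter:
    """Hypothetical vendor with flat keys and powertrain codes."""
    FUEL = {"BEV": "electric", "HEV": "hybrid", "ICE": "gasoline"}

    def to_spec(self, vin: str, raw: Dict[str, Any]) -> VehicleSpec:
        return VehicleSpec(
            vin=vin,
            make=raw["Make"],
            model=raw["Model"],
            year=int(raw["ModelYear"]),
            fuel_type=self.FUEL.get(raw["Powertrain"], "unknown"),
        )


class VendorBAdapter:
    """Hypothetical vendor with a nested payload and free-text fuel names."""

    def to_spec(self, vin: str, raw: Dict[str, Any]) -> VehicleSpec:
        vehicle = raw["vehicle"]
        return VehicleSpec(
            vin=vin,
            make=vehicle["brand"],
            model=vehicle["name"],
            year=int(vehicle["year"]),
            fuel_type=vehicle["fuel"].lower(),
        )
```

Onboarding another vendor then means writing one more adapter class, while everything downstream keeps reading `VehicleSpec`.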
There are two main reasons why we introduced a VIN Store service. Firstly, the cost of VIN lookups can be quite high when performed through third-party vendors. Since we use multiple services to perform VIN lookups, the expense can add up quickly. By caching and reusing the results of a single lookup, we reduce the number of VIN lookups we need to perform and save on costs. Moreover, caching the results speeds up the VIN lookup process, since the information is already stored and readily available.
Secondly, the information obtained from various vendors providing VIN lookups is often not standardized. By standardizing the information across sources in the VIN Store service, we ensure that we are working with reliable and consistent data. Furthermore, standardized data enables better decision-making and analytics, which can ultimately lead to a more seamless product experience. The VIN Store service also simplifies the integration of data from different sources, making it easier for us to access and analyze the information we need to make informed decisions.
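The cost argument comes down to paying for each VIN at most once. A minimal in-memory sketch of that idea follows; the class name is hypothetical, and a real service would use persistent storage and an expiry policy rather than a dictionary.

```python
from typing import Callable, Dict


class CachingVinLookup:
    """Wrap a paid vendor lookup so each VIN is fetched at most once."""

    def __init__(self, fetch: Callable[[str], dict]) -> None:
        self._fetch = fetch                 # the (costly) vendor call
        self._cache: Dict[str, dict] = {}
        self.vendor_calls = 0               # counts billable lookups

    def lookup(self, vin: str) -> dict:
        key = vin.strip().upper()           # normalize so variants share an entry
        if key not in self._cache:
            self.vendor_calls += 1
            self._cache[key] = self._fetch(key)
        return self._cache[key]
```

For example, looking up `"abc"` and then `" ABC "` results in a single vendor call, because both normalize to the same key.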
Labeling Vehicles Using Flow Automation
The Uber products that a driver-vehicle combination is eligible for are configured using labels applied to vehicles. Labels can be thought of as contracts with other Uber systems that standardize the product definition. Labels need to be accessible from our APIs so that, when a vehicle goes online, the correct trip types it can accept are configured. Labels are therefore stored in our online storage layer.
Previously, the labeling process to encode region-specific business rules required careful, manual effort, in order to ensure oversight of the process. It was enabled using Flow–Uber’s customer automation platform–and was run periodically by the operations team. With the new process, we continue to leverage Flow but have automated it to run as soon as engine identification completes and fuel type identification results have been saved in the VIN Store. For any vehicles that are unable to be tagged by the workflow, the unprocessed events in the Dead Letter Queue (DLQ) are monitored and resolved manually to ensure that no eligible vehicles are left out due to any technical or operational issues.
Faster and More Resilient Processing
Before the switch to an event-driven architecture, the process of identifying a vehicle could take up to 72 hours due to the latency involved in waiting for all the required data to be available in upstream tables. This meant that drivers who were onboarding low-emission vehicles had to wait for the process to complete before vehicles’ engine type was identified.
However, with the new architecture, the identification process is triggered immediately upon the processing of the vehicle document, resulting in the process completing in a matter of minutes. The process has a p90 SLA of 5 minutes and a p99.5 SLA of 1 hour. This greatly reduces the waiting time for drivers and allows them to become eligible for low-emission products and the Zero Emissions incentive much faster than before. The instantaneous identification has unlocked millions of dollars of additional earnings for earners.
Moreover, event processing isolates failures to individual vehicles, which makes the overall identification process more robust and efficient. Previously, if the processing pipeline failed, it had to be re-triggered, raising the risk of larger lags. With event processing, the blast radius of a failure is limited to the affected vehicle, minimizing the risk of delays for all other vehicles. In addition, the system has alerts that page the on-call engineer when it experiences degraded performance or elevated errors, leading to earlier detection and mitigation of issues and greatly improving overall reliability.
Reduction in Toil and Single Source of Truth
Automating the labeling of vehicles to make them eligible for low-emission products and incentives has eliminated the manual toil associated with executing and maintaining the workflow. Multiple hours of effort were being spent every week to run the workflows, monitor them, and resolve any issues resulting from failures.
Additionally, the architecture enables us to have a single source of truth that contains vehicle specs and engine information. The standardized and accessible dataset obtained from this process can be used for all low-emission vehicle use cases setting us up for exciting new product developments.
GVID for Delivery Vehicles
Our initial focus for GVID was to identify vehicle fuel type for rider trips. We don’t yet have a lower-emissions option for Delivery, though we’re working toward products for it. Accurate vehicle fuel type identification is key to these efforts but presents a unique set of challenges. Many delivery vehicles, such as bikes and e-scooters, do not have a VIN, and for vehicles that do have a VIN, vehicle documents may be optional depending on the city, resulting in a limited amount of information available for identification purposes. To overcome this challenge, we are exploring alternative sources of information such as license plate lookups, machine learning models, and potentially asking the driver or courier to input the vehicle’s fuel type during vehicle onboarding.
Expand Geo-Coverage and Incorporate Additional Methodologies
Moreover, our future plans involve broadening the geographical reach of our identification process to include other regions such as EMEA, APAC, ANZ, and LatAm. VINs are a reliable source of identification in the US&C region because vehicle manufacturers adhere strictly to the VIN standard, and VINs are commonly present on vehicle documents. However, in other regions, the coverage of VINs may be limited, reducing their reliability for identification purposes. In such cases, alternative sources of information, such as license plate lookups and registration documents, which encode engine types, can be leveraged to augment our identification process. By incorporating additional sources of information, we can improve the accuracy and reliability of our identification process across regions including US&C.
Green Vehicle Identification is a big step in Uber’s mission to phase out carbon emissions on its platform. It is a challenging problem to solve due to regional variations in vehicle onboarding process, limited access to vehicle data, and the global scale of the business. To address these challenges, we developed a scalable architecture that can leverage multiple data sources to determine a vehicle’s engine and fuel type.
This system now powers our low-emission ride options in major markets and has set the foundation for future expansion. With the instantaneous identification and labeling process, we have eliminated hundreds of hours of manual effort and provided earners with access to low-emission products and incentives, unlocking additional earnings potential. As we bring lower-emissions options for Delivery products, we intend to expand the system to onboard vehicles serving those products while continuing geographic expansion for Mobility products.
“Disney Pixar Cars toys at the Mattel booth at the D23 Expo” by Castles, Capes & Clones is licensed under CC BY 2.0
Ankur Gupta leads the Vehicles Platform team at Uber, which manages the vehicle catalog and metadata and is building a Connected Vehicles platform to power next-generation experiences for earners and riders. His previous work includes optimizing Uber’s infrastructure stack by reducing the cost of capacity and improving resource efficiency.
Eric Socolofsky manages Engineering for Uber’s Sustainability Tech team, which builds user-facing products that enable Uber’s users to go anywhere and get anything with zero carbon emissions. His prior work spans data visualization, public education, and climate action.
Zubin Thampi is an engineer on Uber’s Sustainability Tech team, and works on initiatives on Rides as well as Eats products. He’s most passionate about plastic reduction and cyclicality in society. Outside of work, he spends time playing music and exploring the beautiful nature in Colorado.