
Reliable transportation requires a robust map stack that provides services like routing, navigation instructions, and ETA calculation. Errors in map data can significantly impact these services, leading to a suboptimal user experience. Uber engineers use various sources of feedback to identify map errors, for instance, by using machine learning models to log and understand user feedback, or by evaluating map metrics to improve map quality.

In this article, we discuss another source of feedback: leveraging GPS traces to detect inconsistencies in map data. To demonstrate, in Figure 1, above, we compare two trips starting and ending in the same locations to show how map data errors can lead to substantially longer travel times.
To address this type of issue, we built CatchMapError (CatchME), a system that automatically catches errors in map data with anonymized GPS traces from the driver app. CatchME uses the anonymized and aggregated data from tens of millions of trips across large geographies to catch map data errors. With CatchME, our operators can quickly identify and fix these errors, leading to more accurate routes and improved driver-partner experiences on our platform.
Identifying map errors with GPS
The fundamental idea of CatchME is that Uber trip GPS traces reflect the ground truth. By analyzing anomalies in map matching, CatchME identifies the differences between maps and ground truth. These differences are usually caused by map data errors, which can be addressed by updating the map.
CatchME’s first challenge is to determine whether a driver’s navigation behavior, as recorded by trip GPS traces, diverges from our own suggested map routing. We designed CatchME to snap trip GPS traces to map data using a Hidden Markov Model (HMM), enabling the tool to report disparities between expected and actual routes. Unlike traditional map matching, which assumes map data is valid, CatchME focuses on finding disparities without necessarily trusting map data.
GPS traces are not entirely accurate, especially in urban environments, so we don’t know the exact location of vehicles on the platform. We feed vehicle location probabilities into the HMM, and the Viterbi algorithm calculates the most likely sequence of road segments the vehicle drove through based on these traces. With this information, CatchME reports anomalies in the traces relative to this sequence and highlights the difference between driver behavior and the app’s suggested route.
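To make the Viterbi step concrete, here is a minimal sketch in Python. It is not CatchME's implementation: the three-segment road, the `toy_transition` function, and all probability values are hypothetical, chosen only to show how the most likely segment sequence is recovered from per-point probabilities.

```python
import math

def safe_log(p):
    # log(0) is undefined; treat impossible events as -infinity.
    return math.log(p) if p > 0 else float("-inf")

def viterbi(emissions, transition):
    """Return the most likely segment sequence.

    emissions: one dict per GPS point, mapping candidate segment -> probability.
    transition: function (previous_segment, segment) -> probability.
    """
    # best[segment] = (log-probability of the best path ending there, that path)
    best = {seg: (safe_log(p), [seg]) for seg, p in emissions[0].items()}
    for candidates in emissions[1:]:
        step = {}
        for seg, ep in candidates.items():
            # Pick the best predecessor for this candidate segment.
            score, path = max(
                ((prev_score + safe_log(transition(prev, seg)), prev_path)
                 for prev, (prev_score, prev_path) in best.items()),
                key=lambda item: item[0],
            )
            step[seg] = (score + safe_log(ep), path + [seg])
        best = step
    return max(best.values(), key=lambda item: item[0])[1]

# Hypothetical three-segment road: A connects to B, and B connects to C.
CONNECTED = {("A", "B"), ("B", "C")}

def toy_transition(prev, seg):
    if prev == seg:
        return 0.6  # staying on the same segment
    return 0.4 if (prev, seg) in CONNECTED else 0.01

toy_emissions = [
    {"A": 0.9, "B": 0.1},  # first GPS point is close to A
    {"A": 0.4, "B": 0.6},  # second point is ambiguous
    {"B": 0.2, "C": 0.8},  # third point is close to C
]
matched = viterbi(toy_emissions, toy_transition)
print(matched)
```

With these toy numbers the matched sequence is A, B, C, even though the second GPS point is ambiguous: the connectivity encoded in the transition probabilities resolves it. Working in log space avoids numeric underflow on long traces.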
Figure 2, below, depicts an example of how GPS traces can highlight inaccuracies in our map data. In this case, for a route in San Francisco’s Golden Gate Park, (a) the GPS trace shows that the driver-partner turned right at the intersection of 8th Ave and Fulton St, deviating from (b) the suggested route:


In Figure 2, the map data includes a right-turn restriction, which discourages driver-partners on the platform from turning right. However, driver-partner behavior suggests that this restriction may be inaccurate. CatchME visualizes the disparity between our suggested navigation and actual driver behavior, enabling operators to identify and fix the error.
Disparities between suggested routes and GPS traces are not necessarily due to map data errors. Figure 3, below, highlights two other potential causes for these disparities: (a) illegal or dangerous driver behavior and (b) noisy GPS traces that do not provide enough concrete data to clearly determine the route taken.


CatchME error detection algorithm
As discussed earlier, the HMM is the bridge that connects trip GPS points with map data. Conceptually, the Viterbi algorithm calculates a path consisting of the most likely sequence of states among all possible states in the HMM. Ideally, every state transition in this sequence has a high probability. However, if there are map data errors, the sequence will still include state transitions with low probabilities. In this context, we refer to these low transition probabilities between states in the sequence as abnormal probabilities.
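As a rough illustration of what an abnormal probability looks like in practice, the sketch below scans a matched sequence and flags every hop whose transition probability falls under a cutoff. The function name, the segment names, the probability values, and the `threshold` of 0.05 are all hypothetical; the article does not specify how CatchME sets its cutoff.

```python
def find_abnormal_transitions(path, transition_probs, threshold=0.05):
    """Return (from_segment, to_segment, probability) for each low-probability hop.

    path: matched segment sequence, for example the Viterbi output.
    transition_probs: probability of each hop, so len(path) - 1 values.
    threshold: hypothetical cutoff below which a hop counts as abnormal.
    """
    if len(transition_probs) != len(path) - 1:
        raise ValueError("expected one transition probability per hop")
    return [
        (path[i], path[i + 1], p)
        for i, p in enumerate(transition_probs)
        if p < threshold
    ]

# Hypothetical matched path in which the hop from "fulton_st" to "8th_ave"
# is nearly impossible according to the map (e.g. a turn restriction).
path = ["fulton_st_w", "fulton_st", "8th_ave", "8th_ave_n"]
probs = [0.62, 0.003, 0.58]
flagged = find_abnormal_transitions(path, probs)
print(flagged)
```

Here only the middle hop is flagged. A hop that the map considers almost impossible but that the most likely path still contains is a candidate location for a map data error.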
Emission probabilities (EP) and transition probabilities (TP) are the inputs to the HMM. An emission probability represents the likelihood that a vehicle is on a certain road segment at a certain moment. A transition probability represents the likelihood that a vehicle moves from one road segment to another over a certain duration. Hence, for one GPS point with m road segments nearby, there are m emission probabilities, each representing the likelihood that this GPS point lies on one of those segments. For GPS points G1, which has m nearby segments, and G2, which has n nearby segments, there are m * n transition probabilities. From these probabilities in the HMM, the Viterbi algorithm selects the sequence of states with maximum probability, which represents the road segments the vehicle most likely traveled on.
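The counting argument above can be sketched in a few lines of Python. The article does not say how CatchME computes EP and TP, so the formulas here are an assumption: a Gaussian on the point-to-segment distance for emissions and an exponential decay on the gap between on-road and straight-line distance for transitions, a common choice in HMM map matching. The outputs are unnormalized scores rather than true probabilities, and every distance and parameter (`sigma_m`, `beta_m`) is made up for illustration.

```python
import math

def emission_score(distance_m, sigma_m=10.0):
    # Assumed model: Gaussian on the distance between GPS point and segment.
    return math.exp(-0.5 * (distance_m / sigma_m) ** 2) / (sigma_m * math.sqrt(2 * math.pi))

def transition_score(route_m, straight_m, beta_m=5.0):
    # Assumed model: exponential decay on the gap between the on-road
    # distance and the straight-line distance between two GPS points.
    return math.exp(-abs(route_m - straight_m) / beta_m) / beta_m

# Hypothetical distances (meters) from each GPS point to its nearby segments.
g1_nearby = {"s1": 3.0, "s2": 12.0, "s3": 25.0}   # m = 3
g2_nearby = {"s2": 4.0, "s4": 15.0}               # n = 2

# Hypothetical on-road distances (meters) between candidate segment pairs.
route_m = {
    ("s1", "s2"): 52.0, ("s1", "s4"): 140.0,
    ("s2", "s2"): 50.0, ("s2", "s4"): 95.0,
    ("s3", "s2"): 300.0, ("s3", "s4"): 60.0,
}
straight_m = 50.0  # straight-line distance between G1 and G2

# m emission scores for G1 and n for G2.
ep_g1 = {seg: emission_score(d) for seg, d in g1_nearby.items()}
ep_g2 = {seg: emission_score(d) for seg, d in g2_nearby.items()}

# One transition score per (G1 candidate, G2 candidate) pair: m * n in total.
tp = {
    (a, b): transition_score(route_m[(a, b)], straight_m)
    for a in g1_nearby
    for b in g2_nearby
}
print(len(ep_g1), len(ep_g2), len(tp))
```

With m = 3 and n = 2, the table holds six transition scores. Segments closer to the GPS point receive higher emission scores, and candidate pairs whose on-road distance is close to the straight-line distance receive higher transition scores.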
