Risk Entity Watch – Using Anomaly Detection to Fight Fraud

September 28, 2023 / Global

Background

At its core, Uber operates marketplaces. With our Rides business, the two sides of the marketplace are drivers and riders. In Uber Eats, eaters, couriers, and restaurants form a three-sided marketplace. Unfortunately, fraud occurs on our platform, particularly where money changes hands. It ranges from customers requesting unwarranted refunds on Uber Eats (see Mitigating Risk in a Three-Sided Marketplace) to payment fraud, account takeover, driver-partner/rider collusion, promotion abuse, GPS spoofing, and more.

The Risk team at Uber steps in to mitigate these fraud and abuse problems. Over the years, our Risk team has built many systems to fight fraud, including our rules engine, Mastermind, and numerous machine learning models that leverage our knowledge graph database.

Supervised methods allowed us to scale our fraud detection models to dozens of use cases with many different label types (payment chargeback, refund abuse, GPS spoofing). However, relying only on supervised methods quickly becomes unmanageable as new lines of business introduce new classes of fraud, each of which needs its own labels. In such cases, turning to machine learning-enabled analysis of unlabeled data sets (unsupervised machine learning), where appropriate and permissible, has helped Uber keep up with the ever-growing set of fraud classes so that our business can continue to grow sustainably.

Risk Entity Watch is our in-house platform that uses unsupervised machine learning modeling for detecting and flagging entities that are potentially engaging in nefarious activities across the Uber platform, the results of which are then individually reviewed by agents.

Events and Entities in Fraud Detection

Our fraud detection system typically evaluates requests or events entering the Uber ecosystem. The system assesses the risk of a particular ride request, food order, or payout request. Each request event involves the entities engaged in that transaction and changes those entities' feature values.

For example, a trip event updates the feature number_of_trips for each entity involved: the rider, the driver, and the payment instrument.

Events capture the history of entities, and entity features feed into event risk assessment.

In the example above, even a single trip request event involves at least three entities:

Rider
Driver
Payment method

In practice, the list of entities involved in an event is much longer, covering email addresses, cities, the driver's or rider's device information, and more.
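To make the relationship between events and entities concrete, here is a minimal sketch of how a single trip event could update the number_of_trips feature for every entity it involves. The class names, field names, and entity IDs are hypothetical, and a toy in-memory dictionary stands in for a real feature store; this is not Risk Entity Watch's actual schema.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class TripEvent:
    """Hypothetical trip event: maps entity type (e.g. "rider") to entity ID."""
    trip_id: str
    entities: dict[str, str] = field(default_factory=dict)


class EntityFeatureStore:
    """Toy in-memory feature store keyed by (entity_type, entity_id)."""

    def __init__(self) -> None:
        self._features: defaultdict = defaultdict(lambda: defaultdict(int))

    def apply(self, event: TripEvent) -> None:
        # One event touches every entity on it, so each entity's counter moves.
        for entity_type, entity_id in event.entities.items():
            self._features[(entity_type, entity_id)]["number_of_trips"] += 1

    def get(self, entity_type: str, entity_id: str, name: str) -> int:
        return self._features[(entity_type, entity_id)][name]


store = EntityFeatureStore()
store.apply(TripEvent("t1", {"rider": "r1", "driver": "d1", "payment_method": "p1"}))
store.apply(TripEvent("t2", {"rider": "r1", "driver": "d2", "payment_method": "p1"}))
print(store.get("rider", "r1", "number_of_trips"))  # 2
```

The same rider and payment method appear in both trips, so their counters reach 2, while each driver's counter is 1.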

Introducing Risk Entity Watch

Risk Entity Watch is a platform for developing and hosting anomaly detection models. It offers the following features:

Templated pipeline to train anomaly models and flag anomalous entities
Python DSL configuration files for feature engineering and model parameters
Feature explainability to show why entities are flagged
Model debugging and auto-tuning capabilities
Support for distributed training for large datasets

Feature Engineering

Entity Feature Generation (EFG)

With so many entities and events available, starting feature engineering from scratch can be a challenge for data scientists building a model on Risk Entity Watch. To ease the burden of feature and model engineering, we employ a generic solution for entity feature generation across events: a baseline set of metrics is computed for every entity involved in a particular event. The following diagram illustrates the Entity Feature Generation solution.

Figure 3: Entity Feature Generation Pipeline

The Entity Feature Generation solution utilizes the following constructs to create features:

Entities
Metrics
Time Windows
Events

For every event, a set of entities is present, along with metrics describing what happened after the event took place. Metrics are computed over events, for example, counting the number of completed trips for a given entity.

This structure allows data scientists to define a metric once and have it computed for all entities and time windows automatically. Similarly, when a new entity is added, the default set of metrics automatically becomes available for it.

For a particular event, the number of features made available is therefore:

Figure 4: A calculation on the number of features created for a particular checkpoint

features per checkpoint = a × e × m

Where:

a = the number of aggregation windows (6 hours, 7 days, 30 days, etc.)

e = the number of entities available in the event

m = the number of metrics defined for that event
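The define-once idea can be sketched as a loop over the product of entities, metrics, and windows. In this hedged illustration, each metric is a hypothetical Python function applied to raw event records, and the entity types, metric names, and windows are made up for the example rather than taken from the real EFG configuration. With e = 3 entities, m = 2 metrics, and a = 3 windows, the loop yields a × e × m = 18 features for one event.

```python
from datetime import datetime, timedelta
from itertools import product

# Illustrative configuration; real EFG entities, metrics, and windows differ.
WINDOWS = {"6h": timedelta(hours=6), "7d": timedelta(days=7), "30d": timedelta(days=30)}
ENTITIES = ["rider", "driver", "payment_method"]
# Each metric is defined once as a function of a single event record.
METRICS = {
    "num_events": lambda ev: 1,
    "num_completed": lambda ev: 1 if ev["completed"] else 0,
}


def entity_features(history, current, now):
    """Compute every (entity, metric, window) feature for the entities on `current`."""
    features = {}
    for entity, (metric, fn), (win_name, win) in product(
        ENTITIES, METRICS.items(), WINDOWS.items()
    ):
        entity_id = current["entities"].get(entity)
        if entity_id is None:  # entity not present at this checkpoint
            continue
        # Sum the metric over past events for this entity inside the window.
        features[f"{entity}.{metric}.{win_name}"] = sum(
            fn(ev)
            for ev in history
            if ev["entities"].get(entity) == entity_id and now - win <= ev["ts"] < now
        )
    return features


now = datetime(2023, 9, 28, 12)
ids = {"rider": "r1", "driver": "d1", "payment_method": "p1"}
history = [
    {"ts": now - timedelta(hours=1), "completed": True, "entities": ids},
    {"ts": now - timedelta(days=2), "completed": False,
     "entities": {"rider": "r1", "driver": "d2", "payment_method": "p1"}},
    {"ts": now - timedelta(days=20), "completed": True,
     "entities": {"rider": "r2", "driver": "d1", "payment_method": "p2"}},
]
feats = entity_features(history, {"entities": ids}, now)
print(len(feats))  # 18 = 3 entities x 2 metrics x 3 windows
```

Adding a new metric to METRICS or a new entry to ENTITIES extends every combination automatically, which is the property the EFG design relies on.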

Furthermore, we enable this entity feature generation across many types of events, or checkpoints. Uber has over 50 events that flow through the risk platform, such as trip requests, user logins, new user sign-ups, and gift card purchases. Each of these checkpoints has its own set of entities (for example, a new user doesn't need a payment method to sign up, so that entity isn't available in sign-up events).

This expands the set of features available to data scientists and ML engineers even further by adding another factor, the number of events (or checkpoints) to aggregate over:

Figure 5: A calculation on the number of features created across all checkpoints
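Because each checkpoint has its own entity and metric sets, the total can be stated as a sum over checkpoints. The notation here is our own and is an assumption, not a transcription of the original figure: C is the number of checkpoints, e_c and m_c are the numbers of entities and metrics at checkpoint c, and the same a aggregation windows apply everywhere.

```latex
N_{\text{total}} = \sum_{c=1}^{C} a \, e_c \, m_c
\qquad\text{which reduces to}\qquad
N_{\text{total}} = C \, a \, e \, m
\quad\text{when } e_c = e \text{ and } m_c = m \text{ for every checkpoint.}
```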

Assuming the following hypothetical parameters:

Figure 6: A hypothetical but realistic set of parameters

We can easily generate thousands of "entity-specific" features for our fraud detection use cases. This makes our feature space almost too large, which in some sense is a good problem to have: it gives data scientists a launching-off point from which they can select the features relevant to the problem domain they are solving.

Problem-Specific Feature Engineering

We must handle feature engineering differently for entity-level anomaly detection than for event-level supervised models.

Entity-level features vary much more than event-level features.

For example, the length of a trip can vary by one order of magnitude, while the mileage traveled by a rider in a year can easily vary by three orders of magnitude across riders. Despite this high variance, such behavior can be normal and does not indicate fraud.

When the goal is to distinguish normal from abnormal behavior, such variation requires much more sophisticated baselines and distribution shift management. Each entity needs to be examined in the context of time, so the training dataset is keyed by a composite of entity and time.

Within Risk Entity Watch, we added support for time-series analysis to our entity-level feature engineering so that each entity is normalized in the context of time. Once these features are generated, we can compare them across entities to find anomalies.
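One simple way to normalize an entity in the context of time is to score its current window against that same entity's own history. The sketch below uses a z-score for this; it is a swapped-in technique chosen for illustration, not necessarily what the platform does, and the function name and mileage figures are hypothetical. It shows why a heavy but consistent rider is not flagged just for large raw values, while a sudden jump from a light rider stands out.

```python
from statistics import mean, pstdev


def self_normalized(history: list[float], current: float, eps: float = 1e-9) -> float:
    """Z-score of an entity's current-window value against its own past windows."""
    mu = mean(history)
    sigma = pstdev(history)
    return (current - mu) / (sigma + eps)  # eps avoids division by zero for flat histories


# Hypothetical weekly mileage: raw values differ by two orders of magnitude.
weekly_miles = {
    "heavy_rider": ([300.0, 320.0, 310.0, 290.0], 305.0),
    "light_rider": ([3.0, 4.0, 2.0, 3.0], 40.0),
}
scores = {entity: self_normalized(h, cur) for entity, (h, cur) in weekly_miles.items()}
# Normalized scores can now be compared across entities.
ranked = sorted(scores, key=lambda entity: abs(scores[entity]), reverse=True)
print(ranked)  # ['light_rider', 'heavy_rider']
```

The heavy rider's 305 miles matches their own average exactly and scores 0, while the light rider's 40 miles is dozens of standard deviations above their norm.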

Risk Entity Watch also offers algorithms that support feature debugging. Model development is a continuous process of separating signal from noise: many records can appear anomalous without being relevant to the business case. The goal of feature engineering is to find the set of features that surface business-relevant anomalies.

Our feature importance algorithms are used during development to highlight noisy features and provide a path to noise reduction.

Anomaly Detection

Unsupervised anomaly detection methods aim to separate anomalies from the normal population. Determining what is normal is difficult when different participants naturally use the platform differently. Feature engineering addresses part of this problem by normalizing each entity. This still leaves entities that behave differently from one another but form multiple clusters of similar behavior.

The system is built on top of Michelangelo, Uber's machine learning platform, and offers multiple anomaly detection algorithms, ranging from tree-based to neural network-based approaches.
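Isolation forest is one well-known tree-based method. The compact plain-Python version below is for illustration only and is not necessarily an algorithm Risk Entity Watch ships; the function names and defaults are our own. Random axis-aligned splits isolate rare points in fewer steps, so a shorter average path length maps to a higher anomaly score.

```python
import math
import random

EULER_GAMMA = 0.5772156649


def avg_path(n: int) -> float:
    """Expected path length of an unsuccessful BST search among n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def build_tree(points, depth, max_depth, rng):
    """Recursively split on a random feature at a random value until points are isolated."""
    if depth >= max_depth or len(points) <= 1:
        return ("leaf", len(points))
    dim = rng.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:
        return ("leaf", len(points))
    split = rng.uniform(lo, hi)
    left = [p for p in points if p[dim] < split]
    right = [p for p in points if p[dim] >= split]
    return ("node", dim, split,
            build_tree(left, depth + 1, max_depth, rng),
            build_tree(right, depth + 1, max_depth, rng))


def path_length(x, tree, depth=0):
    """Depth at which x reaches a leaf, plus a correction for points left unsplit there."""
    if tree[0] == "leaf":
        return depth + avg_path(tree[1])
    _, dim, split, left, right = tree
    return path_length(x, left if x[dim] < split else right, depth + 1)


def isolation_forest_scores(data, n_trees=100, sample_size=128, seed=0):
    """Anomaly score in (0, 1] per row; near 1 means easy to isolate. Needs >= 2 rows."""
    rng = random.Random(seed)
    psi = min(sample_size, len(data))
    max_depth = math.ceil(math.log2(psi))
    trees = [build_tree(rng.sample(data, psi), 0, max_depth, rng) for _ in range(n_trees)]
    norm = avg_path(psi)
    return [
        2.0 ** (-sum(path_length(x, t) for t in trees) / n_trees / norm)
        for x in data
    ]


rng = random.Random(42)
data = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(200)] + [(8.0, 8.0)]
scores = isolation_forest_scores(data)
print(round(scores[-1], 2), round(sorted(scores[:-1])[100], 2))
```

The point at (8, 8) sits far outside the Gaussian cloud, so it is isolated in a few splits and receives a much higher score than a typical inlier.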

Anomaly Explanation

Explainability is critical, especially for fraud management. We believe fighting fraud is the right thing to do and is what our stakeholders, including our community of users, expect of us. Identifying and mitigating fraud correctly and in a timely way can lead to prompt action, such as denial of service or income. With this potential impact in mind, we design our workflows around human review: an agent must be able to review flags, understand and assess them, and only then determine whether, based on the information reviewed, there is sufficient indication of fraud to take action on a user's account.

Because detected anomalies are typically sent first to our operations teams for manual review, feature importance analysis on the items flagged by our fraud models is one of those teams' most important feature requests. We'll return to this in the discussion of the HAIFA algorithm below. Without explainable results, it can be challenging for a manual review agent to understand the output of an anomaly detection algorithm and assess whether action is warranted.

As we considered incorporating explainability within Risk Entity Watch, we realized we needed an algorithm that would satisfy the following requirements:

Supports streaming inference with minimal computational impact
Is simple enough to understand
Explains each anomalous observation separately

We believe meeting these requirements allows us to support agents as they determine the outcome of a fraud assessment in accordance with our internal policies and legal compliance requirements.

HAIFA Algorithm

Intuition

Anomalies are observations that stand out from the majority. Anomaly detection algorithms identify the observations that have few counterparts in the multi-dimensional space formed by the observation features.

With this in mind, explaining an anomaly amounts to identifying the dimensions in which the observation deviates significantly from the norm.

Algorithm

We present HAIFA (Histogram Analysis of Important Features for Anomalies):
We build fine-grained histograms of each feature's distribution to determine whether a particular observation is isolated along that feature. Each observation is looked up against the histograms to find the size of the bucket its feature value falls into.

Figure 8: HAIFA histograms X (left) & Figure 9: HAIFA histograms Y (right)

The important features for a particular anomalous observation are those whose values fall into small buckets, signaling that few other observations have similar values for that feature. What counts as small is determined by the bucket threshold parameter, which can be set manually as a hyperparameter or determined automatically using autotuning.

At serving time, we identify the features whose buckets are small. The bucket threshold is expressed as the maximum proportion of the data that a bucket may contain for it to count as small.
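The histogram lookup described above can be sketched in a few lines of Python. This is our own minimal reading of the description, not Uber's implementation: it assumes equal-width histograms fitted on the normal population, measures each bucket as a proportion of that population, and treats values outside the observed range as falling into an empty bucket. All function and feature names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FeatureHistogram:
    lo: float
    hi: float
    counts: list[int]
    total: int

    def bucket_proportion(self, value: float) -> float:
        """Share of the reference population in the bucket that `value` falls into."""
        if value < self.lo or value > self.hi:
            return 0.0  # outside the observed range: no normal neighbours at all
        width = (self.hi - self.lo) / len(self.counts) or 1.0
        idx = min(int((value - self.lo) / width), len(self.counts) - 1)
        return self.counts[idx] / self.total


def fit_histograms(
    normal_rows: list[dict[str, float]], n_bins: int = 50
) -> dict[str, FeatureHistogram]:
    """Fit one fine-grained, equal-width histogram per feature."""
    hists = {}
    for name in normal_rows[0]:
        values = [row[name] for row in normal_rows]
        lo, hi = min(values), max(values)
        width = (hi - lo) / n_bins or 1.0  # guard against constant features
        counts = [0] * n_bins
        for v in values:
            counts[min(int((v - lo) / width), n_bins - 1)] += 1
        hists[name] = FeatureHistogram(lo, hi, counts, len(values))
    return hists


def explain(
    row: dict[str, float], hists: dict[str, FeatureHistogram], bucket_threshold: float
) -> list[tuple[str, float]]:
    """Features whose bucket holds less than `bucket_threshold` of the data, rarest first."""
    reasons = [(name, hists[name].bucket_proportion(v)) for name, v in row.items()]
    return sorted((r for r in reasons if r[1] < bucket_threshold), key=lambda r: r[1])


# Hypothetical per-entity features for a normal population.
normal = [{"trips_7d": 10 + (i % 20), "refunds_7d": i % 3} for i in range(300)]
hists = fit_histograms(normal, n_bins=10)
print(explain({"trips_7d": 15, "refunds_7d": 9}, hists, bucket_threshold=0.05))
```

Here trips_7d = 15 lands in a bucket holding 10% of the population, so it is not reported, while refunds_7d = 9 lies beyond every normal value and becomes the explanation. Each lookup is a constant-time bucket index per feature, which fits the streaming-inference requirement listed earlier.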

Autotuning

One of the initial issues we had with HAIFA was that not every anomaly had at least one important feature identified. The main reason was that ML engineers set the bucket threshold manually. A bucket threshold that is too high identifies too many features as important, so the identified features carry no useful information. Conversely, a bucket threshold that is too low fails to identify any important features for some points.

While testing different bucket thresholds, we found a correlation between the anomaly and bucket thresholds: as the anomaly threshold for a model increases, the bucket threshold needed decreases. The more anomalous a row is, the easier it is to identify the feature(s) that make it anomalous.

Figure 10: Negative correlation observed between anomaly and bucket thresholds

We ultimately implemented an automated bucket threshold calculation so that ML engineers don't have to set it themselves.

Our goal is to find the smallest bucket proportion threshold such that every outlier identified by the anomaly detection algorithm has at least one feature for which the number of normal observations in the outlier's bucket (the bucket_size) is less than that proportion of the total number of observations.

We compute this threshold with a binary search that finds the minimum threshold at which every anomalous point has at least one important feature identified. This way, every anomaly comes with explanation features that manual reviewers can examine. ML engineers who want to experiment with their own bucket threshold can still set it manually.
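A minimal sketch of that binary search, assuming the per-feature bucket proportions of each flagged anomaly have already been computed by a histogram lookup. The input format and function names are hypothetical. Binary search applies because the check is monotone: raising the threshold can only add important features, never remove them.

```python
def all_explained(proportions: list[dict[str, float]], threshold: float) -> bool:
    """True if every anomaly has at least one feature whose bucket proportion < threshold."""
    return all(any(p < threshold for p in row.values()) for row in proportions)


def autotune_bucket_threshold(proportions: list[dict[str, float]], tol: float = 1e-4) -> float:
    """Smallest threshold (within tol) that gives every anomaly at least one explanation."""
    lo, hi = 0.0, 1.0 + tol  # proportions lie in [0, 1], so hi explains everything
    # Invariant (assuming at least one anomaly): all_explained(hi) is True and
    # all_explained(lo) is False, because no proportion is below 0.
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if all_explained(proportions, mid):
            hi = mid
        else:
            lo = mid
    return hi


# Hypothetical bucket proportions for three flagged entities.
flagged = [
    {"trips_7d": 0.10, "refunds_7d": 0.00},
    {"trips_7d": 0.02, "refunds_7d": 0.30},
    {"trips_7d": 0.12, "refunds_7d": 0.04},
]
threshold = autotune_bucket_threshold(flagged)
print(round(threshold, 3))
```

Under this simple strict-less-than rule, the result lands just above the largest per-anomaly minimum proportion (0.04 here), which makes a handy sanity check. The search form remains usable if the explanation rule becomes more elaborate, as long as it stays monotone in the threshold.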

ML Engineer Workflow

One of the biggest advantages of Risk Entity Watch is how quickly and easily a machine learning engineer can train a model and deploy it to production. After a training set is created and defined in Apache Hive™, all remaining tasks can be done in a single Python DSL script.

Model Tuning

Risk Entity Watch helps machine learning engineers experiment interactively with hyperparameters, features, and datasets. Because EFG features are pre-generated, engineers can run a wide range of experiments quickly and arrive at meaningful results.

Irrelevant detections are a considerable obstacle in anomaly detection. Risk Entity Watch helps identify the observations and features that contribute excessive noise.

The HAIFA feature importance algorithm serves a dual role in our process. First, the ML engineer uses it to choose the features that go into the model; it also helps uncover potential data quality issues that affect the detection power of the features. Once the model features are selected and the model is tuned, HAIFA is used to explain the anomalies flagged by the production system.

We've devised a set of standardized plots that speed up model evaluation. These plots can be generated with the plotting commands in the Python DSL and allow a quick assessment of the distribution of anomaly reasons and of the relationship between a feature's value and the anomaly score.

Figure 12: Model Tuning – (Left) Pie Chart of anomaly reasons & (Right) Scatter plot of anomaly score and feature values

The ML engineer can examine the features that trigger most of the detected anomalies and adjust the features that generate the most noise. This analysis also helps discover and resolve potential data quality problems that might affect the detection power of the features. Features are prioritized by the aggregate feature importance results from the HAIFA algorithm: the features that contribute most to anomaly detection are examined for the noise they introduce.
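The aggregation behind a reason breakdown such as the pie chart in Figure 12 can be sketched as follows. This assumes each flagged entity's explanation is a list of (feature, bucket proportion) pairs sorted rarest first, and that the first pair counts as the entity's main reason; names and data are hypothetical. A feature that dominates the breakdown is a natural first candidate to inspect for noise or data quality problems.

```python
from collections import Counter


def reason_shares(explanations: list[list[tuple[str, float]]]) -> dict[str, float]:
    """Fraction of explained anomalies whose top (rarest-bucket) reason is each feature."""
    top = Counter(reasons[0][0] for reasons in explanations if reasons)
    total = sum(top.values())
    # most_common() orders features from most to least frequent top reason.
    return {feature: count / total for feature, count in top.most_common()}


explanations = [
    [("refunds_7d", 0.00), ("trips_7d", 0.01)],
    [("refunds_7d", 0.01)],
    [("device_changes_30d", 0.00)],
    [],  # anomaly with no explanation at this bucket threshold
]
shares = reason_shares(explanations)
print(shares)
```

Anomalies without any explanation are skipped here; with an autotuned bucket threshold, that list should be empty anyway.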

This process sets up a fast cycle of continued model improvement.

Conclusion

Risk Entity Watch is a platform that provides a developer-friendly approach to creating anomaly detection models and is currently used to help spot fraudulent entities across Uber's businesses. It offers a versatile way to design and launch anomaly detection models, aiding fraud prevention. Each anomaly is assigned a score indicating its priority and is explained by its most significant features. This flexibility enables quick adaptation, which is crucial for keeping pace with Uber's rapidly evolving business environment.

Acknowledgments

We would also like to thank Yun Zhang, Chengliang Yang, Long Sun, William Michalak, Nyle Ashraf, and Qifa Ke from our Risk Applied Machine Learning teams for their guidance and for being evangelist users of the Risk Entity Watch Platform.

Apache®, Apache Hive, Hive, are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. No endorsement by The Apache Software Foundation is implied by the use of these marks.

Sergey Zelvenskiy

Sergey Zelvenskiy is a Lead Engineer on the Risk Entity Watch project with the Uber Risk Machine Learning Engineering team based in Sunnyvale, CA. Sergey loves perfecting his pour-over coffee techniques and exploring vibrant cities all over the world.

Becky Hui

Becky Hui is a Machine Learning Engineer on the Risk team and builds real-time solutions to combat fraud in the marketplace. In her spare time, Becky enjoys building Lego sets and traveling the world.

Sahana Noru

Sahana Noru is currently a sophomore at UC Berkeley specializing in Computer Science. She interned with the Risk Applied ML Team at Uber during Summer 2022 and worked on the Risk Entity Watch Platform. During her free time, she enjoys reading and cooking different desserts.

Christopher Settles

Christopher Settles is a former Machine Learning Engineer on Uber's AI Platform - Feature Store team, where he led both Uber's Data for Realtime ML initiatives and Uber's GenAI on complex documents initiatives.

Posted by Sergey Zelvenskiy, Becky Hui, Sahana Noru, Christopher Settles
