Introduction
The homefeed is the primary gateway for millions of Uber Eats users worldwide, serving as the central hub where hunger meets discovery. For people using Uber Eats, a well-optimized feed reduces cognitive load and helps them find their next meal with ease. For merchants, it’s a critical platform for visibility and growth. As the scale of our offerings grows—expanding from local favorites to grocery, alcohol, and retail—the homefeed’s role as a personalized discovery engine becomes even more vital to the overall experience.
Powering this experience is our recommendation model, an intelligent layer that synthesizes billions of signals—from real-time behavioral cues to geographic context—to rank the optimal options for every session. To keep pace with this increasing complexity and scale, we’ve fundamentally overhauled our modeling architecture. This blog explores how we modernized the Uber Eats feed recommendation system using behavioral sequences, transformer architectures, and near-real-time features, while establishing a roadmap toward generative recommendation models.
From Statistics-Based Features to User Behavioral Sequence Features
For years, our homefeed DeepCVR model relied primarily on aggregate statistics and hand-crafted features to predict Uber Eats user affinity for a given merchant. While effective at a baseline level, these static features often failed to capture the nuanced, chronological context of an Uber Eats user’s journey. To overcome this, we’ve evolved the architecture into a hybrid model that combines the strengths of traditional deep learning recommendation models (DLRM) with modern sequence modeling.
The Hybrid Approach
The core of our new architecture is a dual-path system. One path maintains the DCNv2-based model to process traditional scalar and categorical features, while the second path introduces a transformer-based sequence encoder.
- DLRM/DCNv2 path. This path continues to handle the high-dimensional sparse features and dense statistics that represent the steady-state preferences of Uber Eats users and the characteristics of merchants.
- Sequence path. We ingest a chronological log of Uber Eats user actions—including clicks and orders—and process them through multi-head self-attention layers. This allows the model to capture fine-grained temporal dependencies and better understand an Uber Eats user’s evolving intent over time.
Target-Aware Sequence Modeling
A key aspect in this hybrid setup is target-aware training. Instead of encoding the Uber Eats user sequence in isolation, we append the target store (the merchant we’re currently scoring) to the sequence. This allows the transformer to compute the direct relationship between past behavior and the specific candidate merchant, a technique inspired by industry benchmarks like DIN and BST.
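To make the idea concrete, here’s a minimal sketch of target-aware attention in the spirit of DIN/BST: the candidate store is appended to the action sequence and used as the attention query, so the weights measure how relevant each past action is to this specific candidate. Dimensions and variable names are illustrative, not our production configuration.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8          # embedding dimension (illustrative)
seq_len = 5    # number of past user actions (illustrative)

# Embeddings for past actions (clicks/orders) and the candidate store
# currently being scored; values are random stand-ins.
history = rng.normal(size=(seq_len, d))
target_store = rng.normal(size=(d,))

# Target-aware attention: append the target to the sequence, then let
# the target act as the query so attention weights reflect relevance
# of each past action to this candidate merchant.
tokens = np.vstack([history, target_store])      # (seq_len + 1, d)
scores = tokens @ target_store / np.sqrt(d)      # dot-product relevance
weights = np.exp(scores - scores.max())
weights /= weights.sum()                         # softmax over tokens

# Sequence embedding conditioned on the target store.
seq_embedding = weights @ tokens                 # (d,)
```

In the production model this happens inside multi-head self-attention layers rather than a single dot-product head, but the conditioning principle is the same.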
Integration and Scaling
The output of the transformer is a dense sequence embedding that represents the Uber Eats user’s historical context in relation to the target. We then extract this embedding and append it to the feature vector of the main DeepCVR model. This hybrid design not only improves recommendation relevance but also serves as the foundation for our goal of scaling the model size.
From Batch-Processed to Near-Real-Time Features
Previously, our recommendation models relied on batch features computed by offline jobs, resulting in a 24-hour or longer lag in feature computation. This meant our models were often unaware of an Uber Eats user’s most recent actions.
To bridge this freshness gap, we built on the Next Personalization Platform—Uber’s internal platform for powering personalized experiences across our apps. At the core of this platform is UserContext, a near real-time, cross-line-of-business, event-sourced history of user actions. This architecture allows us to compute features on the fly from an Uber Eats user’s sequence of past actions, rather than relying on static, pre-materialized aggregate data.
The features are computed using FeatureExtractors, which are pure Java functions invoked by the online Feature Store service. For training data generation, we use an Apache Spark™ job to reconstruct the UserContext at past inference timestamps and invoke the same FeatureExtractors to generate the required features. This guarantees that the features used for training are identical to those computed during live inference. This addresses the critical engineering challenge of ensuring online-offline parity to prevent training-serving skew. We employ continuous monitoring via sampled feature logging, comparing live outputs against offline re-computations to ensure our feature consistency.
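The parity guarantee comes from the extractor being a pure function of the user’s event history, so online serving and offline replay can invoke the exact same logic. The sketch below illustrates the pattern in Python; the production FeatureExtractors are Java functions invoked by the Feature Store service, and the event and function names here are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical event shape standing in for a UserContext entry.
@dataclass
class Event:
    store_id: str
    action: str      # e.g., "click" or "order"
    ts: int          # epoch seconds

def orders_last_n(events, n=50):
    """Pure function of the user's event history -> feature value."""
    recent = events[-n:]
    return sum(1 for e in recent if e.action == "order")

# Online: the extractor runs against the live UserContext.
live_context = [Event("a", "click", 100), Event("b", "order", 200),
                Event("c", "order", 300)]
online_value = orders_last_n(live_context)

# Offline: the Spark replay reconstructs the context as of a past
# inference timestamp and invokes the *same* extractor, so training
# features match what was computed at serving time.
inference_ts = 250
replayed = [e for e in live_context if e.ts <= inference_ts]
offline_value = orders_last_n(replayed)
```

Because both paths share one function, any feature-logic change automatically applies to training and serving together, which is what removes the skew risk.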
We have now reduced the data lag from days to a few seconds, enabling the model to incorporate an Uber Eats user’s most recent interactions within the same session. This shift has proven particularly transformative for our most challenging user segments, such as cold-start users with little to no historical data on the platform.
From a Discriminative Model to a Generative Model
The introduction of sequence features and the hybrid architecture represented a major leap in our ability to capture Uber Eats user interest. However, this hybrid approach still functioned within a traditional discriminative, pointwise framework. That is, each inference call required a forward pass for a single user-store pair, and the model remained a dual-path system where traditional DLRM components and transformer encoders operated in parallel before their outputs were merged.
To truly scale our recommendation capabilities, we’re shifting from this hybrid design to a transformer-centered architecture, paving the way for a fully GenRec (Generative Recommender) paradigm.
Moving to a Transformer-Centered Architecture
While the hybrid model successfully integrated user behavioral sequences, it still relied on a DCNv2-based path to handle high-dimensional sparse and dense statistics. In our new architecture, we move away from this parallel structure. Instead, the transformer encoder becomes the primary trunk of the model. Traditional non-sequence features are no longer processed in a separate DLRM path; instead, they are transformed and concatenated to target token features before entering the transformer, serving as enriched feature representation for the sequence module.
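The sketch below shows one way this fusion can work: the traditional non-sequence features are linearly transformed, concatenated onto the target token, and projected back to the model dimension so a single transformer trunk sees everything. The layer shapes and the specific fusion op are assumptions for illustration, not our exact design.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 8        # transformer model dimension (illustrative)
n_ctx = 12   # width of traditional non-sequence features (illustrative)

history_tokens = rng.normal(size=(5, d))   # embedded past user actions
target_emb = rng.normal(size=(d,))         # candidate store embedding
context_feats = rng.normal(size=(n_ctx,))  # scalar/categorical features that
                                           # previously fed the DCNv2 path

# Transform the non-sequence features, concatenate them onto the target
# token, and project back to the model dimension.
W_ctx = rng.normal(size=(n_ctx, d)) / np.sqrt(n_ctx)
W_fuse = rng.normal(size=(2 * d, d)) / np.sqrt(2 * d)
enriched_target = np.concatenate([target_emb, context_feats @ W_ctx]) @ W_fuse

# The enriched target joins the behavioral sequence as input to the
# single transformer trunk; no separate DLRM path remains.
tokens = np.vstack([history_tokens, enriched_target])
```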
From Pointwise to Listwise Parallelism
A fundamental limitation of the previous model was its pointwise nature—where it’d only predict a score for one target store at a time. Our modernized architecture evolves this into a listwise approach where the model takes an array of candidate stores in the same session as an input. This allows the model to generate scores for an entire list of merchants in a single forward pass, significantly improving training and serving efficiency by reducing the complexity per store to roughly 1/T of the original model (where T is the number of target stores).
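A toy version of the listwise pass looks like this: all T candidate stores attend to the shared behavioral history in one batched operation, instead of re-encoding the history once per candidate. The scoring head and shapes are illustrative; the production model also masks target-to-target attention so candidates don’t leak information to each other, which is omitted here for brevity.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(2)
d, seq_len, T = 8, 5, 4       # T candidate stores scored together

history = rng.normal(size=(seq_len, d))   # shared behavioral sequence
targets = rng.normal(size=(T, d))         # embeddings for T candidates
w_out = rng.normal(size=(d,))             # toy scoring head

# Listwise scoring: every target attends to the same history in one
# batched matmul, so the history is encoded once rather than T times --
# roughly 1/T of the pointwise per-store cost.
att = softmax(targets @ history.T / np.sqrt(d), axis=-1)   # (T, seq_len)
target_ctx = att @ history                                  # (T, d)
scores = (target_ctx + targets) @ w_out                     # (T,)
```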
ML System Optimization
To harness the benefits of generative modeling and near-real-time features without compromising system stability, we also evolved our ML ecosystem. By refining our training and serving infrastructure, we’ve built a robust foundation for more sophisticated recommendation architectures.
Training Efficiency
We migrated from Keras/TensorFlow™ to PyTorch™ v2 to gain the flexibility needed for advanced sequence modeling architectures. To further accelerate training, we implemented multi-hash embeddings (which resulted in a material throughput improvement) and BF16 mixed-precision training (which resulted in an additional notable speedup). These optimizations allow us to train larger models while maintaining training times comparable to our previous baseline.
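For readers unfamiliar with multi-hash embeddings, the core idea is to replace one enormous embedding table with a few small tables indexed by independent hash functions, combining the looked-up rows. A minimal sketch, with table sizes and hash seeds chosen purely for illustration (BF16 mixed precision is a framework-level setting, e.g. autocast in PyTorch, and isn’t shown here):

```python
import numpy as np

rng = np.random.default_rng(3)
d = 8
table_size = 1009   # small prime-sized tables, far below the raw ID space

# Two independent hash tables; a large sparse ID indexes into each and
# the rows are combined, cutting memory versus one giant table while
# keeping collisions for any two IDs unlikely in *both* tables at once.
tables = [rng.normal(size=(table_size, d)) for _ in range(2)]

def multi_hash_embed(item_id: int) -> np.ndarray:
    idx0 = item_id % table_size
    idx1 = (item_id * 2654435761) % table_size   # Knuth multiplicative hash
    return tables[0][idx0] + tables[1][idx1]

emb = multi_hash_embed(987_654_321)
```

Because lookups into small tables are cache-friendly, this also helps the training-throughput gains mentioned above.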
Reliable Serving
We transitioned from on-the-fly model conversion to a modular offline optimization pipeline. By performing ONNX-to-TensorRT conversion ahead of time and using hardware-aware packaging, we eliminated the 10–60 second startup delays and GPU contention caused by real-time conversion. This shift reduces cold-start latency to near-zero and ensures consistent, reproducible performance across our diverse GPU fleet.
CPU/GPU Disaggregation Infrastructure
To address low GPU utilization, we introduced a disaggregated architecture that separates feature preprocessing (heavy CPU tasks) from model inference. Preprocessing tasks now run on dedicated CPU resources, while GPUs focus exclusively on high-speed model inference. This topology has delivered up to a double-digit throughput improvement per node with minimal added latency, providing the headroom necessary for heavier generative models.
Next Steps
While the transition to a transformer-centered generative architecture has unlocked significant performance gains, we’re only at the beginning of this journey. Our roadmap focuses on deepening our understanding of Uber Eats user behavior and expanding the scope of our recommendations:
- Account life sequence learning. One of our primary focuses is extending the temporal horizon of our models. While our current system effectively captures immediate and recent Uber Eats user intent through medium-length sequences, someone’s relationship with Uber Eats is often defined by habits formed over months or years. We’re working to scale our generative models to process longer behavioral histories—moving toward sequence learning for the life of the account—to better distinguish between transient cravings and deeply rooted culinary preferences.
- 2-D whole page personalization. Most recommendation systems, including our current iteration, operate on a 1-D plane by calculating the relevance between a user and an individual merchant. However, the Uber Eats homefeed is inherently a 2-D space where stores are organized into context-rich carousels. Our future direction involves 2-D ranking, optimizing for the visual flow and diversity of the entire page layout simultaneously.
- Integrating generative recommenders with real-world constraints. Scaling generative models for real-world businesses brings unique challenges. Unlike models in purely digital domains, our systems must respect physical realities like delivery radius and location constraints. Furthermore, we must balance competing objectives: Uber Eats users often have a strong intent to reorder favorites, yet the platform must also facilitate new discovery. Navigating this multi-objective landscape remains a key focus for our engineering teams.
Conclusion
The modernization of the Uber Eats homefeed represents a fundamental evolution from a static, statistics-driven experience to a dynamic, intent-aware discovery engine. By shifting from traditional pointwise models to a listwise GenRec paradigm, integrating near-real-time features and deep behavioral sequences, we’ve significantly enhanced the system’s ability to understand the complex and subtle relationship between users and the vast array of available merchants. These ML leaps have transformed the Uber Eats homefeed experience, delivering recommendations that stay as fresh and relevant as the food and groceries arriving at our customers’ doors.
Acknowledgments
This project was a cross-functional collaboration between multiple teams, whose collective efforts were instrumental in bringing these architectural improvements to life:
- Feed Intelligence
- Next Platform
- Uber AI Michelangelo Platform
- Feed Science and Product Management
Cover Photo Attribution: “UBER Eats Delivery Cyclist” by shopblocks is licensed under CC BY 2.0.
Apache®, Apache Spark™, and the star logo are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. No endorsement by The Apache Software Foundation is implied by the use of these marks.
PyTorch, the PyTorch logo and any related marks are trademarks of The Linux Foundation.
TensorFlow, the TensorFlow logo and any related marks are trademarks of Google Inc.
Yicheng Chen
Staff ML Engineer
Builds feed recommendation models for Uber Eats Feed Intelligence.
Peng Chen
Senior Staff ML Engineer
Builds feed recommendation models for Uber Eats Feed Intelligence.
Nikat Patel
Senior Machine Learning Engineer
Builds feed recommendation models for Uber Eats Feed Intelligence.
Sanjeev Suresh
Senior Machine Learning Engineer
Builds real-time ML features for Uber AI’s Next Personalization team.
Bo Ling
Senior Staff Engineer
Works on large-scale AI models with Uber’s Michelangelo platform team.