Introduction
The homefeed is the primary gateway for millions of Uber Eats users worldwide, serving as the central hub where hunger meets discovery. For people using Uber Eats, a well-optimized feed reduces cognitive load and helps them find their next meal with ease. For merchants, it’s a critical platform for visibility and growth. As the scale of our offerings grows—expanding from local favorites to grocery, alcohol, and retail—the homefeed’s role as a personalized discovery engine becomes even more vital to the overall experience.
Powering this experience is our recommendation model, an intelligent layer that synthesizes billions of signals—from real-time behavioral cues to geographic context—to rank the optimal options for every session. To keep pace with this increasing complexity and scale, we’ve fundamentally overhauled our modeling architecture. This blog explores how we modernized the Uber Eats feed recommendation system using behavioral sequences, transformer architectures, and near-real-time features, while establishing a roadmap toward generative recommendation models.
From Statistics-Based Features to User Behavioral Sequence Features
For years, our homefeed DeepCVR model relied primarily on aggregate statistics and hand-crafted features to predict Uber Eats user affinity for a given merchant. While effective at a baseline level, these static features often failed to capture the nuanced, chronological context of an Uber Eats user’s journey. To overcome this, we’ve evolved the architecture into a hybrid model that combines the strengths of traditional deep learning recommendation models (DLRM) with modern sequence modeling.
The Hybrid Approach
The core of our new architecture is a dual-path system. One path maintains the DCNv2-based model to process traditional scalar and categorical features, while the second path introduces a transformer-based sequence encoder.
- DLRM/DCNv2 path. This path continues to handle the high-dimensional sparse features and dense statistics that represent the steady-state preferences of Uber Eats users and the characteristics of merchants.
- Sequence path. We ingest a chronological log of Uber Eats user actions—including clicks and orders—and process them through multi-head self-attention layers. This allows the model to capture fine-grained temporal dependencies and better understand an Uber Eats user’s evolving intent over time.
Target-Aware Sequence Modeling
A key aspect in this hybrid setup is target-aware training. Instead of encoding the Uber Eats user sequence in isolation, we append the target store (the merchant we’re currently scoring) to the sequence. This allows the transformer to compute the direct relationship between past behavior and the specific candidate merchant, a technique inspired by industry benchmarks like DIN and BST.
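To make the idea concrete, here’s a minimal sketch of target-aware attention in the spirit of DIN/BST: the candidate store is appended to the action sequence and used as the attention query, so the weights measure how relevant each past action is to this specific candidate. Dimensions and variable names are illustrative, not our production configuration.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8          # embedding dimension (illustrative)
seq_len = 5    # number of past user actions (illustrative)

# Embeddings for past actions (clicks/orders) and the candidate store
# currently being scored; values are random stand-ins.
history = rng.normal(size=(seq_len, d))
target_store = rng.normal(size=(d,))

# Target-aware attention: append the target to the sequence, then let
# the target act as the query so attention weights reflect relevance
# of each past action to this candidate merchant.
tokens = np.vstack([history, target_store])      # (seq_len + 1, d)
scores = tokens @ target_store / np.sqrt(d)      # dot-product relevance
weights = np.exp(scores - scores.max())
weights /= weights.sum()                         # softmax over tokens

# Sequence embedding conditioned on the target store.
seq_embedding = weights @ tokens                 # (d,)
```

In the production model this happens inside multi-head self-attention layers rather than a single dot-product head, but the conditioning principle is the same.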
Integration and Scaling
The output of the transformer is a dense sequence embedding that represents the Uber Eats user’s historical context in relation to the target. We then extract this embedding and append it to the feature vector of the main DeepCVR model. This hybrid design not only improves recommendation relevance but also serves as the foundation for our goal of scaling the model size.
From Batch-Processed to Near-Real-Time Features
Previously, our recommendation models relied on batch features computed by offline jobs, resulting in a 24-hour or longer lag in feature computation. This meant our models were often unaware of an Uber Eats user’s most recent actions.
To bridge this freshness gap, we built on the Next Personalization Platform—Uber’s internal platform for powering personalized experiences across our apps. At the core of this platform is UserContext, a near real-time, cross-line-of-business, event-sourced history of user actions. This architecture allows us to compute features on the fly from an Uber Eats user’s sequence of past actions, rather than relying on static, pre-materialized aggregate data.
The features are computed using FeatureExtractors, which are pure Java functions invoked by the online Feature Store service. For training data generation, we use an Apache Spark™ job to reconstruct the UserContext at past inference timestamps and invoke the same FeatureExtractors to generate the required features. This guarantees that the features used for training are identical to those computed during live inference. This addresses the critical engineering challenge of ensuring online-offline parity to prevent training-serving skew. We employ continuous monitoring via sampled feature logging, comparing live outputs against offline re-computations to ensure our feature consistency.
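The parity guarantee comes from the extractor being a pure function of the user’s event history, so online serving and offline replay can invoke the exact same logic. The sketch below illustrates the pattern in Python; the production FeatureExtractors are Java functions invoked by the Feature Store service, and the event and function names here are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical event shape standing in for a UserContext entry.
@dataclass
class Event:
    store_id: str
    action: str      # e.g., "click" or "order"
    ts: int          # epoch seconds

def orders_last_n(events, n=50):
    """Pure function of the user's event history -> feature value."""
    recent = events[-n:]
    return sum(1 for e in recent if e.action == "order")

# Online: the extractor runs against the live UserContext.
live_context = [Event("a", "click", 100), Event("b", "order", 200),
                Event("c", "order", 300)]
online_value = orders_last_n(live_context)

# Offline: the Spark replay reconstructs the context as of a past
# inference timestamp and invokes the *same* extractor, so training
# features match what was computed at serving time.
inference_ts = 250
replayed = [e for e in live_context if e.ts <= inference_ts]
offline_value = orders_last_n(replayed)
```

Because both paths share one function, any feature-logic change automatically applies to training and serving together, which is what removes the skew risk.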
We have now reduced the data lag from days to a few seconds, enabling the model to incorporate an Uber Eats user’s most recent interactions within the same session. This shift has proven particularly transformative for our most challenging user segments, such as cold-start users with little to no historical data on the platform.
From a Discriminative Model to a Generative Model
The introduction of sequence features and the hybrid architecture represented a major leap in our ability to capture Uber Eats user interest. However, this hybrid approach still functioned within a traditional discriminative, pointwise framework. That is, each inference call required a forward pass for a single user-store pair, and the model remained a dual-path system where traditional DLRM components and transformer encoders operated in parallel before their outputs were merged.
To truly scale our recommendation capabilities, we’re shifting from this hybrid design to a transformer-centered architecture, paving the way for a fully GenRec (Generative Recommender) paradigm.
Moving to a Transformer-Centered Architecture
While the hybrid model successfully integrated user behavioral sequences, it still relied on a DCNv2-based path to handle high-dimensional sparse and dense statistics. In our new architecture, we move away from this parallel structure. Instead, the transformer encoder becomes the primary trunk of the model. Traditional non-sequence features are no longer processed in a separate DLRM path; instead, they are transformed and concatenated to target token features before entering the transformer, serving as enriched feature representation for the sequence module.
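The sketch below shows one way this fusion can work: the traditional non-sequence features are linearly transformed, concatenated onto the target token, and projected back to the model dimension so a single transformer trunk sees everything. The layer shapes and the specific fusion op are assumptions for illustration, not our exact design.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 8        # transformer model dimension (illustrative)
n_ctx = 12   # width of traditional non-sequence features (illustrative)

history_tokens = rng.normal(size=(5, d))   # embedded past user actions
target_emb = rng.normal(size=(d,))         # candidate store embedding
context_feats = rng.normal(size=(n_ctx,))  # scalar/categorical features that
                                           # previously fed the DCNv2 path

# Transform the non-sequence features, concatenate them onto the target
# token, and project back to the model dimension.
W_ctx = rng.normal(size=(n_ctx, d)) / np.sqrt(n_ctx)
W_fuse = rng.normal(size=(2 * d, d)) / np.sqrt(2 * d)
enriched_target = np.concatenate([target_emb, context_feats @ W_ctx]) @ W_fuse

# The enriched target joins the behavioral sequence as input to the
# single transformer trunk; no separate DLRM path remains.
tokens = np.vstack([history_tokens, enriched_target])
```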
From Pointwise to Listwise Parallelism
A fundamental limitation of the previous model was its pointwise nature—where it’d only predict a score for one target store at a time. Our modernized architecture evolves this into a listwise approach where the model takes an array of candidate stores in the same session as an input. This allows the model to generate scores for an entire list of merchants in a single forward pass, significantly improving training and serving efficiency by reducing the complexity per store to roughly 1/T of the original model (where T is the number of target stores).
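A toy version of the listwise pass looks like this: all T candidate stores attend to the shared behavioral history in one batched operation, instead of re-encoding the history once per candidate. The scoring head and shapes are illustrative; the production model also masks target-to-target attention so candidates don’t leak information to each other, which is omitted here for brevity.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(2)
d, seq_len, T = 8, 5, 4       # T candidate stores scored together

history = rng.normal(size=(seq_len, d))   # shared behavioral sequence
targets = rng.normal(size=(T, d))         # embeddings for T candidates
w_out = rng.normal(size=(d,))             # toy scoring head

# Listwise scoring: every target attends to the same history in one
# batched matmul, so the history is encoded once rather than T times --
# roughly 1/T of the pointwise per-store cost.
att = softmax(targets @ history.T / np.sqrt(d), axis=-1)   # (T, seq_len)
target_ctx = att @ history                                  # (T, d)
scores = (target_ctx + targets) @ w_out                     # (T,)
```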
ML System Optimization
To harness the benefits of generative modeling and near-real-time features without compromising system stability, we also evolved our ML ecosystem. By refining our training and serving infrastructure, we’ve built a robust foundation for more sophisticated recommendation architectures.
Training Efficiency
We migrated from Keras/TensorFlow™ to PyTorch™ v2 to gain the flexibility needed for advanced sequence modeling architectures. To further accelerate training, we implemented multi-hash embeddings (which resulted in a material throughput improvement) and BF16 mixed-precision training (which resulted in an additional notable speedup). These optimizations allow us to train larger models while maintaining training times comparable to our previous baseline.
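For readers unfamiliar with multi-hash embeddings, the core idea is to replace one enormous embedding table with a few small tables indexed by independent hash functions, combining the looked-up rows. A minimal sketch, with table sizes and hash seeds chosen purely for illustration (BF16 mixed precision is a framework-level setting, e.g. autocast in PyTorch, and isn’t shown here):

```python
import numpy as np

rng = np.random.default_rng(3)
d = 8
table_size = 1009   # small prime-sized tables, far below the raw ID space

# Two independent hash tables; a large sparse ID indexes into each and
# the rows are combined, cutting memory versus one giant table while
# keeping collisions for any two IDs unlikely in *both* tables at once.
tables = [rng.normal(size=(table_size, d)) for _ in range(2)]

def multi_hash_embed(item_id: int) -> np.ndarray:
    idx0 = item_id % table_size
    idx1 = (item_id * 2654435761) % table_size   # Knuth multiplicative hash
    return tables[0][idx0] + tables[1][idx1]

emb = multi_hash_embed(987_654_321)
```

Because lookups into small tables are cache-friendly, this also helps the training-throughput gains mentioned above.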
Reliable Serving
We transitioned from on-the-fly model conversion to a modular offline optimization pipeline. By performing ONNX-to-TensorRT conversion ahead of time and using hardware-aware packaging, we eliminated the 10–60 second startup delays and GPU contention caused by real-time conversion. This shift reduces cold-start latency to near-zero and ensures consistent, reproducible performance across our diverse GPU fleet.
CPU/GPU Disaggregation Infrastructure
To address low GPU utilization, we introduced a disaggregated architecture that separates feature preprocessing (heavy CPU tasks) from model inference. Preprocessing tasks now run on dedicated CPU resources, while GPUs focus exclusively on high-speed model inference. This topology has delivered up to a double-digit throughput improvement per node with minimal added latency, providing the headroom necessary for heavier generative models.
Next Steps
While the transition to a transformer-centered generative architecture has unlocked significant performance gains, we’re only at the beginning of this journey. Our roadmap focuses on deepening our understanding of Uber Eats user behavior and expanding the scope of our recommendations:
- Account life sequence learning. One of our primary focuses is extending the temporal horizon of our models. While our current system effectively captures immediate and recent Uber Eats user intent through medium-length sequences, someone’s relationship with Uber Eats is often defined by habits formed over months or years. We’re working to scale our generative models to process longer behavioral histories—moving toward sequence learning for the life of the account—to better distinguish between transient cravings and deeply rooted culinary preferences.
- 2-D whole page personalization. Most recommendation systems, including our current iteration, operate on a 1-D plane by calculating the relevance between a user and an individual merchant. However, the Uber Eats homefeed is inherently a 2-D space where stores are organized into context-rich carousels. Our future direction involves 2-D ranking, optimizing for the visual flow and diversity of the entire page layout simultaneously.
- Integrating generative recommenders with real-world constraints. Scaling generative models for real-world businesses brings unique challenges. Unlike models in purely digital domains, our systems must respect physical realities like delivery radius and location constraints. Furthermore, we must balance competing objectives: Uber Eats users often have a strong intent to reorder favorites, yet the platform must also facilitate new discovery. Navigating this multi-objective landscape remains a key focus for our engineering teams.
Conclusion
The modernization of the Uber Eats homefeed represents a fundamental evolution from a static, statistics-driven experience to a dynamic, intent-aware discovery engine. By shifting from traditional pointwise models to a listwise GenRec paradigm, integrating near-real-time features and deep behavioral sequences, we’ve significantly enhanced the system’s ability to understand the complex and subtle relationship between users and the vast array of available merchants. These ML leaps have transformed the Uber Eats homefeed experience, delivering recommendations that stay as fresh and relevant as the food and groceries arriving at our customers’ doors.
Acknowledgments
This project was a cross-functional collaboration between multiple teams, whose collective efforts were instrumental in bringing these architectural improvements to life:
- Feed Intelligence
- Next Platform
- Uber AI Michelangelo Platform
- Feed Science and Product Management
Cover Photo Attribution: “UBER Eats Delivery Cyclist” by shopblocks is licensed under CC BY 2.0.
Apache®, Apache Spark™, and the star logo are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. No endorsement by The Apache Software Foundation is implied by the use of these marks.
PyTorch, the PyTorch logo and any related marks are trademarks of The Linux Foundation.
TensorFlow, the TensorFlow logo and any related marks are trademarks of Google Inc.
Yicheng Chen
Staff ML Engineer
Builds feed recommendation models for Uber Eats Feed Intelligence.
Peng Chen
Senior Staff ML Engineer
Builds feed recommendation models for Uber Eats Feed Intelligence.
Nikat Patel
Senior Machine Learning Engineer
Builds feed recommendation models for Uber Eats Feed Intelligence.
Sanjeev Suresh
Senior Machine Learning Engineer
Builds real-time ML features for Uber AI’s Next Personalization team.
Bo Ling
Senior Staff Engineer
Works on large-scale AI models with Uber’s Michelangelo platform team.