
Apache Hudi™ at Uber: Engineering for Trillion-Record-Scale Data Lake Operations

January 16 / Global
Figure 1: Uber’s data lake architecture with Hudi at the core.
Figure 2: Hudi dataset cross-region replication. 
Prashant Wason

Prashant Wason is a Staff Software Engineer on Uber’s Batch Data Platform, where he works on building and scaling large-scale data lake and lakehouse systems, with a focus on table formats, data reliability, and performance at massive concurrency. He’s a committer and PMC member of the Apache Hudi project, and recently co-authored the book Apache Hudi: The Definitive Guide.

Balajee Nagasubramaniam

Balajee Nagasubramaniam is a Staff Software Engineer on Uber’s Batch Data Platform, where he works on building and scaling large-scale data lake and lakehouse systems. His focus areas include table formats, data replication, data quality and reliability, and optimizing large-scale production workloads to improve performance and reduce operational overheads.

Surya Prasanna Kumar Yalla

Surya Prasanna Kumar Yalla is a Senior Software Engineer on Uber’s Batch Data Platform, where he works on building and scaling large-scale data lake and lakehouse systems, with a focus on table formats, data ingestion, query performance, and optimization at massive scale.

Meenal Binwade

Meenal Binwade is an Engineering Manager on Uber’s Batch Data Platform, where she leads a team working on table formats and services. The team’s focus is on building highly scalable data lake and lakehouse systems.

Xinli Shang

Xinli Shang is the former Apache Parquet™ PMC Chair, a Presto® committer, and a member of Uber’s Open Source Committee. He leads several initiatives advancing data format innovation for storage efficiency, security, and performance. Xinli is passionate about open-source collaboration, scalable data infrastructure, and bridging the gap between research and real-world data platform engineering.

Jack Song

Jack Song is an engineering leader specializing in large-scale Data and AI platforms. At Uber, he leads the Data Platform organization, building multi-cloud infrastructure, multi-modal data systems, and the agentic automation layer that powers Uber’s next-generation Data AI Agents.

Posted by Prashant Wason, Balajee Nagasubramaniam, Surya Prasanna Kumar Yalla, Meenal Binwade, Xinli Shang, Jack Song