Engineering, Data / ML

Setting Uber’s Transactional Data Lake in Motion with Incremental ETL Using Apache Hudi

March 16, 2023 / Global
[Figures 1 through 11: images only, no captions available]
Vinoth Govindarajan

Vinoth Govindarajan is a former Staff Software Engineer on the Global Data Warehouse team. As a data infrastructure engineer, he worked to lower latency and bridge the gap between online systems and the data warehouse by designing incremental ETL frameworks for derived datasets. Outside of work, he contributes to open-source projects such as Apache Hudi and dbt-spark.

Saketh Chintapalli

Saketh Chintapalli is a Software Engineer on the Global Data Warehouse team. He works primarily on data platform engineering, building reliable tooling and infrastructure for efficient data processing and pipelines that range from batch to real-time workflows.

Yogesh Saswade

Yogesh Saswade is a Software Engineer on Uber's Delivery Data Solutions team. He is the subject-matter expert (SME) for menu datasets. He optimized the performance (SLA and cost) of high-volume batch workloads to achieve near-real-time analytics using Apache Hudi and the Lakehouse ETL framework. He drove the YARN queue segregation initiative to achieve a scalable, federated resource structure. He is currently working on standardizing Uber's very large catalog data.

Aayush Bareja

Aayush Bareja is a Software Engineer on the Uber Eats Delivery Data Solutions team. He excels at using the big data stack to efficiently produce canonical data for analytical workloads spanning batch, incremental, and real-time processing, with technologies such as HDFS, Spark, Hive, Apache Flink, and Piper.

Posted by Vinoth Govindarajan, Saketh Chintapalli, Yogesh Saswade, Aayush Bareja