Engineering, Backend, Data / ML, Uber AI

Uber’s Strategy to Upgrading 2M+ Spark Jobs

25 September / Global
Figure 1: Data landscape at Uber.
Figure 2: Spark 2 to 3 migration process at Uber.
Figure 3: Path translation with Iron Dome.
Figure 4: Migration process for each Spark application.
Figure 5: Migration process.
Figure 6: Observed performance gains.
Amruth Sampath

Amruth Sampath is a Senior Engineering Manager on Uber’s Data Platform team. He leads the Batch Infra org, which comprises Spark, Storage (HDFS, GCS), Replication, and Uber’s Data On-Prem to Cloud mega transition.

Arnav Balyan

Arnav Balyan is a Senior Software Engineer on Uber’s Data team. He’s a committer to Apache Gluten™ and works on optimizing query engines and distributed systems at scale.

Nimesh Khandelwal

Nimesh Khandelwal is a Senior Software Engineer on the Spark team. He focuses on projects that modernize and optimize the Spark ecosystem at Uber.

Sumit Singh

Sumit Singh is a Senior Software Engineer on the Spark team. His primary focus areas are query planning and I/O optimizations.

Parth Halani

Parth Halani is a Software Engineer on the Spark team. His primary focus area is running Spark efficiently.

Suprit Acharya

Suprit Acharya is a Senior Manager on Uber’s Data Platform team, leading Batch Data Compute Engine (Spark, Hive), Observability, Efficiency, and Data Science Platforms.

Posted by Amruth Sampath, Arnav Balyan, Nimesh Khandelwal, Sumit Singh, Parth Halani, Suprit Acharya