Engineering, Data / ML

Spark Analysers: Catching Anti-Patterns In Spark Apps

1 June 2023 / Global
Figure 1: Architecture
Figure 2: Spark Events
Figure 3: Spark Plan
Figure 4: Scan Recommendations
Figure 5: Excessive Partition Scan Analyser
Figure 6: Spark DAG
Figure 7: Spark Transformations & Actions
Figure 8: Duplicate Spark Plan Analyser
Figure 9: Spark Plan Example
Figure 10: YARNRed Architecture
Figure 11: Sample Jira Ticket
Vijayant Soni

Vijayant Soni is a Software Engineer on Uber's Delivery Data Solutions team. He has enhanced Uber's ETL frameworks to avoid duplicating pipelines across environments and to perform small-file compaction with a single feature flag. He conceived and developed Spark Analysers to uncover the most common issues users face when writing a Spark application. He is currently working on decentralizing a large Hive database (~10 petabytes) to improve scalability and sustain significant data growth at Uber.

Sashidhar Thallam

Sashidhar Thallam is a former Staff Software Engineer on Uber's Delivery Data Solutions team. He worked on automation to optimize resource usage for all Hive workloads. He built Query-Analysers, which detect a number of anti-patterns in Hive queries and suggest improvements to the query owners.

Sakshi Pande

Sakshi Pande is a Software Engineer on the Data Chargebacks and Consumption Reduction team. She has been one of the early engineers on the Chargeback and Cost Efficiency initiative since November 2021, playing a crucial role in initiatives such as HDFSRed, YARNRed, and PrestoRed.

Atul Mantri

Atul Mantri is a Senior Software Engineer on Uber's Data Platform team. He focuses on building systems that enable big data observability across all batch and real-time applications at Uber and on accelerating the platform's cost-efficiency initiatives. Before Uber, Atul worked at Rubrik and NetApp, building high-performance distributed systems. He holds a master's degree from NC State University.

Posted by Vijayant Soni, Sashidhar Thallam, Sakshi Pande, Atul Mantri