Engineering, Data / ML

Spark Analysers: Catching Anti-Patterns In Spark Apps

June 1, 2023 / Global
Figure 1: Architecture
Figure 2: Spark Events
Figure 3: Spark Plan
Figure 4: Scan Recommendations
Figure 5: Excessive Partition Scan Analyser
Figure 6: Spark DAG
Figure 7: Spark Transformations & Actions
Figure 8: Duplicate Spark Plan Analyser
Figure 9: Spark Plan Example
Figure 10: YARNRed Architecture
Figure 11: Sample Jira Ticket
Vijayant Soni

Vijayant is a Senior Software Engineer on Uber’s Delivery Data Solutions team. He recently worked on building a real-time system to enable financial reporting for Uber Eats merchants, significantly improving data quality, reducing freshness SLAs, and reducing serving latencies from hours to within a few minutes for high-volume merchants. He’s currently working on building a platform to detect gaps in clickstream data.

Sashidhar Thallam

Sashidhar Thallam is a former Staff Software Engineer on Uber’s Delivery Data Solutions team. He worked on automation to optimize resource usage across all Hive workloads. He built Query-Analysers, which detect a number of anti-patterns in Hive queries and suggest improvements to the query owners.

Sakshi Pande

Sakshi Pande is a Software Engineer on the Data Chargebacks and Consumption Reduction team. She has been one of the early engineers on the Chargeback and Cost Efficiency initiative since November 2021, playing a crucial role in efforts such as HDFSRed, YARNRed, and PrestoRed.

Atul Mantri

Atul Mantri is a Senior Software Engineer on Uber's Data Platform team. He is focused on building systems that enable big data observability across all batch and real-time applications at Uber and turbocharging the cost-efficiency initiatives in the platform. Before Uber, Atul worked at Rubrik and NetApp building high-performance distributed systems. He holds a Master's degree from NC State University.

Posted by Vijayant Soni, Sashidhar Thallam, Sakshi Pande, Atul Mantri