I/O Observability for Uber’s Massive Petabyte-Scale Data Lake

13 November / Global
Figure 1: Network and file system observability at Uber.
Figure 2: HiCam setup to manage high scale and cardinality.
Table 1: Performance/storage reduction with the HiCam solution.
Table 2: Dimensions captured for incident mitigation and attribution.
Figure 3: Real-time application-level insights on DataCentral.
Figure 4: Top applications/paths for incident mitigation and insights.
Figure 5: On-prem to GCS egress and ingress.
Arnav Balyan

Arnav Balyan is a Senior Software Engineer on Uber’s Data team. He’s a committer to Apache Gluten™ and works on optimizing query engines and distributed systems at scale.

Kartik Bommepally

Kartik Bommepally is a Senior Staff Engineer on Uber’s CloudLake team. He works on planning and strategy for Uber’s hybrid data lake ecosystem.

Amruth Sampath

Amruth Sampath is a Senior Engineering Manager on Uber’s Data Platform team. He leads the Batch Data Infra org comprising Spark, Storage, Data Lifecycle management, Replication, and Cloud Migration.

Jing Zhao

Jing Zhao is a Principal Engineer on the Data team at Uber. He is a committer and PMC member of Apache Hadoop and Apache Ratis.

Akshayaprakash Sharma

Akshayaprakash Sharma is a Staff Software Engineer on Uber’s Data Observability team. Akshaya has previously worked on Hive, Spark, Vertica, and data reporting tools.

Posted by Arnav Balyan, Kartik Bommepally, Amruth Sampath, Jing Zhao, Akshayaprakash Sharma