Uber Case Study: Choosing the Right HDFS File Format for Your Apache Spark Jobs

21 March 2019 / Global
Figure 1. We ingest the imagery and imagery metadata into Uber data centers and then use Apache Spark to process the imagery and metadata.
Resource           Avro        Parquet    Improvement
Wall Time (sec)    20.76       7.17       290%
Core Time (min)    24.80       1.28       1,938%
Reads (MB)         24,678.4    1,848.5    1,335%

Resource           Avro        Parquet    Improvement
Wall Time (sec)    18.48       6.0        308%
Core Time (min)    1670.00     50.76      3,289%
Reads (MB)         24,678.4    376.6      6,552%
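
The tables do not say how the Improvement column is calculated. The figures are consistent with it being the Avro value expressed as a percentage of the Parquet value, so that 290% means the Avro run took roughly 2.9 times as long. That reading is an inference from the numbers, not something the source states. The short Python sketch below recomputes the ratio for each row as a sanity check; the names `ROWS` and `avro_over_parquet_pct` are illustrative only.

```python
from decimal import Decimal

# (Avro, Parquet, reported Improvement %) copied from the two tables above.
# Values are kept as strings so Decimal parses them exactly.
ROWS = {
    "table 1 / Wall Time (sec)": ("20.76", "7.17", 290),
    "table 1 / Core Time (min)": ("24.80", "1.28", 1938),
    "table 1 / Reads (MB)": ("24678.4", "1848.5", 1335),
    "table 2 / Wall Time (sec)": ("18.48", "6.0", 308),
    "table 2 / Core Time (min)": ("1670.00", "50.76", 3289),
    "table 2 / Reads (MB)": ("24678.4", "376.6", 6552),
}


def avro_over_parquet_pct(avro: str, parquet: str) -> Decimal:
    """Return the Avro value as a percentage of the Parquet value.

    Assumption: this is how the Improvement column was derived.
    """
    return Decimal(avro) / Decimal(parquet) * 100


for name, (avro, parquet, reported) in ROWS.items():
    pct = avro_over_parquet_pct(avro, parquet)
    print(f"{name}: computed {pct:.1f}%, reported {reported}%")
```

Under this assumption, every computed value lands within one percentage point of the reported figure. The small differences presumably come from the displayed Avro and Parquet values having been rounded.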
Scott Short

Scott Short is a senior software engineer on Uber's Maps Engineering team, based in Boulder, CO.

Posted by Scott Short
