
Improving HDFS I/O Utilization for Efficiency

October 13, 2021 / Global
Figure 1: The new CAP theorem
Figure 2: IO utilization among all drives in HDFS
Figure 3: IO utilization among the busiest drives in HDFS
Figure 4: Busiest disks distributions in HDFS
Figure 5: Busy hosts distributions in HDFS clusters
Figure 6: Disk IO utilization: HDFS only vs. HDFS + Yarn co-location
Figure 7: Aggregated Disk IO utilization: HDFS only vs. HDFS + Yarn co-location
Kevin Cheng

Kevin Cheng is a Senior Software Engineer II on Uber’s Core Data team, where he manages the reliability, efficiency, and performance of Data’s infrastructure. Before Uber, he spent more than 20 years working on system design.

Jeffrey Zhong

Jeffrey Zhong is a former engineering manager on Uber’s Core Data team, where he managed data lake storage (Hadoop HDFS and Apache Hudi), batch compute scheduling (Hadoop YARN), and Data’s hardware and performance. Before Uber, he worked on Apache HBase and Apache Phoenix.

Posted by Kevin Cheng, Jeffrey Zhong