Containerizing Apache Hadoop Infrastructure at Uber

22 July 2021 / Global
Figure 1: Team Responsibilities Shift
Figure 2: Cluster Management Architecture
Figure 3: Automatic Detection & Decommission of Bad HDFS DataNodes
Figure 4: YARN NodeManager and Application Sibling Containers
Figure 5: Kerberos Principal Registration & Keytab Distribution
Figure 6: UserGroups within Containers
Figure 7: Starlark file defining Configurations for different Cluster Types
Figure 8: Client Configuration Management
Figure 9: Migrating 200+ hosts within ~7 days
Matt Mathew

Matt is a Sr. Staff Engineer on the Engineering Security team at Uber. He currently works on various projects in the security domain. Previously, he led the initiative to containerize and automate Data infrastructure at Uber.

Qifan Shi

Qifan is a Senior Software Engineer with the Data Infrastructure team at Uber, and a core contributor to Hadoop containerization. He has worked on multiple systems that orchestrate large-scale HDFS clusters.

Shuyi Zhang

Shuyi Zhang is an Engineering Manager at Uber, leading OpenSearch adoption and development as well as innovations in open source. She’s also a member of the Observability technical advisory group under the OpenSearch project.

Posted by Matt Mathew, Qifan Shi, Shuyi Zhang, Jackie Murchison