What you’ll do
- Deliver a completely self-service parallel compute framework based on Apache Spark for a variety of near real-time and big data applications running on YARN and Mesos
- Provide interactive SQL access to tens of PB of data with a few seconds of latency with Presto
- Provide Hive as a highly reliable and available service for Uber’s bulk data processing needs; provide Uber-specific optimizations and features such as geo-spatial-temporal support
- Build a highly scalable, reliable and efficient data storage system based on HDFS for Uber’s data lake
- Build an interactive workbench to boost the productivity of Uber’s Data Scientists
- Provide data security with authentication, authorization, and auditing mechanisms
What you’ll need
- We are looking for curious, self-motivated engineers with strong coding, testing, debugging and design skills
- Solid understanding of distributed systems and system fundamentals such as concurrency, multi-threading, and locking
- Past data infra experience or knowledge about Hadoop eco-systems is not necessary (we will mentor you!) but if you do have past data infra experience, we really want to chat with you :-)
- Be customer-obsessed and have the ability to translate customer and technical requirements into detailed architecture and design
Bonus points if you have
- Experience with large-scale data analytics, query optimization and execution, highly available/fault-tolerant systems, replicated data storage, and operating complex services running on-prem or in the cloud
- Under-the-hood experience with some of the big data analytics technologies we currently use, such as Apache Hadoop (HDFS and YARN), Hive, Spark, Docker/Mesos, and Tez. Presto is a plus. Under-the-hood experience with similar systems such as Vertica, Apache Impala, Drill, Google Borg, Google BigQuery, Amazon Redshift, and Kubernetes is also a plus.
About the Team
The Hadoop Analytics and Infrastructure team is responsible for meeting all data storage and batch processing needs for the rest of the company. We are a small, tightly knit team with diverse backgrounds: our engineers come from Facebook, Google, Cloudera, Hortonworks, Amazon, LinkedIn, Twitter, Pinterest, Dropbox, and other startups, and some are recent college grads. The areas listed above are technically deep and are undergoing massive innovation in the community. Uber, as a business, is also growing rapidly, and data is at the heart of many products, e.g., pricing predictions, route determination, ETAs, fraud detection, and storage and processing of Autonomous Vehicle logs.
By solving these business problems, you will not only help Uber but also have a front-row seat to build and innovate on the Big Data systems of the future and contribute them back to open source. This is an exciting time to be a Data Infrastructure engineer at Uber. Be sure to check out our engineering blog to learn more about the team.