The Global Data Warehouse (GDW) team powers analytics for many of Uber’s businesses. Want to know how many users joined Uber as Riders and subsequently decided to become Drivers on our platform? The Global Data Warehouse team maintains the data objects that answer this question. Need to analyze how the wait times shown in the Rider app correlate with Rider and Driver ratings? We have the data at the ready. We model tables and build data pipelines for the core of our business, including Driver, Rider, and Trip analytics. We collaborate with teams including Eats, Fraud, Ops, Finance, and Marketing to support domain-specific needs. We ingest truly massive volumes of data generated by our globally distributed users and structure this data in an analytics-friendly way while guaranteeing the highest fidelity of historical data and low latency: questions at Uber can’t wait long for answers.
As a Senior Software Engineer in Data at Uber, you will play a leading role in scaling the global data warehouse to power analytics for teams across Uber. You are a self-starter with extensive industry experience in SQL, data modeling, and ETL pipeline design. You have deep experience implementing ETL pipelines in Hive or another MPP database architecture. You are comfortable with Spark and Presto, having used one or both frequently to process very large volumes of data. You possess at least a working knowledge of a streaming analytics platform. You are comfortable coding in Python, Java, or Scala. You have demonstrated strong competency in reliably operating hundreds of ETL pipelines with adherence to strict SLAs, and in quickly root-causing and correcting complex data problems. Peers describe you as the go-to person for the most challenging data ingestion and modeling problems. You actively mentor junior team members and attract others inside and outside your company to join your team. Attention to detail, thoroughly tested code, and great documentation are the hallmarks of your work, but you excel equally at explaining concepts in “big picture” terms to a less technical audience. If this describes you and you tick the boxes below, we would love to hear from you.
- 5+ years experience creating and evolving dimensional data models and schema designs to structure data for business-relevant analytics.
- 5+ years hands-on experience using SQL to build and deploy production-quality ETL pipelines.
- 3+ years experience ingesting and transforming structured and unstructured data from internal and third-party sources into dimensional models.
- 3+ years experience writing and deploying Python, Scala, or Java code.
- 3+ years hands-on experience using Hadoop, Hive, Vertica, or another MPP database system such as AWS Redshift or Teradata.
- 2+ years experience building and operating real-time streaming data pipelines using Spark Streaming or Flink.
- Track record of successful partnerships with product and engineering teams resulting in on-time delivery of impactful data products.
- Demonstrated ability to think strategically about business, product, and technical challenges and implement data solutions which scale to meet future needs.
- Experience developing scripts and tools to enable faster data consumption.
- In-depth understanding of Kimball’s data warehouse lifecycle.
- Extensive experience with real-time data ingestion and stream processing.
- Demonstrated familiarity with industry-leading Big Data ETL practices.