Engineering, Backend

Evolution of Data Lifecycle Management at Uber

August 17, 2023 / Global
Figure 1: Various Data Lifecycle Operations
Figure 2: Goals of DLM
| Key Goal | Benefits |
| --- | --- |
| Compliance Requirements (e.g., GDPR, SOX, HIPAA) | Improve the organization’s compliance and risk posture. Enable the organization to operate globally/locally with minimum disruptions to the business, and avoid penalties from a regulatory and compliance standpoint (penalties amounting to several million USD). |
| Cost Efficiency (e.g., timely deletion of old, unused data, migrating data to the appropriate storage tier based on access patterns) | Deletion of old unused data of several PB each month, leading to storage cost savings of several million USD per year. Storage of data in appropriate storage classes (hot, warm, cold) in order to optimize on storage cost, thereby saving several million USD per year. |
| Data Reliability | Ensure backups for data recovery in unexpected scenarios including data corruption and accidental data deletion. |
Figure 3: Unified DLM System
| Policy | Goal |
| --- | --- |
| Policy based retention / deletion: User-hold retention; User-data deletion; Time-based deletion (for PII and sensitive data) | #Compliance |
| Policy based encryption | #Compliance |
| TTL deletion: Partition TTL; Table TTL | #CostEfficiency |
| Tiering: Hot → Warm, Warm → Cold; Archival | #CostEfficiency |
| Backup & Restore (point-in-time snapshots) | #Reliability |
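
To make two of the cost-efficiency policies above concrete, here is a minimal sketch of how a partition TTL check and a hot/warm/cold tiering decision might be expressed. It is illustrative only and not Uber's implementation: the `Partition` type, the function names, and the 30-day and 180-day thresholds are all assumptions introduced for this example.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Partition:
    name: str            # hypothetical partition identifier
    created: date        # date the partition was written
    last_accessed: date  # date of the most recent read


def expired_partitions(partitions, ttl_days, today):
    """Partition TTL: return the partitions created more than ttl_days ago."""
    cutoff = today - timedelta(days=ttl_days)
    return [p for p in partitions if p.created < cutoff]


def target_tier(partition, today, warm_after_days=30, cold_after_days=180):
    """Tiering: choose hot, warm, or cold from the days since last access.

    The default thresholds are placeholders, not values from the article.
    """
    idle_days = (today - partition.last_accessed).days
    if idle_days >= cold_after_days:
        return "cold"
    if idle_days >= warm_after_days:
        return "warm"
    return "hot"
```

Calling `expired_partitions(partitions, ttl_days=365, today=date.today())` would list deletion candidates, and `target_tier(p, date.today())` would suggest where each remaining partition belongs. A table-level TTL could reuse the same cutoff logic against the table's creation date.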
Figure 4: DLM Service Architecture at Uber
| Characteristic | Details |
| --- | --- |
| Sources | The DLM Service relies on two inputs to the system: (1) the Metadata Source System to provide the dataset metadata; (2) the Policy Store as the source of all relevant Data Lifecycle Policies |
| Policy Determination | For each dataset, the DLM Service determines the relevant applicable policies (after deconfliction) |
| Dataset Onboarding | For each policy, the candidate datasets are identified and onboarded with an approval process, if required |
| Policy Action | The DLM Service executes the policy action on the dataset using the appropriate DataStore Plugin |
| Monitoring | The DLM Service has an independent module to monitor at a policy level, and uses the policy-action-status history to monitor at the dataset level |
| Audit | All actions of the DLM Service are audited for investigation/audit purposes |
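
The flow described above (policy determination, policy action through a DataStore Plugin, and audit) can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the actual DLM Service: every class and function name is hypothetical, the metadata source and policy store are plain in-memory structures, and the deconfliction rule (a retention policy blocks deletion) is an assumed example, since the article does not specify how conflicts are resolved.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Policy:
    name: str    # e.g. "ttl_deletion" or "user_hold_retention"
    action: str  # action passed to the plugin, e.g. "delete" or "retain"


@dataclass(frozen=True)
class Dataset:
    name: str
    datastore: str   # key selecting the DataStore Plugin (hypothetical)
    policies: tuple  # names of the policies this dataset is onboarded to


class DataStorePlugin(Protocol):
    def execute(self, dataset: Dataset, action: str) -> bool: ...


class InMemoryPlugin:
    """Stand-in plugin that records actions instead of touching storage."""

    def __init__(self):
        self.calls = []

    def execute(self, dataset, action):
        self.calls.append((dataset.name, action))
        return True


def deconflict(policies):
    """Assumed rule: any 'retain' policy suppresses all 'delete' policies."""
    if any(p.action == "retain" for p in policies):
        return [p for p in policies if p.action != "delete"]
    return list(policies)


def run_dlm(datasets, policy_store, plugins, audit_log):
    """Determine policies per dataset, execute them, and audit each action."""
    for dataset in datasets:
        candidates = [policy_store[name] for name in dataset.policies]
        for policy in deconflict(candidates):
            plugin = plugins[dataset.datastore]
            ok = plugin.execute(dataset, policy.action)
            audit_log.append({
                "dataset": dataset.name,
                "policy": policy.name,
                "action": policy.action,
                "status": "succeeded" if ok else "failed",
            })
```

In this sketch the audit log doubles as the policy-action-status history: filtering it by dataset gives a per-dataset view, while grouping it by policy gives the policy-level view that a separate monitoring module would consume.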
Sumanth Srinivasa Krishnaswamy

Sumanth Srinivasa Krishnaswamy is a Staff Software Engineer at Uber, and was the Chief Architect of the Unified DLM Platform. He currently works on several products on the Adtech Team. Prior to Uber, he worked at Microsoft on a range of services in the Azure Storage Org. He takes pride in designing and delivering efficient, scalable, and extensible platform systems.

Mithun (Matt) Mathew

Mithun (Matt) Mathew is a Sr. Staff Engineer on the Data team at Uber. He currently works on various projects in the security domain. Previously, he led the initiative to containerize and automate Data infrastructure at Uber.

Sonali Goyal

Sonali Goyal is a Senior Software Engineer on the Uber Data Platform Team. She is the core engineer leading the effort to scale the DLM platform across multiple data assets at Uber. In the past, she has contributed to several other projects and initiatives spanning various aspects of Data Infra at Uber.

Posted by Sumanth Srinivasa Krishnaswamy, Mithun (Matt) Mathew, Sonali Goyal