Data / ML

DataK9: Auto-categorizing an exabyte of data at field level through AI/ML

May 9 / Global
Figure 1: Column name in categorization.
Figure 2: Generic column name.
Figure 3: Address column name.
Figure 4: Categorization strategy.
Figure 5: DataK9 architecture.
Figure 6: Location latitude categorization configuration.
Figure 7: Learning-based AI.
Figure 8: Accuracy confusion matrix.
Figure 9: Accuracy metrics.
Figure 10: Dataset categorization funnel.
Lei Sun

Lei Sun is a Tech Lead Manager on the Technical Privacy team at Uber. His team builds scalable, reliable, and efficient infrastructure that protects customer privacy and helps Uber engineers integrate security, compliance, and privacy into their product development lifecycle.

Mohammad Islam

Mohammad Islam is a Distinguished Engineer at Uber. He currently works within the Engineering Security organization to enhance the company's security, privacy, and compliance measures. Before his current role, he co-founded Uber’s big data platform. Mohammad is the author of an O'Reilly book on Apache Oozie and serves as a Project Management Committee (PMC) member for Apache Oozie and Tez.

Posted by Lei Sun, Mohammad Islam
