Reducing Logging Cost by Two Orders of Magnitude using CLP

29 September 2022 / Global
Figure 1: The logging architecture of our Spark cluster. While a job is in progress, its logs are written to the container host’s SSD. When a job completes, the YARN Node Manager uploads the logs to HDFS.
Figure 2: How a log message is compressed using CLP. 
Figure 3: Splitting CLP compression into two phases in our deployment.
Figure 4: IR format of the example log message. <H> represents the header. 
Figure 5: Representation of 0.335 using our custom float encoding 
Figure 6: IEEE-754 single-precision (32 bits) representation of 0.335
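Since the image for Figure 6 is not reproduced here, the following minimal Python sketch shows one way to recover the same bit layout. It uses only the standard `struct` module; the helper name `float32_fields` is hypothetical and chosen for illustration. It covers the standard IEEE-754 single-precision layout only, not the custom float encoding from Figure 5.

```python
import struct


def float32_fields(value: float) -> tuple:
    """Return (sign, biased_exponent, mantissa) for `value` rounded to IEEE-754 binary32."""
    # Pack as a big-endian 32-bit float, then reinterpret the same 4 bytes
    # as an unsigned 32-bit integer so the bit fields can be masked out.
    (bits,) = struct.unpack(">I", struct.pack(">f", value))
    sign = bits >> 31                # 1 bit
    exponent = (bits >> 23) & 0xFF   # 8 bits, biased by 127
    mantissa = bits & 0x7FFFFF       # 23 bits, implicit leading 1 not stored
    return sign, exponent, mantissa


sign, exponent, mantissa = float32_fields(0.335)
# Print the three fields as sign | exponent | mantissa bit strings.
print(f"{sign:01b} {exponent:08b} {mantissa:023b}")
```

Because 0.335 has no exact binary representation, the stored mantissa is a rounded approximation of the decimal value.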
Jack (Yu) Luo

Jack Luo is an Uber engineer who works alongside Uber’s Spark, YARN, HDFS, and observability teams to improve logging, observability, and analytics infrastructure. He is also one of the authors and core developers of the CLP system, which he worked on during his PhD research at the University of Toronto.

Devesh Agrawal

Devesh Agrawal formerly managed the Presto and Spark teams at Uber. He authored the Presto Pinot connector and led development of the ultra-low-latency fork of Presto used by many online services at Uber.

Posted by Jack (Yu) Luo, Devesh Agrawal