Reducing Logging Cost by Two Orders of Magnitude using CLP
Long, long ago, the amount of data our systems output to logs was small enough that we were able to retain all of the log files. This allowed our engineers to freely analyze the logs, say for troubleshooting our systems or improving applications. But as Uber's business grew rapidly, the amount of da...
Hasnain says:
Now this is some really impressive work, taking costs from $1.8M/yr to $10k/yr for log storage. I liked how it was an iterative process, massaging and moving around data till it can be compressed much better. Reminds me of some work we did back in the day to split up data a little for better compression. The wins are huge!
“We have deployed Phase 1 (i.e., the custom Log4j appender with our custom float encoding) across our entire Spark platform. We are currently working on deploying the Phase 2 compression and integrating CLP’s search capability into our analytics and observability platforms.
Result of Phase 1 compression: In a 30-day window, our entire Spark ecosystem generated 5.38PB of uncompressed INFO-level unstructured logs, yet our CLP appender compressed them to only 31.4TB, amounting to an unprecedented 169x compression ratio. Now with CLP, we have restored our log verbosity from WARN back to INFO, and we can afford to retain all the logs for 1 month (as requested by our engineers).
Preliminary result of Phase 2 compression: The above-mentioned result is only the size of the compressed IR. We have tested a prototype of CLP’s complete compression (including both Phase 1 and 2) on a subset of our Spark logs, and CLP’s compression ratio is 2.16x higher than Zstandard’s ratio and 2.28x higher than Gzip’s ratio. This is consistent with the results reported on other log datasets.”
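For a sense of what the Phase 1 encoding involves, here is a minimal Java sketch of the CLP-style split of a log message into a static template plus its variable values, with floats packed losslessly into integers. The class name, token-classification rules, and the float packing scheme here are illustrative assumptions, not Uber's appender or CLP's actual format.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Illustrative sketch (not Uber's or CLP's actual code): split a log message
 * into a reusable template ("log type") and the variable values it contains,
 * encoding floats losslessly as integers so the compressor sees repetitive,
 * well-typed columns instead of free-form text.
 */
public class ClpStyleEncoderSketch {

    /** Result of encoding one message. */
    public record EncodedMessage(String template, List<Long> encodedVars, List<String> dictVars) {}

    public static EncodedMessage encode(String message) {
        StringBuilder template = new StringBuilder();
        List<Long> encodedVars = new ArrayList<>();
        List<String> dictVars = new ArrayList<>();

        for (String token : message.split(" ", -1)) {
            if (template.length() > 0) template.append(' ');
            if (token.matches("-?\\d+")) {               // integer variable
                template.append("\\i");
                encodedVars.add(Long.parseLong(token));
            } else if (token.matches("-?\\d+\\.\\d+")) { // float variable
                template.append("\\f");
                encodedVars.add(packFloat(token));
            } else if (token.matches(".*\\d.*")) {       // identifiers (task IDs, hosts) go to a dictionary
                template.append("\\d");
                dictVars.add(token);
            } else {
                template.append(token);                  // static text stays in the template
            }
        }
        return new EncodedMessage(template.toString(), encodedVars, dictVars);
    }

    /**
     * Toy lossless float packing: keep the digits as a long plus the count of
     * fractional digits and the sign, so "0.250" round-trips exactly. CLP's
     * real encoding differs; this only illustrates the idea.
     */
    static long packFloat(String token) {
        boolean negative = token.startsWith("-");
        String digitsOnly = token.replace("-", "").replace(".", "");
        int fracDigits = token.length() - token.indexOf('.') - 1;
        long packed = (Long.parseLong(digitsOnly) << 5) | fracDigits;
        return negative ? -packed : packed;
    }

    public static void main(String[] args) {
        EncodedMessage m = encode("Task 1091 finished in 0.250 seconds on executor exec-42");
        System.out.println(m.template);     // Task \i finished in \f seconds on executor \d
        System.out.println(m.encodedVars);  // [1091, packed(0.250)]
        System.out.println(m.dictVars);     // [exec-42]
    }
}
```

With the repeated templates stored once and the variables grouped into typed columns, a general-purpose compressor has far more redundancy to work with than it does on raw text, which is roughly where the extra Phase 2 gains over plain Zstandard or Gzip come from.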
Posted on 2022-10-01T16:19:03+0000