Real-World Lossless Compression
Presented by Sam Hughes
Abstract:
What actually happens when you run gzip? Or:
It is the year 2024. You're out minding your own business, when a highwayman brandishes a large floppy disk and demands, "How would you design a file compression format? You have 30 seconds to answer!"
This talk will prepare you for that moment.
We'll cover the basic stuff that goes into file compression formats such as gzip and zstd, so that you understand what they actually do. Topics include Lempel-Ziv algorithms, Huffman and arithmetic coding, dictionary management, and preprocessing of input data, with digressions into character encodings, integer serialization, and practical file format design.
What's out of scope? Image, audio, and lossy compression.
After attending, you should be able to make a half-decent compression format.
Papers (not required reading):
- RFC 1951: DEFLATE (https://tools.ietf.org/html/rfc1951)
- RFC 1952: gzip (https://tools.ietf.org/html/rfc1952)
- RFC 8478: Zstandard (zstd) (https://tools.ietf.org/html/rfc8478)
