Real-World Lossless Compression


Presented by Sam Hughes
Abstract:
What actually happens when you run gzip? Or:
It is the year 2024. You're out minding your own business, when a highwayman brandishes a large floppy disk and demands, "How would you design a file compression format? You have 30 seconds to answer!"
This talk will prepare you for that moment.
We'll cover the basic stuff that goes into file compression formats, such as gzip and zstd, so that you understand what they actually do. Topics include Lempel-Ziv algorithms, Huffman and arithmetic coding, dictionary management, and preprocessing input data, with digressions about character encodings, serializing integers, and practical file format design.
What's out of scope? Image, audio, and lossy compression.
After attending, you should be able to make a half-decent compression format.
Papers (not required reading):
- RFC-1951: deflate (https://tools.ietf.org/html/rfc1951)
- RFC-1952: gzip (https://tools.ietf.org/html/rfc1952)
- RFC-8478: zstd (https://tools.ietf.org/html/rfc8478)
(This talk focuses on compression techniques, with an eye toward what these formats use. It does not cover the full content or depth of these RFCs.)
Reading advice:
Don't read the RFCs! But if you do, read the deflate RFC, skim the gzip RFC to get the "general idea," and read the zstd RFC's table of contents.
You may also want to see this Stack Overflow answer, which describes the relationships between zip, gzip, deflate, and zlib, with historical context:
https://stackoverflow.com/questions/20762094/how-are-zlib-gzip-and-zip-related-what-do-they-have-in-common-and-how-are-they/20765054#20765054
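For a hands-on feel for how deflate, zlib, and gzip relate, here is a minimal sketch using Python's standard zlib module. It is not material from the talk. The helper names deflate_variants and split_gzip are made up for illustration, and split_gzip assumes the simplest gzip layout (a fixed 10-byte header with no optional fields), which is what zlib writes by default but not what every gzip file contains.

```python
import struct
import zlib


def deflate_variants(data: bytes) -> dict:
    """Compress data three ways; only the wrapper around deflate differs."""
    out = {}
    # wbits selects the container: -15 = raw deflate (RFC 1951),
    # 15 = zlib wrapper (RFC 1950), 31 = gzip wrapper (RFC 1952).
    for name, wbits in (("raw", -15), ("zlib", 15), ("gzip", 31)):
        c = zlib.compressobj(6, zlib.DEFLATED, wbits)
        out[name] = c.compress(data) + c.flush()
    return out


def split_gzip(gz: bytes):
    """Split a minimal gzip member into (header, deflate body, crc32, size).

    Assumption: FLG == 0, so the header is exactly 10 bytes and there is no
    file name, comment, or extra field to skip.
    """
    if gz[:2] != b"\x1f\x8b" or gz[2] != 8 or gz[3] != 0:
        raise ValueError("not a minimal gzip member")
    # The trailer is two little-endian 32-bit integers: the CRC-32 of the
    # uncompressed data, then its length modulo 2**32.
    crc, size = struct.unpack("<II", gz[-8:])
    return gz[:10], gz[10:-8], crc, size


if __name__ == "__main__":
    data = b"a highwayman brandishes a large floppy disk. " * 40
    variants = deflate_variants(data)
    for name, blob in variants.items():
        print(name, len(blob), "bytes")
    header, body, crc, size = split_gzip(variants["gzip"])
    # The gzip body is an ordinary raw deflate stream.
    assert zlib.decompress(body, -15) == data
    assert crc == zlib.crc32(data) and size == len(data)
```

Running it prints the three compressed sizes. The point of the sketch is that gzip here is a raw deflate stream plus a small header and a checksum trailer, which is the relationship the linked answer explains in more detail.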
-------------------------
Street parking on 6th, 7th & 8th Avenues north of B Street is usually easy at that hour. Meters nearby are free after 6. Read signage before you park on A Street.
If you're interested in presenting a paper, please fill out this form (https://docs.google.com/forms/d/e/1FAIpQLScaI-fWdys27-ByT_HdtsJ73V4AxZr0hf1GSqLsQ1IwAaPdIQ/viewform) or talk to us in person at the meetup.
