Records, Shredded on Ice: A Primer on Parquet and Iceberg
Details
Steffen R. Knollmann
Presents Records, Shredded on Ice: A Primer on Parquet and Iceberg
Imagine you’re dealing with millions or billions of records, each with many fields that may or may not repeat within or across records. How do you efficiently store and query that data, especially if you are only interested in a few fields at a time? How do you manage updating the dataset, possibly evolving the schema over time?
Parquet solves the first problem: it shreds records into columnar chunks and stores statistics alongside them, so you can skip irrelevant data without reading everything. Iceberg solves the second: it layers structured metadata atop those files, including snapshots, field IDs, partition specs, and table versions.
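To make the "shred and skip" idea concrete, here is a minimal sketch in plain Rust using only the standard library. It is a toy model, not the Parquet file format and not the API of the official crates: the names Record, ColumnChunk, RowGroup, shred, and sum_price_for_ids are all hypothetical, and the two-field schema is an assumption chosen for illustration. It splits records into row groups, stores each field as its own column with min/max statistics, and uses those statistics to skip row groups that cannot match a filter.

```rust
/// A row-oriented record, as an application would produce it (hypothetical schema).
#[derive(Debug, Clone, PartialEq)]
pub struct Record {
    pub id: i64,
    pub price: f64,
}

/// The values of one column within one row group, plus min/max statistics.
#[derive(Debug)]
pub struct ColumnChunk<T> {
    pub values: Vec<T>,
    pub min: T,
    pub max: T,
}

/// A horizontal slice of the table, stored column by column.
#[derive(Debug)]
pub struct RowGroup {
    pub id: ColumnChunk<i64>,
    pub price: ColumnChunk<f64>,
}

/// Build a column chunk and compute its statistics.
/// The caller guarantees that `values` is non-empty.
fn chunk<T: Copy + PartialOrd>(values: Vec<T>) -> ColumnChunk<T> {
    let mut min = values[0];
    let mut max = values[0];
    for &v in &values[1..] {
        if v < min {
            min = v;
        }
        if v > max {
            max = v;
        }
    }
    ColumnChunk { values, min, max }
}

/// "Shred" records into row groups of at most `group_size` rows each.
pub fn shred(records: &[Record], group_size: usize) -> Vec<RowGroup> {
    records
        .chunks(group_size.max(1)) // avoid a zero chunk size, which would panic
        .map(|rows| RowGroup {
            id: chunk(rows.iter().map(|r| r.id).collect()),
            price: chunk(rows.iter().map(|r| r.price).collect()),
        })
        .collect()
}

/// Sum `price` over rows whose `id` lies in `lo..=hi`.
/// Returns the sum and the number of row groups that were actually scanned.
pub fn sum_price_for_ids(groups: &[RowGroup], lo: i64, hi: i64) -> (f64, usize) {
    let mut sum = 0.0;
    let mut scanned = 0;
    for g in groups {
        // Statistics check: skip the group if its id range cannot overlap the filter.
        if g.id.max < lo || g.id.min > hi {
            continue;
        }
        scanned += 1;
        // Only the two columns the query needs are touched.
        for (id, price) in g.id.values.iter().zip(&g.price.values) {
            if (lo..=hi).contains(id) {
                sum += price;
            }
        }
    }
    (sum, scanned)
}
```

With ten records whose ids run from 0 to 9, shredded into groups of three, a query for ids 4 to 5 scans only one of the four row groups. Real Parquet files add encodings, compression, pages, and nested (repeated) fields on top of this basic layout.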
In this talk I will introduce both pieces of the puzzle, how they work under the hood, and how to use them with Rust using the official crates.
About Steffen
Astrophysicist-turned-software-engineer/architect, metalhead, modular synth enthusiast, Dungeon Master, and sci-fi aficionado. I have been writing code for more than 30 years, 20 of them professionally, the last 14 in finance. Designing and running systems that process large amounts of data efficiently, consistently, and correctly has been a recurring theme in my career.
Picking up Rust 6 years ago was a breath of fresh air, and I have not looked back since (but I will use Python when it's the right tool for the job, and I have a soft spot for good old C). I have contributed small things here and there to the Rust ecosystem, but there are not enough hours in the day to do all the things I want to do!
And hey, the synths won't patch themselves...
