PyData Zurich: Data Table Formats and Efficient Fine-tuning
Details
Talk #1: Data table formats
Dive into the world of data table formats. Explore the intricacies of Apache Iceberg, Delta Lake, and Apache Hudi, and learn how to use AWS services to get the most out of them when managing and optimizing your data operations.
Talk #2: bitsandbytes: A Year in review & FSDP + QLoRA/QDoRA Fine-tuning Walkthrough
This talk shares the story behind bitsandbytes (BNB), an open-source quantization library for PyTorch that makes deep learning economically accessible. BNB evolved from Tim Dettmers’ innovative academic project to a thriving community project with 2 million monthly downloads, 12k+ open projects using the library, 250+ packages depending on it and multiple hardware backends.
Titus worked on integrating BNB with Fully Sharded Data Parallel (FSDP) training, enabling fine-tuning of large models on consumer GPUs by combining quantization, low-rank adapters (LoRA), and sharding. This aligns with the GPU-poor movement, making it possible to run very large language models on affordable, multi-GPU consumer hardware, thus reducing the cost of entry into generative AI.
We'll walk through the steps to reproduce such fine-tuning, explain the quantization algorithms and how they’re optimized to exploit GPU architecture, and discuss the integration across the various systems involved. The talk will also cover Titus' adventurous personal journey from volunteer (without a strong prior background in deep learning) to lead maintainer of a popular deep learning library, in the hope of inspiring others to dive in.


