From Kafka to Iceberg: High-Throughput Ingestion at Starburst
Join us for a focused, 45-minute online meetup in which we’ll walk through our architecture, the key design decisions, and the lessons learned in building for scale. The result: a fully managed, easy-to-operate system benchmarked at 100 GB/s throughput, delivering cost-effective, near-real-time ingestion with exceptional query performance.
Real-time analytics is critical for modern businesses, but bridging the gap between fast-moving Kafka streams and query-ready lakehouse tables remains a complex challenge. At Starburst, we encountered this firsthand while ingesting our internal telemetry data into Iceberg tables. Existing solutions fell short, plagued by issues such as a lack of exactly-once guarantees, limited scalability, head-of-line blocking, small-file proliferation, and high operational overhead and cost.
This prompted us to rethink streaming ingestion from the ground up. Our goal was a fully managed, highly available system that makes data actionable within minutes, optimized for both performance and usability. We designed and built a custom Kafka-to-Iceberg ingestion service that directly writes data in Iceberg format with strong guarantees, minimal latency, and continuous data maintenance for optimal query performance. Along the way, we developed novel techniques — such as Iceberg-aware commit coordination and adaptive Kafka consumer assignment — to overcome typical ingestion bottlenecks and deliver best-in-class price/performance.
Speaker: Lakshmikant (Pachu) Shrinivas, Staff Software Engineer at Starburst. Host: Lester Martin, Developer Advocate at Starburst.
