
About us

This group brings together data engineers, analysts, and developers in the Greater Boston area who are building the future of analytics with Trino, Starburst, and the open data stack. Trino and Starburst run across cloud, on-prem, and hybrid environments. Together, they power the modern data lakehouse for analytics and AI, enabling open, federated access without complex migrations.
Our meetups explore topics like AI-driven analytics, Iceberg adoption, data governance, and performance at scale, featuring talks from community members, contributors, and industry leaders.
Whether you’re running queries across diverse sources or designing the next generation of lakehouse architectures, this community is where we connect, learn, and share insights—because data grows stronger when it’s connected.

Iceberg Boston Meetup


Starburst Boston Office, 101 Federal Street, 18th Floor, Boston, MA, US

Hear real-world stories and best practices from Starburst, Telmai, Akamai, and Microsoft, and connect with fellow data engineers, practitioners, and community members.
Join us for a day of great talks, great company, and dinner! This meetup is sponsored by Starburst, Telmai, and Akamai.

Seats are limited, so RSVP to secure your spot!

Agenda
5:00 – 6:00 PM | Arrive, dinner, and network
6:00 – 6:20 PM | Starburst - From Kafka to Iceberg: High-Throughput Ingestion at Starburst
6:20 – 6:40 PM | Telmai - Closing the Observability Gap in Iceberg-Native Lakehouses
6:40 – 7:00 PM | Akamai - Building Egnatia: How Iceberg Is Saving Us Millions
7:00 – 7:20 PM | Microsoft - Toward Adaptive Lakehouses: Server-Side Planning in Apache Iceberg
7:20 – 9:00 PM | Networking

Speakers:
Starburst - Pachu Shrinivas, Staff Software Engineer
Microsoft - Roy Hasson, Sr. Director, Product
Akamai - Endi Caushi, Sr. Software Engineer
Telmai - Max Lukichev, Co-Founder & CTO

***

Starburst:
From Kafka to Iceberg: High-Throughput Ingestion at Starburst

Real-time analytics is critical for modern businesses — but bridging the gap between fast-moving Kafka streams and query-ready lakehouse tables remains a complex challenge. At Starburst, we encountered this firsthand while ingesting our internal telemetry data into Iceberg tables. Existing solutions fell short, plagued by issues such as lack of exactly-once guarantees, limited scalability, head-of-line blocking, small file proliferation, and high operational overhead and cost.

This prompted us to rethink streaming ingestion from the ground up. Our goal was a fully managed, highly available system that makes data actionable within minutes, optimized for both performance and usability. We designed and built a custom Kafka-to-Iceberg ingestion service that directly writes data in Iceberg format with strong guarantees, minimal latency, and continuous data maintenance for optimal query performance. Along the way, we developed novel techniques — such as Iceberg-aware commit coordination and adaptive Kafka consumer assignment — to overcome typical ingestion bottlenecks and deliver best-in-class price/performance.
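The exactly-once guarantee described above typically rests on one idea: commit the Kafka offsets atomically with the data, inside the Iceberg snapshot itself, so a restarted writer can replay a batch without producing duplicates. The following is a minimal toy sketch of that general pattern, not Starburst's implementation; the `ToyIcebergTable` class is a hypothetical stand-in for a real Iceberg client.

```python
# Minimal sketch of exactly-once Kafka-to-Iceberg ingestion via
# offset-in-snapshot commit coordination. ToyIcebergTable is a toy
# stand-in, NOT a real Iceberg client: the real pattern stores the last
# committed Kafka offsets in the snapshot's summary properties so a
# restarted writer can resume without duplicating rows.

class ToyIcebergTable:
    """Stand-in for an Iceberg table: data files plus snapshot properties."""
    def __init__(self):
        self.files = []            # committed "data files"
        self.snapshot_props = {}   # e.g. {"kafka.offset.topic-0": "42"}

    def commit(self, new_file, props):
        # Atomic in real Iceberg: the data file and offsets land together.
        self.files.append(new_file)
        self.snapshot_props.update(props)

def ingest(table, partition, records_with_offsets):
    """Write only records past the last committed offset (idempotent replay)."""
    key = f"kafka.offset.{partition}"
    committed = int(table.snapshot_props.get(key, -1))
    fresh = [(off, rec) for off, rec in records_with_offsets if off > committed]
    if not fresh:
        return 0
    table.commit([rec for _, rec in fresh], {key: str(fresh[-1][0])})
    return len(fresh)

table = ToyIcebergTable()
batch = [(0, "a"), (1, "b"), (2, "c")]
ingest(table, "topic-0", batch)   # writes 3 records
ingest(table, "topic-0", batch)   # replayed batch: writes 0, no duplicates
```

Because the offset watermark travels inside the same atomic commit as the data, a crash between "write" and "commit" simply replays cleanly on restart.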

Telmai:
Closing the Observability Gap in Iceberg-Native Lakehouses

Apache Iceberg has become the open table format of choice across every major managed lakehouse environment. Its guarantees around ACID transactions, schema evolution, and time travel have made it the structural foundation teams rely on when building data platforms for analytics and AI. But they address structural integrity, not data reliability. A table can be perfectly consistent and still carry incomplete, stale, or anomalous data that silently impacts downstream analytics and AI workloads.
Most teams instrument data quality detection downstream of the Iceberg layer, through periodic batch scans or post-ETL validation. This creates an observability lag that is particularly costly in high-concurrency environments with heterogeneous write patterns, where data issues propagate faster than scheduled scans can surface them.
Closing this gap requires anchoring observability directly to the Iceberg catalog layer. By using catalog metadata to automatically discover and prioritize high-risk assets, teams can instrument continuous quality monitoring across volume, schema, freshness, and completeness dimensions without manual monitor configuration. Statistical anomaly detection applied natively at this layer, using rolling baselines and change detection against live Iceberg table metrics, produces a trust signal that is current at the moment downstream consumers act on it.
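The "rolling baselines and change detection against live table metrics" idea above can be illustrated with a generic sketch. This is not Telmai's product logic, just the standard statistical technique the abstract describes, applied to a hypothetical per-snapshot row-count metric.

```python
# Hedged sketch of rolling-baseline anomaly detection on a table metric
# (here, rows added per snapshot). Flags the latest value when it falls
# more than `threshold` standard deviations from the recent baseline.
from statistics import mean, stdev

def is_anomalous(history, latest, window=7, threshold=3.0):
    """True if `latest` deviates > threshold sigmas from the rolling baseline."""
    baseline = history[-window:]
    if len(baseline) < 2:
        return False              # not enough history to form a baseline
    mu, sigma = mean(baseline), stdev(baseline)
    if sigma == 0:
        return latest != mu
    return abs(latest - mu) / sigma > threshold

row_counts = [1000, 1020, 990, 1010, 1005, 995, 1015]
print(is_anomalous(row_counts, 1008))   # → False: within normal variation
print(is_anomalous(row_counts, 0))      # → True: likely ingestion failure
```

Running this continuously against catalog metadata (rather than scanning the data itself) is what keeps the trust signal current without scheduled batch scans.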

Akamai:
Building Egnatia at Akamai: How Iceberg Is Saving Us Millions

Apache Iceberg is the de facto table format for modern lakehouse architectures, but its real value emerges when it supports diverse workloads. In this talk, we share how Akamai built Egnatia, an Iceberg platform that delivers seven-figure annual cost savings.

We’ll also share key architectural decisions and lessons learned running Iceberg across batch and streaming workloads.

You’ll learn how we run two demanding workloads on Apache Iceberg:

  • CDN batch analytics: Upsert-heavy, join-intensive pipelines with complex lineage
  • Security event data: Faced with rising Splunk costs, we migrated security telemetry to Iceberg, enabling near-real-time ingestion of high-volume data that is enriched with internal context and used for threat detection, analytics, alerting, and investigation

Microsoft:
Toward Adaptive Lakehouses: Server-Side Planning in Apache Iceberg

Query optimization has long been tightly coupled to compute engines, creating fragmented behavior and operational complexity. This session introduces server-side planning in Apache Iceberg, a catalog-driven approach that externalizes planning and optimization. We’ll cover its design principles, implementation mechanics, and how it enables workload-aware adaptation, cross-engine consistency, and simplified performance tuning in Iceberg-based Lakehouses.
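The core idea of catalog-driven planning is that file pruning happens once, on the server, and every engine receives the same pre-planned file list. The sketch below illustrates that division of labor with illustrative names; it is not the actual Iceberg REST planning API.

```python
# Hedged sketch of server-side scan planning: the catalog (server) prunes
# data files against the query predicate using partition metadata, and the
# engine only receives the surviving file list. PlanningCatalog and
# plan_scan are illustrative names, not a real Iceberg interface.
from dataclasses import dataclass

@dataclass
class DataFile:
    path: str
    partition: dict          # e.g. {"day": "2024-06-01"}

class PlanningCatalog:
    def __init__(self, files):
        self.files = files

    def plan_scan(self, predicate):
        """Server side: return only files whose partition satisfies the predicate."""
        return [f.path for f in self.files if predicate(f.partition)]

catalog = PlanningCatalog([
    DataFile("s3://bkt/day=2024-06-01/a.parquet", {"day": "2024-06-01"}),
    DataFile("s3://bkt/day=2024-06-02/b.parquet", {"day": "2024-06-02"}),
])
# Any engine asking for day = '2024-06-02' gets the same pruned plan:
print(catalog.plan_scan(lambda p: p["day"] == "2024-06-02"))
```

Centralizing this step is what yields the cross-engine consistency the abstract mentions: pruning behavior no longer varies with each engine's optimizer.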

13 attendees
