Apache Pinot Contributor Call #5
Details
Join us for our monthly Contributor Office Hours - a casual, open session to:
✅ Present your pull requests, proposals, or ideas
✅ Ask questions and get live feedback
✅ Learn how to get started contributing to Apache Pinot
Whether you’re a seasoned committer or brand new to the project, you’re welcome to join!
Host: Robert Zych (Apache Pinot Committer)
Agenda:
8:30 AM - 9:15 AM
Title: Scaling Upsert in Pinot with Segment Merge and Commit Time Compaction
Speaker: Tarun Mavani
Summary: This talk will highlight two major optimizations for upserts in Pinot: Segment Merge and Commit Time Compaction. These enhancements improve the efficiency and reliability of upsert tables. The talk will cover how merging small segments boosts query performance and how commit-time compaction eliminates stale records, achieving up to 8x storage reduction without background jobs.
9:15 AM - 9:30 AM
Title: Pinot Lambdaless Solution at LinkedIn
Speaker: Jiapeng Tao
Summary: Hybrid tables (lambda architecture) have traditionally been used for time-series data that requires a long retention period, consuming both stream data and offline data. In most cases, the offline data is simply the ETL output of the stream, yet Pinot customers still had to maintain a separate offline pipeline for ingestion.
The Lambdaless (realtime-only) approach simplifies this by removing the tech debt of maintaining two separate pipelines and table configs, while also shortening onboarding time. This session will cover LinkedIn’s realtime-only solution and how data backfill is handled.
Want to be involved? Join the Contributors channel on Slack.
*The call will be recorded and shared on the channel afterwards.
Join the Community Monthly Newsletter: Everything's Apache Pinot! (and get a chance to win a T-shirt!)
