Uber x Apache Pinot Meetup
Details
Apache Pinot is a real-time distributed OLAP datastore, which is used to deliver scalable real-time analytics with low latency. It can ingest data from batch data sources (S3, HDFS, Azure Data Lake, Google Cloud Storage) as well as streaming sources (such as Kafka).
Pinot is used extensively at Uber and LinkedIn to power many analytical applications such as Uber Eats, Who Viewed My Profile, Ad Analytics, Talent Analytics, and many more, serving 100K+ queries per second while ingesting 1 million+ events per second.
Event Details
- This Meetup is co-hosted by Uber and StarTree
- This Meetup is an in-person event only
- Registration is required for the Meetup. Please RSVP & answer the questions (full name & email address)
- Event location details will be emailed a few days before the event to those who have registered for the event and provided an email address
Event Agenda
5:00 PM - Networking and Snacks
5:30 PM - Opening Remarks
5:40 PM - Talk 1 - Pinot, Why Are You So Fast?
6:10 PM - Talk 2 - Pinot Table Joins at Uber Scale
6:40 PM - Talk 3 - TTL Support for Pinot Upserts
7:10 PM - Closing Remarks
--------
Talk 1: Pinot, Why Are You So Fast?
Apache Pinot has transformed how companies do business, from improving customer experience by providing user-facing insights to increasing operational efficiency through internal dashboards and metrics engines. Pinot stands out due to its unmatched ability to support high-throughput, low-latency OLAP queries on both fresh and historical data. In this talk, we will discuss the architectural decisions that led to this high performance, including data layout and pruning optimizations. We will also go through the various indexing techniques employed to speed up different kinds of queries and use cases.
Speaker Bio: Chinmay Soman is a Founding Engineer at StarTree, building real-time analytics solutions at scale.
Talk 2: Pinot Table Joins at Uber Scale
Uber is the first known company to run a production use case with the new Pinot Multistage Engine. In this talk, we will discuss how we did it and where we go from here.
Speaker Bio: Ankit Sultana is a Senior Software Engineer on the Real-Time Analytics team at Uber working on all layers of the platform: Pinot, Neutrino, and some internal services.
Talk 3: TTL Support for Pinot Upserts
Apache Pinot introduced native support for Upsert in v0.6, enabling users to modify existing records during real-time ingestion. However, the current design leads to high memory utilization in upsert clusters due to increased heap usage for storing primary keys. In datasets with a high cardinality of primary keys, the heap usage on servers often becomes a bottleneck for performance and reliability. To address this, we propose Time-to-Live (TTL) functionality for Pinot Upsert primary keys. Once a TTL value is set, primary keys automatically expire after a designated period and are removed from upsert metadata, reducing memory usage and improving performance. In this talk, we aim to provide an overview of upsert snapshots, discuss the implementation of TTL support, and share lessons learned throughout the process.
Speaker Bio: Qiaochu Liu is a Staff Software Engineer on Uber's Real-Time Analytics Team.
