Real-Time Analytics with Apache Pinot™ by StarTree


Mountain View, CA, US

About us

Apache Pinot is a real-time distributed OLAP datastore that delivers scalable, low-latency analytics. It can ingest data from batch sources (S3, HDFS, Azure Data Lake, Google Cloud Storage) as well as streaming sources such as Kafka. Pinot is used extensively at LinkedIn and Uber to power analytical applications such as Who Viewed My Profile, Ad Analytics, Talent Analytics, and Uber Eats, serving 100k+ queries per second while ingesting 1 million+ events per second.

Pinot committers are active on Slack. Click here to join our Slack channel.
This meetup is for developers and users of Apache Pinot to share information on:
• How to use Pinot
• Internals of Pinot
• Products built on top of Pinot

More info on Pinot
• Apache Pinot Website
• Apache Pinot Docs

Blog posts
• https://engineering.linkedin.com/blog/2019/03/pinot-joins-apache-incubator
• https://engineering.linkedin.com/blog/2019/06/star-tree-index--powering-fast-aggregations-on-pinot
• https://engineering.linkedin.com/blog/2019/auto-tuning-pinot
• Pinot at Uber

Upcoming events


Network event
Webinar: Bringing Apache Pinot’s Query Efficiency to the Data Lake
Thu, Jul 23 · 5:00 AM PDT · Online
To attend, register here.

The era of brute-force analytics is over. Bringing Apache Pinot’s query efficiency to the data lake means changing how analytical queries access data, not just where that data is stored.

Open table formats changed the lakehouse by making storage more flexible and open. But they did not fully fix query economics. For many teams, brute-force analytics is still the default: engines fetch far more data than the question actually requires.

Partition pruning, predicate pushdown, and metadata filtering help, but they often stop at the file or row-group level. What remains still includes unnecessary data access. At lakehouse scale, that extra I/O turns into higher CPU usage, more network movement, and inflated infrastructure spend.

This session looks at a different model: bringing Apache Pinot’s query efficiency to data lake workloads. Instead of reading broad chunks of data to answer selective questions, index-driven execution uses metadata and indexes to narrow data access before unnecessary Parquet data is read — including on open lakehouse tables such as Apache Iceberg and Delta Lake.
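
To make the contrast concrete, here is a minimal sketch that compares row-group min/max pruning with an index lookup on a toy dataset. It is an illustration only, not Pinot's or any engine's actual implementation: the `ROW_GROUPS` data and every function name below are hypothetical, and a plain Python dictionary stands in for an inverted index.

```python
from collections import defaultdict

# Hypothetical toy "file": 4 row groups of 4 rows each, one string column.
ROW_GROUPS = [
    ["AT", "US", "BR", "DE"],
    ["CA", "FR", "US", "IN"],
    ["BR", "BR", "CA", "DE"],
    ["JP", "US", "MX", "ZA"],
]


def rows_read_with_minmax(row_groups, value):
    """Row-group pruning: a group is skipped only if value lies outside [min, max]."""
    read = 0
    for group in row_groups:
        if min(group) <= value <= max(group):
            read += len(group)  # the whole surviving group has to be scanned
    return read


def build_inverted_index(row_groups):
    """Map each value to the exact (group, offset) positions that hold it."""
    index = defaultdict(list)
    for g, group in enumerate(row_groups):
        for r, v in enumerate(group):
            index[v].append((g, r))
    return index


def rows_read_with_index(index, value):
    """Index-driven access: only the matching positions are touched."""
    return len(index.get(value, []))


if __name__ == "__main__":
    idx = build_inverted_index(ROW_GROUPS)
    print("min/max pruning reads:", rows_read_with_minmax(ROW_GROUPS, "DE"))
    print("index lookup reads:   ", rows_read_with_index(idx, "DE"))
```

In this toy data the min/max statistics can rule out only the last row group for the value "DE", so most rows are still scanned, while the index points directly at the two matching rows.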

In this technical discussion, you’ll learn how to:
- Reduce unnecessary reads
  See where lakehouse queries fetch more data than they need, and why that waste compounds as query volume grows.
- Understand Pinot efficiency
  Learn how Apache Pinot’s indexing and execution model narrows data access before unnecessary Parquet data is read.
- Compare query cost drivers
  Learn how to evaluate bytes fetched, CPU usage, latency, and cost per query to estimate the impact on your own lakehouse workloads.
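
As a rough sketch of the last point, the metrics named above can be folded into a per-query cost estimate. Everything here is an assumption for illustration: the `QueryStats` fields, the linear cost model, and the unit prices are hypothetical placeholders to be replaced with measurements and rates from your own environment.

```python
from dataclasses import dataclass


@dataclass
class QueryStats:
    bytes_fetched: int   # bytes read from object storage for one query
    cpu_seconds: float   # CPU time spent on the query
    latency_ms: float    # end-to-end latency


def cost_per_query(stats, usd_per_gb, usd_per_cpu_second):
    """Hypothetical linear model: data-transfer cost plus compute cost."""
    gigabytes = stats.bytes_fetched / 1e9
    return gigabytes * usd_per_gb + stats.cpu_seconds * usd_per_cpu_second


def compare(baseline, candidate, usd_per_gb, usd_per_cpu_second):
    """Return candidate/baseline ratios for each cost driver (lower is better)."""
    return {
        "bytes_fetched": candidate.bytes_fetched / baseline.bytes_fetched,
        "cpu_seconds": candidate.cpu_seconds / baseline.cpu_seconds,
        "latency_ms": candidate.latency_ms / baseline.latency_ms,
        "cost": cost_per_query(candidate, usd_per_gb, usd_per_cpu_second)
        / cost_per_query(baseline, usd_per_gb, usd_per_cpu_second),
    }


if __name__ == "__main__":
    # Placeholder numbers, not measurements.
    scan = QueryStats(bytes_fetched=10_000_000_000, cpu_seconds=20.0, latency_ms=1500.0)
    indexed = QueryStats(bytes_fetched=1_000_000_000, cpu_seconds=4.0, latency_ms=300.0)
    print(compare(scan, indexed, usd_per_gb=0.01, usd_per_cpu_second=0.0001))
```

Tracking the ratios side by side shows which driver dominates the bill for a given workload, and multiplying the per-query cost by daily query volume gives a first estimate of spend.
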
You’ll also get a practical look at how to evaluate this model in a developer environment, using StarTree Tables Dev Edition to connect lakehouse tables, apply indexing, and inspect how query behavior changes when the engine fetches less data.