
About us

Apache Pinot is a real-time distributed OLAP datastore used to deliver scalable, low-latency real-time analytics. It can ingest data from batch data sources (S3, HDFS, Azure Data Lake, Google Cloud Storage) as well as streaming sources (such as Kafka). Pinot is used extensively at LinkedIn and Uber to power many analytical applications, such as Who Viewed My Profile, Ad Analytics, Talent Analytics, Uber Eats, and many more, serving 100k+ queries per second while ingesting 1 million+ events per second.

Pinot committers are active on Slack. Click here to join our Slack channel.
This meetup is for developers and users of Apache Pinot to share information on:
• How to use Pinot
• Internals of Pinot
• Products built on top of Pinot

More info on Pinot
• Apache Pinot Website
• Apache Pinot Docs

Blog posts
• https://engineering.linkedin.com/blog/2019/03/pinot-joins-apache-incubator
• https://engineering.linkedin.com/blog/2019/06/star-tree-index--powering-fast-aggregations-on-pinot
• https://engineering.linkedin.com/blog/2019/auto-tuning-pinot
• Pinot at Uber

Upcoming events

  • Network event
    Webinar: Bringing Apache Pinot’s Query Efficiency to the Data Lake

    Online
    10 attendees from 10 groups

    To attend, register here.

    The era of brute-force analytics is over. Bringing Apache Pinot’s query efficiency to the data lake means changing how analytical queries access data, not just where that data is stored.

    Open table formats changed the lakehouse by making storage more flexible and open. But they did not fully fix query economics. For many teams, brute-force analytics is still the default: engines fetch far more data than the question actually requires.

    Partition pruning, predicate pushdown, and metadata filtering help, but they often stop at the file or row-group level. What remains still includes unnecessary data access. At lakehouse scale, that extra I/O turns into higher CPU usage, more network movement, and inflated infrastructure spend.

    This session looks at a different model: bringing Apache Pinot’s query efficiency to data lake workloads. Instead of reading broad chunks of data to answer selective questions, index-driven execution uses metadata and indexes to narrow data access before unnecessary Parquet data is read, including on open lakehouse tables such as Apache Iceberg and Delta Lake.

    In this technical discussion, you’ll learn how to:

    • Reduce unnecessary reads
      See where lakehouse queries fetch more data than they need, and why that waste compounds as query volume grows.
    • Understand Pinot efficiency
      Learn how Apache Pinot’s indexing and execution model narrows data access before unnecessary Parquet data is read.
    • Compare query cost drivers
      Learn how to evaluate bytes fetched, CPU usage, latency, and cost per query to estimate the impact on your own lakehouse workloads.

    You’ll also get a practical look at how to evaluate this model in a developer environment, using StarTree Tables Dev Edition to connect lakehouse tables, apply indexing, and inspect how query behavior changes when the engine fetches less data.

    1 attendee from this group


Organizers

StarTree is a Super Organizer

Members

2,045
Kishore Gopalakrishna
Foo Lim
Hiren
Vardan Aroustamian
Gerald Wluka
K e l v i n
Randy Breunling
Sriram Baskaran
AKulkarni
Victor Chugunov
Alex Kin
Mark Chekhanovskiy
