Real-time Analytics meetup London

4.8 • 4 ratings

About us

The Real-Time Analytics meetup covers a range of topics around building real-time analytics systems, including use cases, technical deep dives, and best practices.
Interested in speaking, organizing, or volunteering? Contact community@startree.ai

This meetup is organized by the founders of StarTree and original creators of Apache Pinot.
Apache Pinot is a real-time distributed OLAP datastore, used to deliver scalable real-time analytics with low latency. It can ingest data from batch data sources (S3, HDFS, Azure Data Lake, Google Cloud Storage) as well as streaming sources (such as Kafka). Pinot is used extensively at LinkedIn and Uber to power many analytical applications, such as Who Viewed My Profile, Ad Analytics, Talent Analytics, Uber Eats, and many more, serving 200k+ queries per second while ingesting 1 million+ events per second.

Resources
- What is Apache Pinot? https://www.startree.ai/what-is-apache-pinot
- Launching At LinkedIn: The Story of Apache Pinot: https://www.startree.ai/blog/launching-at-linkedin-the-story-of-apache-pinot
- For more information on Apache Pinot, go to dev.startree.ai
- Our community is active on Slack! To join, go to stree.ai/slack

Upcoming events

Network event
Webinar: Bringing Apache Pinot’s Query Efficiency to the Data Lake
Thu, Jul 23 · 1:00 PM BST
Online
10 attendees from 10 groups
To attend, register here.

The era of brute-force analytics is over. Bringing Apache Pinot’s query efficiency to the data lake means changing how analytical queries access data, not just where that data is stored.

Open table formats changed the lakehouse by making storage more flexible and open. But they did not fully fix query economics. For many teams, brute-force analytics is still the default: engines fetch far more data than the question actually requires.

Partition pruning, predicate pushdown, and metadata filtering help, but they often stop at the file or row-group level. What remains still includes unnecessary data access. At lakehouse scale, that extra I/O turns into higher CPU usage, more network movement, and inflated infrastructure spend.

This session looks at a different model: bringing Apache Pinot’s query efficiency to data lake workloads. Instead of reading broad chunks of data to answer selective questions, index-driven execution uses metadata and indexes to narrow data access before unnecessary Parquet data is read — including on open lakehouse tables such as Apache Iceberg and Delta Lake.
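The difference between pruning at the row-group level and narrowing access with an index can be illustrated with a toy model. The sketch below is not Apache Pinot's or StarTree's implementation; the data layout, the function names, and the use of "rows read" as a stand-in for I/O are all assumptions made for illustration.

```python
import random

ROWS_PER_GROUP = 1000   # hypothetical row-group size
NUM_GROUPS = 100        # hypothetical number of row groups


def build_row_groups(seed=0):
    """Generate row groups of unsorted integer values (toy data)."""
    rng = random.Random(seed)
    return [
        [rng.randrange(10_000) for _ in range(ROWS_PER_GROUP)]
        for _ in range(NUM_GROUPS)
    ]


def rows_read_with_minmax(groups, target):
    """Row-group pruning: skip a group only if target is outside its
    min/max range; otherwise the whole group is read."""
    return sum(len(g) for g in groups if min(g) <= target <= max(g))


def build_inverted_index(groups):
    """Map each value to the (group, row) positions where it occurs."""
    index = {}
    for group_id, group in enumerate(groups):
        for row_id, value in enumerate(group):
            index.setdefault(value, []).append((group_id, row_id))
    return index


def rows_read_with_index(index, target):
    """Index-driven access: read only the rows the index points to."""
    return len(index.get(target, []))


if __name__ == "__main__":
    groups = build_row_groups()
    index = build_inverted_index(groups)
    target = groups[0][0]  # a value known to exist in the data
    print("rows read with min/max pruning:", rows_read_with_minmax(groups, target))
    print("rows read with inverted index: ", rows_read_with_index(index, target))
```

On unsorted data, min/max statistics rarely exclude a row group, so a selective equality filter still reads most groups in full, while the index lookup touches only the matching rows. Real engines also pay to build, store, and consult indexes, which this sketch ignores.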

In this technical discussion, you’ll learn how to:
- Reduce unnecessary reads
  See where lakehouse queries fetch more data than they need, and why that waste compounds as query volume grows.
- Understand Pinot efficiency
  Learn how Apache Pinot’s indexing and execution model narrows data access before unnecessary Parquet data is read.
- Compare query cost drivers
  Learn how to evaluate bytes fetched, CPU usage, latency, and cost per query to estimate the impact on your own lakehouse workloads.
You’ll also get a practical look at how to evaluate this model in a developer environment, using StarTree Tables Dev Edition to connect lakehouse tables, apply indexing, and inspect how query behavior changes when the engine fetches less data.
1 attendee from this group