Real-Time Analytics with Apache Pinot™ by StarTree

4.6 • 218 ratings

Mountain View, CA, US

About us

Apache Pinot is a real-time distributed OLAP datastore used to deliver scalable, low-latency real-time analytics. It can ingest data from batch sources (S3, HDFS, Azure Data Lake, Google Cloud Storage) as well as streaming sources (such as Kafka). Pinot is used extensively at LinkedIn and Uber to power analytical applications such as Who Viewed My Profile, Ad Analytics, Talent Analytics, Uber Eats, and many more, serving 100k+ queries per second while ingesting 1 million+ events per second.

Pinot committers are active on Slack. Click here to join our Slack channel.
This meetup is for developers and users of Apache Pinot to share information on:
• How to use Pinot
• Internals of Pinot
• Products built on top of Pinot

More info on Pinot
• Apache Pinot Website
• Apache Pinot Docs

Blog posts

• https://engineering.linkedin.com/blog/2019/03/pinot-joins-apache-incubator
• https://engineering.linkedin.com/blog/2019/06/star-tree-index--powering-fast-aggregations-on-pinot
• https://engineering.linkedin.com/blog/2019/auto-tuning-pinot
• Pinot at Uber

Upcoming events


Network event
Webinar: Stop Copying Data for Vector Search
Wed, Aug 19 · 7:00 AM PDT · Online
20 attendees from 10 groups
To attend, register here.

The data lake is supposed to be where all your data lives. Yet vector search has traditionally required copying embeddings into a separate vector database—adding duplicate storage, synchronization pipelines, and another system to operate. This webinar explores how that architecture is changing.

Tune in for a technical walkthrough of how Apache Pinot brings vector similarity search directly to Apache Iceberg and Delta Lake. We'll cover the evolution from local Pinot tables to tiered storage on Amazon S3 and finally to lake-native vector search using External Tables, showing how approximate nearest neighbor (ANN) search can run over open table formats without moving your data.

In this technical discussion, you'll learn how to:
- Run vector similarity search directly on Apache Iceberg and Delta Lake using Apache Pinot and External Tables.
- Understand how HNSW enables fast ANN search and what changes are required to make it work over object storage.
- Combine semantic search with SQL filters in a single query over the same data.
- Evaluate a lake-native architecture for AI retrieval that keeps one copy of your data while simplifying search infrastructure.
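
To give a feel for what "semantic search with SQL filters in a single query" means, here is a minimal Python sketch. It is not Pinot's API and not the webinar's implementation: the function names, the sample rows, and the `category` field are all hypothetical, and it ranks by exact brute-force cosine similarity rather than an HNSW index. It only illustrates the idea of applying a structured filter and a vector ranking over the same copy of the data.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def filtered_top_k(rows, query, predicate, k):
    """Keep rows matching the predicate, then return the k most similar to query.

    Exact (brute-force) search: an ANN index such as HNSW would replace the
    full sort with an approximate graph traversal.
    """
    candidates = [r for r in rows if predicate(r)]
    candidates.sort(
        key=lambda r: cosine_similarity(r["embedding"], query),
        reverse=True,
    )
    return candidates[:k]

# Hypothetical sample data: each row carries both a filterable column
# and an embedding, so no second copy of the data is needed.
rows = [
    {"id": 1, "category": "docs", "embedding": [1.0, 0.0]},
    {"id": 2, "category": "blog", "embedding": [0.9, 0.1]},
    {"id": 3, "category": "docs", "embedding": [0.0, 1.0]},
    {"id": 4, "category": "docs", "embedding": [0.7, 0.7]},
]

top = filtered_top_k(rows, [1.0, 0.0], lambda r: r["category"] == "docs", 2)
print([r["id"] for r in top])
```

Row 2 is close to the query vector but is excluded by the filter, which is the behavior a combined filter-plus-similarity query is meant to provide.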
3 attendees from this group