About us

The Real-Time Analytics meetup covers a range of topics around building real-time analytics systems, including use cases, technical deep dives, and best practices.

Interested in speaking, organizing, or volunteering? Contact community@startree.ai
This meetup is organized by the founders of StarTree and original creators of Apache Pinot.
Apache Pinot is a real-time distributed OLAP datastore used to deliver scalable real-time analytics with low latency. It can ingest data from batch data sources (S3, HDFS, Azure Data Lake, Google Cloud Storage) as well as streaming sources (such as Kafka). Pinot is used extensively at LinkedIn and Uber to power many analytical applications, such as Who Viewed My Profile, Ad Analytics, Talent Analytics, and Uber Eats, serving 200k+ queries per second while ingesting 1 million+ events per second.
Resources
> • What is Apache Pinot? https://www.startree.ai/what-is-apache-pinot
> • Launching At LinkedIn: The Story of Apache Pinot
> • For more info on Apache Pinot go to dev.startree.ai
> • Our community is active on Slack! Join our Slack

Upcoming events

  • Network event
    Iceberg Query Performance at Scale: StarTree vs. Trino vs. ClickHouse Benchmark

    Online
    37 attendees from 10 groups

    To attend, register here.

    A technical discussion of Iceberg query performance benchmark results across 12.2B rows of Parquet data on S3, including sub-second latency, CPU efficiency, caching behavior, and up to 15x lower cost per query.

    Querying Iceberg data lakes on S3 gives platform teams flexibility, but interactive performance can become difficult to predict as data volumes and concurrency grow. For teams supporting analytics, the question is not just which engine can query the lake; it is which architecture can deliver low latency without driving up compute, S3 reads, or operational overhead.
    In this technical deep dive, we’ll walk through Iceberg query performance benchmark results comparing StarTree, Trino, and ClickHouse on the same 12.2B-row Parquet dataset in S3, covering:

    • Benchmark setup: How the systems were configured, what queries were tested, and how results were measured.
    • Performance results: How each engine performed across latency, caching behavior, and query execution patterns.
    • Resource efficiency: What the benchmark showed about CPU usage, S3 reads, and cost per query on identical infrastructure.
    • Architecture tradeoffs: What the results reveal about scaling real-time analytics on Iceberg without moving or converting data.

    Leave with a clearer understanding of how StarTree, Trino, and ClickHouse compare across practical Iceberg query workloads — and what to consider when designing for sub-second latency, efficient infrastructure usage, and predictable cost at scale.
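One way to read a cost-per-query comparison on identical infrastructure: when every engine runs on the same hardware at the same hourly price, cost per query depends only on how many queries that hardware sustains. The sketch below works through that arithmetic in Python. The function name and every number are placeholders chosen for illustration, not figures from the benchmark.

```python
def cost_per_query(hourly_infra_cost_usd: float, queries_per_second: float) -> float:
    """Return the infrastructure cost of one query at a sustained throughput."""
    if queries_per_second <= 0:
        raise ValueError("queries_per_second must be positive")
    queries_per_hour = queries_per_second * 3600
    return hourly_infra_cost_usd / queries_per_hour


# Placeholder inputs (assumptions, not benchmark data):
# the same hourly cost for both engines, different sustained throughput.
HOURLY_COST_USD = 36.0
engine_a = cost_per_query(HOURLY_COST_USD, queries_per_second=100.0)
engine_b = cost_per_query(HOURLY_COST_USD, queries_per_second=10.0)

# With equal infrastructure cost, the cost ratio is the inverse throughput ratio.
ratio = engine_b / engine_a
```

Under these made-up inputs the hourly cost cancels out of the ratio, which is why a comparison on identical infrastructure can be read directly as a throughput comparison.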

    2 attendees from this group
  • Network event
    Webinar: Full-Text Search on Apache Iceberg w/ Pinot and Lucene

    Online
    1 attendee from 10 groups

    To attend, register here.

    While Data Lakehouses like Apache Iceberg provide massive, cost-effective scalability, they are fundamentally designed as scan-heavy engines.

    They lack the sub-second, "needle-in-a-haystack" search capabilities provided by inverted indices found in traditional search engines.

    This session explores how Apache Pinot fills this gap by integrating Apache Lucene segments directly into its distributed serving layer while maintaining the source of truth in Iceberg's Parquet format.
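The contrast between scan-style access and inverted-index lookup that motivates this session can be illustrated with a toy example. This is a minimal Python sketch of the general technique only; it does not use or represent the Pinot or Lucene APIs, and the function names and sample documents are hypothetical.

```python
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase and split on whitespace; real analyzers do much more."""
    return text.lower().split()


def scan_search(docs: dict[int, str], term: str) -> set[int]:
    """Scan-style lookup: read every document to find matches."""
    term = term.lower()
    return {doc_id for doc_id, text in docs.items() if term in tokenize(text)}


def build_inverted_index(docs: dict[int, str]) -> dict[str, set[int]]:
    """Map each term to the set of document ids that contain it."""
    index: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def index_search(index: dict[str, set[int]], term: str) -> set[int]:
    """Index-style lookup: one dictionary probe, no document reads."""
    return set(index.get(term.lower(), set()))


# Hypothetical sample documents, for illustration only.
docs = {
    1: "connection timeout while reading from broker",
    2: "segment loaded successfully",
    3: "timeout waiting for segment download",
}
index = build_inverted_index(docs)
```

Both `scan_search(docs, "timeout")` and `index_search(index, "timeout")` return the same document ids. The difference is that the scan reads every document on every query, while the index pays that cost once at build time and answers each query with a single lookup.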

    We will conduct a technical deep-dive into:

    • Segment-to-Parquet Virtualization: How Pinot maps its segment abstraction onto remote Iceberg/Parquet files without data duplication or heavy re-ingestion.
    • Hybrid Index Pinning: The mechanics of pinning Lucene Inverted and Text Indexes to local NVMe storage on Pinot servers while leaving the raw data blobs on S3.
    • Lucene I/O Orchestration: How Pinot optimizes query plans to minimize S3 "Time to First Byte" by leveraging metadata-heavy index structures.

Organizers

StarTree

Super Organizer

Members

321