
Segment Merge and Roll-up Task

Hosted By
Allison M.

Details

While operating Pinot clusters at large scale, we have observed challenges in performance, scalability, and operability.

  • We have observed many tables that contain too many small segments (i.e. tens of thousands of segments smaller than 1MB). This puts a lot of stress on CPU resources and ZooKeeper and limits Pinot's scalability.
  • We have observed that many use cases store very large raw data and suffer from high disk utilization and poor query performance. In many cases, pre-aggregation can greatly reduce the data size and improve query latency.
  • There is no easy way to resolve the above issues without changing the client-side data computation logic for offline ingestion and backfilling the entire dataset.

To address the above problems, we have designed and implemented a feature called segment merge and roll-up using Pinot's Minion framework. In this talk, we will take a deep dive into our design and show a demo of the feature.
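As a rough illustration of the idea, the sketch below merges rows from several small segments into one and rolls them up by summing a metric per time bucket and dimension value. It is a minimal sketch, not Pinot's implementation: the function `roll_up`, the row layout `(timestamp_ms, country, clicks)`, and the one-day bucket are all assumptions made for this example.

```python
from collections import defaultdict

DAY_MS = 24 * 60 * 60 * 1000  # assumed roll-up granularity: one day


def roll_up(segments, bucket_ms=DAY_MS):
    """Merge rows from many small segments and pre-aggregate them.

    Each segment is a list of (timestamp_ms, country, clicks) rows.
    This row layout is hypothetical and chosen only for illustration.
    """
    totals = defaultdict(int)
    for segment in segments:
        for ts_ms, country, clicks in segment:
            # Round the timestamp down to the start of its bucket.
            bucket_start = ts_ms - ts_ms % bucket_ms
            # Rows sharing a bucket and dimension value collapse into one.
            totals[(bucket_start, country)] += clicks
    return [(ts, country, clicks)
            for (ts, country), clicks in sorted(totals.items())]


# Three small segments holding five raw rows in total.
small_segments = [
    [(1_000, "US", 3), (2_000, "US", 1)],
    [(3_000, "CA", 2), (DAY_MS + 5, "US", 4)],
    [(4_000, "US", 5)],
]

merged = roll_up(small_segments)
for row in merged:
    print(row)
```

In this toy input, three segments with five rows become one segment with three rows, which is the kind of reduction in segment count and data size the abstract describes.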

----------------------------
Speakers

Seunghyun Lee
Senior Software Engineer @ LinkedIn
PMC Apache Pinot

Seunghyun is a senior software engineer at LinkedIn. He has worked on the Pinot team for 4.5 years and is a PMC member of Apache Pinot. He has contributed many Pinot core features, including replica-group-aware segment routing and custom partitioning. His main interest is tackling complex problems in distributed systems.

Jiapeng Tao
Software Engineer @ LinkedIn

Jiapeng Tao is a software engineer on the LinkedIn Pinot team. He graduated from NEU with a master's degree in Computer Science last year and has been working at LinkedIn for almost a year since then. During this time, he implemented the broker time-based pruner and worked on the segment merge/roll-up feature. He is very glad to contribute to open source projects and to be part of the Pinot community.

Real-Time Analytics with Apache Pinot™ by StarTree