What we're about

Apache Pinot is a real-time distributed OLAP datastore used to deliver scalable, low-latency analytics. It can ingest data from batch sources (S3, HDFS, Azure Data Lake, Google Cloud Storage) as well as streaming sources (such as Kafka). Pinot is used extensively at LinkedIn and Uber to power analytical applications such as Who Viewed My Profile, Ad Analytics, Talent Analytics, Uber Eats, and many more, serving 100k+ queries per second while ingesting 1 million+ events per second.

Pinot committers are active on Slack. Click here (https://communityinviter.com/apps/apache-pinot/apache-pinot) to join the Apache Pinot Slack channel.

This meetup is for developers and users of Apache Pinot to share information on:

• How to use Pinot

• Internals of Pinot

• Products built on top of Pinot

More info on Pinot

• Apache Pinot Website (http://pinot.apache.org)

• Apache Pinot Docs (https://pinot.readthedocs.io/en/latest/intro.html)

Blog posts

• https://engineering.linkedin.com/blog/2019/03/pinot-joins-apache-incubator

• https://engineering.linkedin.com/blog/2019/06/star-tree-index--powering-fast-aggregations-on-pinot

• https://engineering.linkedin.com/blog/2019/auto-tuning-pinot

• Pinot at Uber (https://eng.uber.com/restaurant-manager/)

Upcoming events (5)

Advanced Indexing: Json and Text

Online event

----------------------------------------
TALK 1: Plug and Play with Apache Pinot Json Index
----------------------------------------
When ingesting data from an event stream (such as Kafka), the source events are often nested or unstructured records. To ingest these records into a structured data store for further analysis, the fields first have to be flattened and extracted. This is usually done by setting up another stream processing job (e.g. Flink) that pre-processes the stream and produces a new stream of structured records. Users then have to maintain a separate job and system, which is heavy overhead and a poor experience, especially when they want to try out a use case quickly.
With the Apache Pinot JSON index, these unstructured records can be consumed directly and stored as JSON strings. Pinot automatically flattens the records and builds an index on top of them to accelerate value lookups. Users get a plug-and-play experience with strong performance and no longer need to maintain another system.
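
To make the idea in this abstract concrete, here is a minimal sketch of flattening JSON records and indexing them for value lookup. It is an illustration only and not Pinot's implementation: the function names (flatten, build_index, lookup), the path syntax, and the sample events are all assumptions made up for this example.

```python
import json
from collections import defaultdict


def flatten(value, prefix="$"):
    """Yield (path, scalar) pairs for every leaf of a parsed JSON value."""
    if isinstance(value, dict):
        for key, child in value.items():
            yield from flatten(child, f"{prefix}.{key}")
    elif isinstance(value, list):
        # All array elements share one path, so a lookup matches any element.
        for child in value:
            yield from flatten(child, f"{prefix}[*]")
    else:
        yield prefix, value


def build_index(json_strings):
    """Map each (path, value) pair to the set of record ids that contain it."""
    index = defaultdict(set)
    for doc_id, raw in enumerate(json_strings):
        for path, leaf in flatten(json.loads(raw)):
            index[(path, leaf)].add(doc_id)
    return index


def lookup(index, path, value):
    """Return the sorted record ids whose JSON has `value` at `path`."""
    return sorted(index.get((path, value), ()))


# Hypothetical events, stored as raw JSON strings the way a stream might deliver them.
events = [
    '{"user": {"id": 1, "tags": ["a", "b"]}, "type": "click"}',
    '{"user": {"id": 2, "tags": ["b"]}, "type": "view"}',
    '{"user": {"id": 3, "tags": []}, "type": "click"}',
]
index = build_index(events)
print(lookup(index, "$.type", "click"))      # [0, 2]
print(lookup(index, "$.user.tags[*]", "b"))  # [0, 1]
```

A lookup becomes a single dictionary access instead of parsing every stored JSON string at query time, which is the kind of speedup the abstract refers to.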

Presented by:
Jackie Jiang
Founding Engineer at StarTree, PPMC and Committer for Apache Pinot

Jackie received his bachelor's degree from Tsinghua University and his master's degree from Carnegie Mellon University. He then worked at LinkedIn for 4 years, where he became a PPMC member and one of the top contributors to Apache Pinot. Jackie's goal is to make Apache Pinot the fastest online analytics platform on the market.

----------------------------------------
TALK 2: Text Indexing in Pinot
----------------------------------------
Pinot supports very fast query processing through its indexes on non-BLOB-like columns. Queries with exact-match filters on terms run efficiently through a combination of highly optimized native storage structures such as dictionary encoding, the inverted index and the sorted index. What if the user wants to do arbitrary text search instead of exact matches? Pinot has supported this through the built-in function REGEXP_LIKE. Unlike exact matches, however, indexes can’t be used to evaluate the regex filter, so Pinot falls back to a full table scan, which is inefficient.
For arbitrary text data that falls into BLOB/CLOB territory, exact matches are not enough: users want to run regex, phrase and fuzzy queries on BLOB-like textual data. To handle such queries efficiently, Pinot added support for text indexes on STRING columns, where each column value can be a blob of heterogeneous text. In this talk, we will cover the design and implementation of text index support, the challenges encountered, future work, and performance numbers, along with insight into how we are using it at huge scale within LinkedIn.
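
The contrast drawn in this abstract, a regex filter that must scan every row versus a text index that resolves the query from precomputed postings, can be sketched as follows. This is a simplified illustration under stated assumptions and not Pinot's implementation: the tokenizer, the positional index layout, the function names and the sample rows are all hypothetical.

```python
import re
from collections import defaultdict

TOKEN = re.compile(r"[a-z0-9]+")


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return TOKEN.findall(text.lower())


def build_text_index(docs):
    """Map each term to {doc_id: [positions]} so phrases can be checked."""
    index = defaultdict(dict)
    for doc_id, text in enumerate(docs):
        for pos, term in enumerate(tokenize(text)):
            index[term].setdefault(doc_id, []).append(pos)
    return index


def phrase_query(index, phrase):
    """Return ids of docs that contain the terms of `phrase` consecutively."""
    terms = tokenize(phrase)
    if not terms:
        return []
    postings = [index.get(term, {}) for term in terms]
    # Only docs containing every term can match the phrase.
    candidates = set(postings[0])
    for posting in postings[1:]:
        candidates &= set(posting)
    hits = []
    for doc_id in candidates:
        for start in postings[0][doc_id]:
            if all(start + i in postings[i][doc_id] for i in range(1, len(terms))):
                hits.append(doc_id)
                break
    return sorted(hits)


def regex_scan(docs, pattern):
    """Baseline without an index: evaluate the regex against every row."""
    rx = re.compile(pattern, re.IGNORECASE)
    return [doc_id for doc_id, text in enumerate(docs) if rx.search(text)]


# Hypothetical column values, each a blob of free text.
docs = [
    "Java developer with distributed systems experience",
    "Distributed databases and stream processing",
    "Systems engineer, distributed team lead",
]
index = build_text_index(docs)
print(phrase_query(index, "distributed systems"))  # [0]
print(regex_scan(docs, r"distributed\s+systems"))  # [0]
```

Both calls return the same rows, but the scan touches every value while the phrase query only reads the postings for two terms. That difference is what grows with table size.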

Presented by:
Siddharth Teotia
Senior Software Engineer @ LinkedIn, PPMC Apache Pinot, PMC Apache Arrow

Siddharth works at LinkedIn on the Pinot team, part of the Systems and Infrastructure group. Prior to LinkedIn, he worked at Oracle for 3.5 years in the Database kernel group on storage, indexing and in-memory columnar query processing. Prior to Oracle, Siddharth worked at Dremio for 2 years as one of the early engineers building out the distributed data lake query engine. He is also a PMC member for Apache Arrow and has previously given talks at multiple conferences and meetups.

----------------------
Watch Live (video URL):
----------------------

https://www.youtube.com/watch?v=TQoXSoKHLp8

Pinot vs Elasticsearch, a Tale of Two PoCs

Online event

----------------------------------------
TALK 1: Pinot vs Elasticsearch, a Tale of Two PoCs
----------------------------------------
In this talk, Ken will describe how they initially tried to use Elasticsearch to provide ad hoc analytics on a large dataset for one of their clients, why that failed, and how they were ultimately able to solve the problem using a combination of Flink and Pinot.

Presented by:
Ken Krugler
President at Scale Unlimited and ASF member/committer for Apache Tika

Ken and his consulting company help clients solve big data problems using Hadoop, Cassandra, Flink, Solr, Elasticsearch and Pinot. He previously founded Krugle, a vertical search engine for code and technical information, and served as its CTO. Prior to that he worked for Steve Jobs on the original Macintosh, and then pioneered MacOS support for Japanese, Chinese, Korean, Thai, Tibetan and other languages.

----------------------------------------
TALK 2: Forward index reader performance improvement
----------------------------------------

Presented by:
Jackie Jiang
Founding Engineer at Stealth Startup, PPMC and Committer for Apache Pinot

Jackie received his bachelor's degree from Tsinghua University and his master's degree from Carnegie Mellon University. He then worked at LinkedIn for 4 years, where he became a PPMC member and one of the top contributors to Apache Pinot. Jackie's goal is to make Apache Pinot the fastest online analytics platform on the market.

----------------------
Watch Live (video URL):
----------------------

TBD

Apache Pinot Concepts: Data Security

Online event

----------------------------------------
Description
----------------------------------------
Serving awesome real-time analytics to your end users is great, but you also want to keep their data safe. Pinot comes with a number of essential security features out of the box, which we'll highlight in this talk.

Presented by:
Alex Pucher
Senior Software Engineer at Stealth Startup

----------------------
Watch Live (video URL):
----------------------

TBD

Trino Connector for Apache Pinot

Online event

----------------------------------------
TALK 1: Pinot Trino Connector
----------------------------------------
Usage and features of the Pinot Trino Connector

Presented by:
Elon
Elon is a Software Engineer at a stealth startup. Before that, he was a software engineer at Facebook on the Presto team. Prior to this, he was a DBA/Software Engineer in New York. Elon is originally from New Jersey, later moved to New York, and finally to California. In his free time, he enjoys bike riding, going to the beach, and playing classical and surf guitar.

----------------------------------------
TALK 2: gRPC streaming server for large scale selection queries
----------------------------------------

Presented by:
Jackie Jiang
Founding Engineer at Stealth Startup, PPMC and Committer for Apache Pinot

Jackie received his bachelor's degree from Tsinghua University and his master's degree from Carnegie Mellon University. He then worked at LinkedIn for 4 years, where he became a PPMC member and one of the top contributors to Apache Pinot. Jackie's goal is to make Apache Pinot the fastest online analytics platform on the market.

----------------------
Watch Live (video URL):
----------------------

TBD

Past events (13)

Intro to Apache Pinot

Online event
