Apache Pinot Features: Derived Column & Pinot managed Realtime to Offline flows

Name: Apache Pinot Features: Derived Column & Pinot managed Realtime to Offline flows
Start: 2021-07-27T09:00:00-07:00
End: 2021-07-27T10:00:00-07:00

Hosted By Real-Time Analytics with Apache Pinot™ by StarTree

Details

----------------------------------------
TALK 1: User-Defined Function (UDF)

User-Defined Functions (UDFs) make Pinot very flexible, but at a performance cost: each UDF is computed row by row at query time and cannot use indexes. Derived Column was introduced to pre-materialize the UDF computation, so that queries can achieve the best performance while retaining that flexibility. Furthermore, a derived column can be configured and generated on top of existing data without downtime.
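
To make the trade-off concrete, here is a minimal Python sketch of the general idea, not Pinot's actual implementation. It contrasts evaluating a function per row at query time with materializing the result once at ingestion and indexing it. The names `to_epoch_days`, `query_time_filter`, `ingest_with_derived_column`, and `indexed_lookup`, as well as the sample rows, are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

MILLIS_PER_DAY = 24 * 60 * 60 * 1000


def to_epoch_days(ts_millis):
    """Hypothetical UDF: epoch milliseconds -> whole days since the epoch."""
    return ts_millis // MILLIS_PER_DAY


# Illustrative rows; the timestamps are epoch milliseconds in UTC.
rows = [
    {"user": "a", "ts_millis": 1627344000000},  # 2021-07-27 00:00
    {"user": "b", "ts_millis": 1627380000000},  # 2021-07-27 10:00
    {"user": "c", "ts_millis": 1627430400000},  # 2021-07-28 00:00
]


def query_time_filter(rows, day):
    """UDF approach: the function runs on every row for every query."""
    return [r["user"] for r in rows if to_epoch_days(r["ts_millis"]) == day]


def ingest_with_derived_column(rows):
    """Derived-column approach: compute once, store the value, and index it."""
    stored = [dict(r, days=to_epoch_days(r["ts_millis"])) for r in rows]
    index = defaultdict(list)  # derived value -> positions of matching rows
    for position, row in enumerate(stored):
        index[row["days"]].append(position)
    return stored, index


def indexed_lookup(stored, index, day):
    """Query the materialized column through the index, with no per-row work."""
    return [stored[position]["user"] for position in index.get(day, [])]


stored_rows, days_index = ingest_with_derived_column(rows)
print(query_time_filter(rows, 18835))
print(indexed_lookup(stored_rows, days_index, 18835))
```

Both calls return the same users; the difference is that the second one does its computation once, when the data is written, rather than on every query.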

Presented by:
Jackie Jiang
Founding Engineer at StarTree
Apache Pinot PMC and Committer

Jackie is a Founding Engineer at StarTree. Before that, he worked on the Pinot team at LinkedIn for four and a half years and became a PMC member and one of the top committers for Apache Pinot. Jackie's goal is to make Apache Pinot the fastest online analytics platform on the market.

----------------------------------------
TALK 2: Pinot managed real-time to offline flows

A Pinot offline server and a Pinot real-time server do quite different work. Offline servers (used by offline tables) simply download externally built segments and serve queries. Ingestion happens via batch jobs, typically at an hourly or daily frequency, which naturally creates segments aligned to time boundaries. Real-time servers (used by real-time tables), on the other hand, have to consume events, keep them in memory, index them, periodically build segments, and serve queries from both the segments and the in-memory data. Additionally, streaming ingestion creates segments that are not aligned to time boundaries at all.
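
The alignment point can be illustrated with a small Python sketch. It is only an illustration under assumed data, not how Pinot builds or moves segments: segments are modeled as lists of events in consumption order, and the hypothetical helper `realign_by_day` regroups them into day-aligned buckets like those a daily batch job would produce.

```python
from collections import defaultdict

MILLIS_PER_DAY = 24 * 60 * 60 * 1000

# Hypothetical streaming segments, in the order events were consumed.
# The second segment spans two days and contains a late-arriving event.
streaming_segments = [
    [
        {"id": 1, "ts_millis": 1627344000000},  # 2021-07-27 00:00 UTC
        {"id": 2, "ts_millis": 1627426800000},  # 2021-07-27 23:00 UTC
    ],
    [
        {"id": 3, "ts_millis": 1627430400000},  # 2021-07-28 00:00 UTC
        {"id": 4, "ts_millis": 1627423200000},  # 2021-07-27 22:00 UTC (late)
    ],
]


def realign_by_day(segments):
    """Regroup events from misaligned segments into one bucket per day."""
    buckets = defaultdict(list)
    for segment in segments:
        for event in segment:
            buckets[event["ts_millis"] // MILLIS_PER_DAY].append(event)
    # Sort the buckets by day, and the events inside each bucket by time.
    return {
        day: sorted(events, key=lambda e: e["ts_millis"])
        for day, events in sorted(buckets.items())
    }


aligned = realign_by_day(streaming_segments)
for day, events in aligned.items():
    print(day, [e["id"] for e in events])
```

Once the data is grouped by time bucket, operations scoped to a time range, such as replacing one day of data, only need to touch the buckets for that range.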

These differences make managing a real-time table relatively complex compared to an offline table, and make certain operations (such as backfill, aggregation, and dedup on a time range) impossible.
So what do you do if you have a long-retention real-time table and these operations are important to you? Typically, you’d want to set up a hybrid table: a real-time table for recent data and an offline table for long-retention data. Does that mean you now HAVE to maintain both the streaming and the batch ingestion jobs?

With Pinot managed real-time to offline flows, you don’t have to. Simply set up your streaming ingestion and Pinot will manage the rest.

Join us for this talk to find out how this works!

----------------------------------------
Presented by:
Neha Pawar
Founding Engineer at StarTree
Apache Pinot PMC and Committer

Prior to her current role as a Founding Engineer at StarTree, Neha worked at LinkedIn as a Senior Software Engineer in the Data Analytics Infrastructure org. Neha is an Apache Pinot PMC member and Committer and has made numerous impactful contributions to the Apache Pinot project. She actively fosters the growing Apache Pinot community and loves to evangelize Apache Pinot through blogs, video tutorials, and talks at meetups and conferences. You can find her on Twitter at @nehapawar18.

Online event
