Unleashing Real-Time Insights with Apache Doris, Flink, Hudi, and Presto

Hosted By
Yingjun W.

Details
We are excited to have Shirley Hu from Apache Doris and Sagar Sumit from Onehouse to discuss how to build your real-time applications!
AGENDA
- Talk title: Exploring New Frontiers: How Apache Flink, Apache Hudi and Presto Power New Insights and Gold Nuggets at Scale
- Speaker name: Sagar Sumit
- Speaker bio: Sagar Sumit is a Database Engineer at Onehouse and an Apache Hudi committer. He works on Hudi's transactional and execution engine. He is also a contributor to the Presto and Trino projects. He has previously worked on the team that built Amazon Aurora, a relational database built for the cloud that now powers mission-critical applications for AWS customers. He started his career with Oracle GoldenGate, replicating committed transactions across heterogeneous database systems.
- Talk abstract: Exploring and efficiently processing streaming data at scale can be very challenging. The shape of streaming data commonly changes as patterns and trends evolve. Streaming communities such as Flink's have a strong desire to process data incrementally and discover new insights and patterns on the fly at scale. While OLTP systems are good for update-heavy use cases and can handle high-volume transactional data, they are not optimized for read-heavy workloads. For instance, they may not support the more complex analytical functions required for data exploration, because they are not optimized for scanning the large amounts of data that ad hoc queries require. Ingesting data into lakehouses can help address data exploration needs, but it presents challenges for stream processing because of minimal support for data mutability and fast updates.
To further enhance the efficiency of upsert operations, Hudi has introduced a new record-level index, improving upsert speeds by orders of magnitude and accelerating computationally demanding MERGE operations. Building upon this foundation, dbt offers a unified framework that transforms this raw data into refined, trustworthy models. This streamlined data then becomes fertile ground for Presto, which equips users with robust ANSI SQL capabilities. By combining these three technologies, engineers can run their analytics at much higher speeds. In this talk, attendees will learn:
- What is Hudi
- How the record index accelerates MERGE operations
- How you can use dbt to transform raw data
- How Presto supports interactive analytics to power fast queries
- Talk title: Introduction to Apache Doris, A Next-Generation Real-Time Data Warehouse
- Speaker name: Shirley Hu
- Speaker bio: DevRel at Apache Doris
- Talk abstract: This talk focuses on the technical side of Apache Doris. The speaker will introduce the technologies behind the fast performance of this real-time OLAP database, including its data ingestion and update mechanisms, indexes, elastic scaling, multi-tenancy management, semi-structured data analysis, and data lakehouse capabilities.
ABOUT THE MEETUP GROUP
Serverless Tribe, formerly known as Serverless Singapore, is a non-commercial, community-centric meetup group that delves into subjects like serverless, cloud, data infrastructure, and many more exciting areas.
Yingjun Wu of RisingWave (https://www.risingwave.com/) is the current host. RisingWave is an open-source distributed SQL streaming database. Its primary objective is to reduce the cost and complexity of building real-time stream processing applications.

Online event