Presto Virtual TechTalks (August) - Apache Hudi PMC, AWS, Facebook, Alluxio

Presto Meetup

Online event

This event has passed

Details

Hi Presto Community!

We’re excited to announce the next online meetup in our virtual TechTalk series, with talks by engineers from the Apache Hudi PMC, AWS, Facebook, and Alluxio.

The Zoom link will be visible once you RSVP. Please enter the password [masked] when you join the call.

---
Agenda:
11:00am - 11:05am - Welcome & introductions

11:05am - 11:30am - PrestoDB and Hudi (Apache Hudi PMC, AWS)

11:30am - 11:55am - Optimizing Latency-Sensitive Queries for PrestoDB (Facebook, Alluxio)

11:55am - 12:00pm - Closing remarks
---

Details:

Talk 1: PrestoDB and Hudi

Speakers:
Bhavani Sudha Saktheeswaran, Software Engineer at Moveworks, Apache Hudi PMC, Ex-Uber
Brandon Scheller, Software Engineer at Amazon Web Services

Apache Hudi is a fast-growing data lake storage system that helps organizations build and manage petabyte-scale data lakes. Hudi brings stream-style processing to batch-oriented big data through primitives such as upserts, deletes, and incremental pulls. These features help surface faster, fresher data on a unified serving layer. Hudi runs on the Hadoop Distributed File System (HDFS) or cloud stores and integrates well with popular query engines such as Presto, Apache Hive, Apache Spark, and Apache Impala.

In this talk, we will introduce Hudi, discuss its table and query types, and show how Hudi integrates with Presto to support these queries. We will also share how this integration has evolved over time and discuss upcoming file listing and query planning improvements for Hudi queries in Presto.

Talk 2: Optimizing Latency-Sensitive Queries for PrestoDB using Alluxio

Speakers:
Rohit Jain, Software Engineer at Facebook
Bin Fan, Founding Engineer and VP of Open Source at Alluxio

For many latency-sensitive SQL workloads, Presto is often bottlenecked on retrieving remote data. In this talk, Rohit Jain from Facebook and Bin Fan from Alluxio will introduce their teams’ collaboration on adding a local, SSD-backed Alluxio cache inside Presto workers to improve Presto latency where it falls short.

This talk will focus on:

Insights into Presto workloads at Facebook with respect to cache effectiveness.
The API and internals of the Alluxio local cache, from design trade-offs (e.g. caching granularity and concurrency level) to performance optimizations.
Initial performance analysis and the timeline for delivering this feature to general Presto users.

Leave a message in the meetup group if you have any questions.

Cheers,
Dipti (on behalf of the Outreach Team)

Dipti Borkar
Chair | Outreach Team | Presto Foundation

https://prestodb.io/
Twitter: @prestodb
Slack: prestodb.slack.com
Join the Presto Foundation: https://prestodb.io/join.html