Presto Meetup

4.7 (118 ratings)

San Francisco, CA, US

1,650 members · Public group

Organized by Presto Foundation

What we’re about

Presto is a distributed SQL engine for processing very large data sets stored across potentially different storage systems. Our goal is to come together as a community to connect with each other, share experiences and learnings, and discuss the challenges we face across our different use cases.

The Presto project is governed by the Presto Foundation, under the Linux Foundation.
Slack: https://slack.prestodb.io/
YouTube: http://youtube.prestodb.io/
GitHub: https://github.com/prestodb/presto

To keep our events productive, safe, and fun, we follow our open source code of conduct.

Upcoming events (2)

Thu, 31 July 2025, 2:00 pm PDT
Hands-on workshop: Building an Open Lakehouse with Apache Hudi™ & Presto
IBM Silicon Valley Lab, San Jose, CA
Attendees must register here: https://prestodb.io/events/july-2025-hudi-presto-workshop/

We’re hosting an in-person workshop in partnership with IBM and Onehouse! Please join us for a few hours of a hands-on workshop and get to know the Presto and Apache Hudi community.

The lakehouse architecture brings together the scalability and flexibility of data lakes with the transactional capabilities traditionally found in data warehouses. This workshop is designed to equip data engineers and architects with the skills to build an open lakehouse architecture using Apache Hudi on AWS S3, with Presto as the engine for fast, interactive querying.

Course outline:

Open Lakehouse architecture stack with Hudi as the lakehouse platform and Presto as the compute engine.

Practical exercises on:

Creating different Hudi table types (Copy-On-Write and Merge-On-Read)

Ingesting data

Syncing with catalogs such as Hive Metastore

Various ways of querying data using Presto, including snapshot and read-optimized queries

The session will also touch on upcoming features in the Hudi-Presto connector, including support for Hudi 1.0 and new indexing capabilities.
Thu, 31 July 2025, 5:00 pm PDT
In-person Meetup: Open Source Analytics Festival
IBM Silicon Valley Lab, San Jose, CA
Attendees must register here: https://lu.ma/wnsi90or

Join us for an evening devoted to four of the most popular open source analytics technologies on the planet: ClickHouse®, Presto, Gluten, and StarRocks. We'll have presentations from community experts, followed by a panel where the audience can put questions to the presenters. There will be refreshments, drinks, and lots of time for networking with other database developers. Join us!

## Presenters:

Robert Hodges (Altinity; speaking on ClickHouse)

Ron Kapoor (StarRocks / CelerData)

Aditi Pandit (Presto / IBM)

Binwei Yang (Apache Gluten / IBM)

Panel Discussion: Ali LeClerc (IBM) will host the panel and interview the presenters.

## Description of the talks:

Building Cheap, Fast, Scalable Analytics with Open Source ClickHouse® by Robert Hodges, CEO @ Altinity

ClickHouse is the go-to database for processing event streams and delivering user-facing SaaS analytics. This quick overview shows the major features that make ClickHouse so popular for real-time analytics and provides a jumping off point to build your own apps. We'll provide explicit guidance on when to reach for ClickHouse and how to get started. We will also demo new work from Altinity that adds separable compute and storage using shared Apache Iceberg tables. Users can take advantage of cheap object storage to process massive datasets without breaking the bank.

Real-Time Customer-Facing Analytics: From Pain to Production by Ron Kapoor, Developer Advocate @ CelerData

Real-time customer-facing analytics drives growth and engagement—but only if it's fast, fresh, and reliable. In reality, many teams still struggle with:

Queries that take seconds or even minutes when users expect instant results

Latency spikes during peak traffic that break SLAs

Expensive, fragile pre-computation pipelines

Data freshness gaps that confuse users and undermine trust

This talk explores the architectural patterns and open-source technologies that leading teams use to meet the demands of customer-facing workloads: sub-second latency, high concurrency, and real-time updates, without breaking the bank. We’ll share lessons from companies like Pinterest and Demandbase on how they tackled these challenges, what worked, and what didn’t. Finally, we’ll look ahead at how open table formats and emerging AI agents are shaping the future of customer-facing analytics, and how to build for today’s real-time needs while being ready for what’s coming next.

Pushing the Limits of Query Speed: Presto C++ in Action by Aditi Pandit, Software Engineer @ IBM

Presto C++ (aka Prestissimo) is a high-performance rewrite of the Presto engine in C++, designed to power interactive analytics at massive scale. In this session, Aditi Pandit will walk through recent performance optimizations in Presto C++, including memory and compute efficiency. Backed by benchmarks and real-world results, this talk will show how Presto is closing the gap between open-source flexibility and high performance.

Accelerating Spark Workloads with Apache Gluten and Velox by Binwei Yang, Software Engineer @ IBM

Apache Gluten unlocks native query acceleration for Spark by replacing the JVM-based execution engine with a C++ backend powered by Velox. In this session, Binwei Yang will share how Gluten is delivering up to 6x faster performance on key workloads, along with supporting benchmarks. Get a look at recent optimizations in vectorization, I/O, and operator execution, and learn why native engines are reshaping the future of Spark performance.

Location
IBM Silicon Valley Lab
555 Bailey Ave, San Jose, CA 95141, USA