Interactive Analytics in the Cloud with Presto and Alluxio


This Alluxio Meetup features a chance to interact with other Alluxio users and developers, as well as three talks. Thanks to our joint host, the Silicon Valley Cloud Computing group!

6:00pm: Happy Hour and networking
6:30pm: Building Fast SQL Analytics on Anything with Presto, Alluxio
7:10pm: Building Cloud-native Analytical Pipelines on AWS
7:30pm: Into the Cloud: Twitter's Presto Journey to GCP
8:00pm: Q&A & Mingle

Talk 1: Building Fast SQL Analytics on Anything with Presto, Alluxio

This talk describes a stack that combines Presto, Alluxio, and cloud object storage systems (e.g., AWS S3) for high-concurrency, low-latency SQL queries on big data in the cloud. Presto, an open-source distributed SQL engine, is widely recognized for its low-latency queries, high concurrency, and native ability to query multiple data sources. Alluxio is an open-source data orchestration layer that brings data closer to compute and provides a unified data access layer at in-memory speeds. Presto can use Alluxio as a distributed caching tier on top of S3 for hot data, avoiding repeated reads of the same data from the cloud.

This talk will cover:
- the architecture of Presto, its separation of compute and storage, its cloud-readiness, and recent advancements in the project such as the cost-based optimizer and Kubernetes support,
- an overview of Alluxio’s key concepts, architecture, and data flow,
- Presto and Alluxio production use cases running on hundreds of nodes, including ING Bank and NetEase Games.

Kamil Bajda-Pawlikowski, CTO, Starburst
Kamil is a technology leader in the large-scale data warehousing and analytics space. He is CTO of Starburst, the enterprise Presto company. Prior to co-founding Starburst, Kamil was the Chief Architect at the Teradata Center for Hadoop in Boston, focusing on the open source SQL engine Presto. Previously, he was the co-founder and chief software architect of Hadapt, the first SQL-on-Hadoop company, acquired by Teradata in 2014.

Bin Fan, founding engineer and VP of Community, Alluxio
Bin Fan is a founding member of Alluxio, Inc. and a PMC maintainer of the Alluxio open source project. Prior to Alluxio, he worked for Google. Bin received his Ph.D. in CS from CMU.

Talk 2: Building Cloud-native Analytical Pipelines on AWS

With the ease and flexibility that the cloud brings, many data platform teams are building their data pipelines on AWS, leveraging many of the services it provides. For frameworks like Apache Spark and Hive, Amazon EMR, which includes the Hadoop stack, greatly simplifies and speeds up the installation and configuration of clusters. Amazon S3 also provides a cost-effective and easy way to store large amounts of data. However, data engineers still face challenges with workloads that are latency-sensitive, need data sharing across pipelines, or need constant synchronization with S3.

In this talk, Irene will share her experience building data pipelines on AWS and how Alluxio, a data orchestration layer, can greatly ease these challenges while eliminating problems caused by S3 throttling or slowdowns.

Irene Cai is a software engineer at Google, working on the Google Brain team on TensorFlow and TFX Fleetwide metrics. She previously spent four years at Amazon, where she worked on big data pipelines and applications that process hundreds of TBs of data daily.

Talk 3: Into the Cloud: Twitter's Presto Journey to GCP

Hao from Twitter will share Twitter's cloud journey, from performance requirements to authentication and authorization.

Hao Luo is a Sr. Software Engineer at Twitter, focusing on interactive query and real-time computing.