Presto, an open source distributed SQL engine, is being widely adopted for its high concurrency and native ability to query multiple data sources. But as adoption of cloud object stores like S3 grows and the demands of data-powered applications increase, engineers are looking for further acceleration and a higher-performance architecture.
Kamil from Starburst, and Andrew and Bin of Alluxio, will present how best to leverage Alluxio and Presto for fast SQL in the cloud. You will also learn about real-world use cases at JD.com and NetEase.com.
6:00pm: Happy Hour and networking
6:30pm: Part I - Presto: Fast SQL-on-Anything by Starburst (Kamil Bajda-Pawlikowski)
6:50pm: Part II - Alluxio Overview (Andrew Audibert)
7:10pm: Part III - Presto + Alluxio + Object Store: Architecture and Use Case (Bin Fan)
Part I: Presto: Fast SQL-on-Anything
Presto, an open source distributed SQL engine, is widely recognized for its low-latency queries, high concurrency, and native ability to query multiple data sources. Proven at scale in a variety of use cases at Airbnb, Comcast, Facebook, FINRA, LinkedIn, Lyft, Netflix, Twitter, and Uber, Presto has seen unprecedented growth in popularity over the last few years, in both on-premises and cloud deployments over object stores, HDFS, NoSQL, and RDBMS data stores. We will cover the architecture of Presto, its separation of compute and storage, and its cloud-readiness. In addition, we will discuss some of the best use cases for Presto, recent advancements in the project such as the Cost-Based Optimizer and geospatial functions, and the roadmap going forward.
Part II: Alluxio Overview
Alluxio is an open-source distributed file system that gives data ecosystems a unified data access layer at in-memory speed. Alluxio enables compute engines such as Spark, Presto, MapReduce, and TensorFlow to transparently access different persistent storage systems (including HDFS and S3) while actively using an in-memory cache to accelerate data access. As a result, Alluxio simplifies the development and management of big data and ML workloads, with lower cost and better performance. Alluxio has more than 900 contributors and is used by over 100 companies worldwide. Andrew will give an overview of Alluxio’s core concepts, architecture, data flow, and production use cases.
Part III: Presto + Alluxio + Object Store: Architecture and Use Case
Cloud object storage systems have different semantics and performance characteristics than HDFS. Applications like Presto cannot benefit from node-level locality or cross-job caching when reading from the cloud. Deploying Alluxio with Presto to access cloud storage addresses these problems because data is retrieved once and cached in Alluxio, instead of being fetched repeatedly from the underlying cloud or object storage. Bin will present the architecture for combining Presto with Alluxio, along with use cases from major internet companies like JD.com and NetEase.com and the lessons they learned operating this architecture at scale.
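To illustrate the caching idea in general terms, here is a minimal Python sketch of a read-through cache sitting between a query engine and an object store. It is not Alluxio's or Presto's API: the class `ReadThroughCache`, the function `slow_object_store`, and the example path are all hypothetical names chosen for illustration, and the sketch assumes a single process with an unbounded in-memory cache.

```python
from typing import Callable, Dict


class ReadThroughCache:
    """Toy stand-in for a caching layer in front of an object store (hypothetical)."""

    def __init__(self, fetch_from_store: Callable[[str], bytes]) -> None:
        self._fetch = fetch_from_store
        self._cache: Dict[str, bytes] = {}
        self.store_reads = 0  # how many times the underlying store was hit

    def read(self, path: str) -> bytes:
        # Only go to the underlying store on a cache miss.
        if path not in self._cache:
            self.store_reads += 1
            self._cache[path] = self._fetch(path)
        return self._cache[path]


def slow_object_store(path: str) -> bytes:
    # Placeholder for a remote object store read (assumption, no network I/O here).
    return f"contents of {path}".encode()


cache = ReadThroughCache(slow_object_store)

# Three "queries" read the same object; only the first reaches the store.
for _ in range(3):
    data = cache.read("s3://bucket/orders/part-0.parquet")

print(cache.store_reads)
```

A real deployment also has to handle eviction, consistency with the underlying store, and distribution across nodes, none of which this sketch attempts.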
Kamil Bajda-Pawlikowski is a technology leader in the large-scale data warehousing and analytics space. He is CTO of Starburst, the enterprise Presto company. Prior to co-founding Starburst, Kamil was the Chief Architect at the Teradata Center for Hadoop in Boston, focusing on the open source SQL engine Presto. Previously, he was the co-founder and chief software architect of Hadapt, the first SQL-on-Hadoop company, acquired by Teradata in 2014.
Andrew Audibert is an early member of Alluxio, and a top contributor to the Alluxio project. He has been a core maintainer since early 2016. Prior to Alluxio, he worked for Palantir Technologies. Andrew has a B.S. from CMU.
Bin Fan is a founding member of Alluxio, Inc. and a PMC member of the Alluxio open source project. Prior to Alluxio, he worked for Google. Bin received his Ph.D. in Computer Science from CMU, where he worked on distributed systems.