Bring Presto into the Future
Presto is widely used for data science, business analytics, and operations. Presto's SQL is a main driver for this, as it is ANSI-compliant, easy to ramp up on, and rich in functionality. Given the versatility and flexibility of this software, there is also huge demand to develop interfaces for other critical data domains like real-time dashboards, stream processing, and large-scale batch computations. We will explore some interesting systems and prototypes that bring Presto to these new domains.
Code of Conduct: Attendees need to sign an NDA before attending the event.
Please enter at Creekside Building A and check in at the lobby.
The parking structure is to the left of the building.
6:00 pm - 6:30 Doors open and networking
6:30 pm - 6:45 Presto: SQL for Everything - Girish Baliga (Uber)
6:45 pm - 7:15 Aria Presto: Adventures in Query Performance - Orri Erling, Masha Basmanova (Facebook)
7:15 pm - 7:45 Neutrino: Presto for Real-Time - Devesh Agrawal (Databricks), Bhavani Sudha (Uber)
7:45 pm - 8:15 Presto on Kubernetes: Query Anything, Anywhere - Matt Fuller (Starburst)
8:15 pm - 8:30 Ultra-fast SQL Analytics using PAS (Presto on Alluxio Stack) - Haoyuan Li, Bin Fan (Alluxio)
8:30 pm - 9:00 Happy Hour
Aria Presto: Adventures in Query Performance
Orri Erling, Masha Basmanova (Facebook)
We discuss what it takes to bring Presto to the next level in performance and share experiences and insights gained on this journey. We demonstrate a 2.5x speedup on a basic star schema query. The application to Facebook workloads begins with a complete rework of table scan and repartitioning, which together represent about 65% of all Presto CPU at Facebook. Getting up to a 2x win on these operators is only a beginning. The journey proceeds to things like running from memory, async IO, and smarter work placement. We conclude with a vision for taking Presto to a second-to-none position in big query.
Neutrino: Presto for Real-Time
Devesh Agrawal (Databricks), Bhavani Sudha (Uber)
Neutrino is a port of Presto, optimized for low-latency SQL federation over online databases and storage systems like Apache Pinot, AresDB, Elasticsearch, and Apache Cassandra. It targets very low-latency dashboards and microservices handling production traffic. Neutrino is completely stateless and can easily scale as needed.
In this talk, we will go through our motivations, design, adoption, and future roadmap for Neutrino.
Presto on Kubernetes: Query Anything, Anywhere
Matt Fuller (Starburst)
Presto, an open-source distributed SQL engine, is widely recognized for its low-latency queries, high concurrency, and native ability to query multiple data sources. Proven at scale in a variety of use cases at Airbnb, Comcast, GrubHub, Facebook, FINRA, LinkedIn, Lyft, Netflix, Twitter, and Uber, Presto has experienced unprecedented growth in popularity over the last few years, in both on-premises and cloud deployments over object stores, HDFS, NoSQL, and RDBMS data stores.
This talk will explore using Presto across hybrid and multi-cloud environments, allowing easy deployment of Presto on Red Hat OpenShift Container Platform, Google Kubernetes Engine, Azure Kubernetes Service, and Amazon Elastic Container Service for Kubernetes.
Ultra-fast SQL Analytics using PAS (Presto on Alluxio Stack)
Haoyuan Li, Bin Fan (Alluxio)
This talk describes a stack of open-source projects that serves high-concurrency, low-latency SQL queries using Presto with Alluxio on big data in the cloud. By deploying Alluxio as a data orchestration layer to access cloud object storage (e.g., AWS S3), this architecture greatly enhances the data locality of Presto with distributed, cross-query caching, thus avoiding reading the same data repeatedly from cloud storage.
This talk covers Alluxio’s core concepts, architecture, data flow, and use cases from internet companies like Walmart and JD.com that run this stack of Presto and Alluxio at scale in production.