Presto, an Open Source SQL Engine for Big Data

Details
Title: Presto, an Open Source SQL Engine for Big Data
Presenters: Facebook and Teradata
Talks:
Presto at Facebook
Martin Traverso, Dain Sundstrom (Facebook – Menlo Park, CA)
Presto is a distributed SQL engine, originally built for the massive Hadoop warehouse at Facebook. It has grown to become a critical piece of infrastructure used throughout the organization to solve a multitude of problems. In this talk we will outline some of the use cases and how we have applied the Presto technology to address their unique requirements. As part of this work, we have built Raptor, a new columnar data store for Presto. We will discuss its architecture and design.
Hello, Enterprise! Meet Presto.
Christina Wallin (Teradata – Boston, MA)
Teradata has been hard at work on Presto, and we want to share with you what we've done so far and our roadmap going forward. From presto-admin, a tool for installing and administering Presto, to YARN/Ambari support, to fully certified JDBC and ODBC drivers, we are committed to making Presto the best, most enterprise-ready SQL-on-Hadoop solution out there.
Tempto & Benchto
Lukasz Osipiuk, Karol Sobczak (Teradata – Warsaw, Poland)
Tempto is a product test framework that allows developers to write and execute tests for SQL databases running on Hadoop. Individual test requirements such as data generation, HDFS file copy/storage of generated data, and schema creation are expressed declaratively and are automatically fulfilled by the framework. Developers can write tests in Java (using a TestNG-like paradigm and AssertJ-style assertions) or by providing query files with expected results. We will show how we use it for Presto product tests.
Benchto is a benchmark framework that provides an easy and manageable way to define, run, and analyze macro benchmarks in a clustered environment. Understanding the behavior of distributed systems is hard and requires good visibility into the state of the cluster and the internals of the tested system. This project was developed for repeatable benchmarking of Hadoop SQL engines, most importantly Presto.
Designing An Evolving Database Service with Presto
Taro L. Saito (Treasure Data, Inc. – Mountain View, CA)
Treasure Data simplifies event analytics for the complex digital world. Our customers send us 1,000,000 events per second and issue 30,000+ Presto queries every day to understand their customers better. One of the challenges is designing a cloud database with zero downtime to support a global customer base. We have achieved this goal by developing several open-source technologies: Fluentd and Embulk enable seamless log collection from stream/batch sources, and with MessagePack we can provide an extensible columnar store that accommodates future schema changes. Finally, Presto allows us to serve the wide variety of data processing that our customers perform on our service. In this talk, I will present a very brief overview of our system and show how our customers keep using Presto while collecting and extending their data sets.
Presto at Cogo Labs
Brian Kinney (Cogo Labs – Cambridge, MA)
Presto is the most performant distributed query engine we have used, but it does have some limitations and technical hurdles. I'll share some of our experience deploying Presto on EMR for SQL access to data on S3 and troubleshooting the use cases that cause problems.
