Presto, an Open Source SQL Engine for Big Data

Hosted By
Justin B.

Details

Title: Presto, an Open Source SQL Engine for Big Data

Presenters: Facebook and Teradata

Talks:

Presto at Facebook

Martin Traverso, Dain Sundstrom (Facebook – Menlo Park, CA)

Presto is a distributed SQL engine, originally built for the massive Hadoop warehouse at Facebook. It has grown to become a critical piece of infrastructure used throughout the organization to solve a multitude of problems. In this talk we will outline some of the use cases and how we have applied the Presto technology to address their unique requirements. As part of this work, we have built Raptor, a new columnar data store for Presto. We will discuss its architecture and design.

Hello, Enterprise! Meet Presto.

Christina Wallin (Teradata – Boston, MA)

Teradata has been hard at work on Presto, and we want to share with you what we've done so far and our roadmap going forward. From presto-admin, a tool for installing and administering Presto, to YARN/Ambari support, to fully certified JDBC and ODBC drivers, we are committed to making Presto the best, most enterprise-ready SQL-on-Hadoop solution out there.

Tempto & Benchto

Lukasz Osipiuk, Karol Sobczak (Teradata – Warsaw, Poland)

Tempto is a product test framework that allows developers to write and execute tests for SQL databases running on Hadoop. Individual test requirements such as data generation, HDFS file copy/storage of generated data, and schema creation are expressed declaratively and are automatically fulfilled by the framework. Developers can write tests in Java (using a TestNG-like paradigm and AssertJ-style assertions) or by providing query files with expected results. We will show how we use it for Presto product tests.

Benchto is a benchmark framework that provides an easy and manageable way to define, run, and analyze macro benchmarks in a clustered environment. Understanding the behavior of distributed systems is hard and requires good visibility into the state of the cluster and the internals of the tested system. This project was developed for repeatable benchmarking of Hadoop SQL engines, most importantly Presto.

Designing An Evolving Database Service with Presto

Taro L. Saito (Treasure Data, Inc. – Mountain View, CA)

Treasure Data simplifies event analytics for the complex digital world. Our customers send us 1,000,000 events per second and issue 30,000+ Presto queries every day to understand their customers better. One of the challenges is designing a cloud database with zero downtime to support a global customer base. We have achieved this goal by developing several open-source technologies: Fluentd and Embulk enable seamless log collection from stream and batch sources, and with MessagePack we can provide an extensible columnar store that accommodates future schema changes. Finally, Presto allows us to serve the wide variety of data processing that our customers perform on our service. In this talk, I will present a very brief overview of our system and show how our customers keep using Presto while collecting and extending their data sets.

Presto at Cogo Labs

Brian Kinney (Cogo Labs – Cambridge, MA)

Presto is the most performant distributed query engine we have used, but it does have some limitations and technical hurdles. I'll share some of our experience deploying Presto on EMR for SQL access to data on S3 and troubleshooting the use cases that cause problems.

Boston Hadoop User Group
District Hall
75 Northern Ave · Boston, MA