Presto Virtual TechTalks (June) - speakers from LinkedIn & Facebook

Presto Meetup

Online event


Details

Greetings Presto community members.

Hope you all had a relaxing Memorial Day weekend! We are back with our next virtual meetup, featuring two exciting talks from engineers at LinkedIn and Facebook.

---
The Zoom link will be visible once you RSVP to this event. Please use the password [masked] when you sign in to the call.
---

Agenda:
-----------
11:00am - 11:05am - Welcome to the meetup

11:05am - 11:30am - Extending Presto at LinkedIn with a Smart Catalog Layer (LinkedIn)

11:30am - 11:55am - Common Sub Expression Optimization (Facebook)

11:55am - 12:00pm - Wrap up

Details:
----------

Talk #1 Extending Presto at LinkedIn with a Smart Catalog Layer
Walaa Eldin Moustafa, Staff Software Engineer at LinkedIn

In this talk, Walaa describes how LinkedIn extended its Presto Hive Catalog with a smart logical abstraction layer that can reason about logical views with UDFs by using two core components, Coral and Transport UDFs. Coral is a view virtualization library, powered by Apache Calcite, that represents views using their logical query plans. Walaa shows how LinkedIn leverages Coral abstractions to decouple the view expression language from the execution engine, and hence to execute non-Presto-SQL views inside Presto and achieve on-the-fly query rewrite for data governance and query optimization. He also describes Transport UDFs, a framework for defining user-defined functions once and automatically translating them to native UDF versions for multiple engines such as Presto, Spark, and Hive, or for data formats such as Avro. Both Coral and Transport UDFs are open-source projects. Learn more about them at https://github.com/linkedin/coral and https://github.com/linkedin/transport.
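
As a rough illustration of decoupling a view definition from the engine that runs it, the toy sketch below keeps a view as a small logical plan and renders it into engine-specific SQL text. This is not Coral's API: the `Scan` and `Project` classes, the `to_sql` function, and the `members` table are hypothetical names made up for this example. The only dialect difference modeled is identifier quoting (backticks in Hive, double quotes in Presto).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scan:
    """Hypothetical plan node: read every row of a table."""
    table: str


@dataclass(frozen=True)
class Project:
    """Hypothetical plan node: keep only the named columns of its source."""
    source: Scan
    columns: tuple


# Identifier quote character per dialect (the only difference modeled here).
QUOTES = {"hive": "`", "presto": '"'}


def to_sql(plan, dialect):
    """Render the same logical plan as SQL text for the given dialect."""
    q = QUOTES[dialect]
    cols = ", ".join(f"{q}{c}{q}" for c in plan.columns)
    return f"SELECT {cols} FROM {q}{plan.source.table}{q}"


# One engine-neutral view definition, two engine-specific renderings.
view = Project(Scan("members"), ("id", "country"))
hive_sql = to_sql(view, "hive")
presto_sql = to_sql(view, "presto")
print(hive_sql)
print(presto_sql)
```

Because the view is stored as a plan rather than as SQL text, a rewrite step (for example one that drops a column for governance reasons) could be applied once to the plan before rendering, independent of the target engine.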

Talk #2 Common Sub Expression Optimization
Rongrong Zhong, Software Engineer at Facebook

In complex analytics queries, we often see repeated expressions, for example parsing the same JSON column but extracting different fields, or elaborate CASE statements whose branches share some predicates and differ in others. Previously, Presto would compute the same expression as many times as it appeared in the query. With common sub expression optimization, we evaluate the same expression only once within the same project operator or filter operator. In our workload, we've seen 3x improvements on certain queries with expensive common sub expressions like JSON_PARSE. Microbenchmarks also show a consistent ~10% performance improvement for simple common sub expressions like x + y. In this talk, we will explain how this is implemented.
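
To make the idea concrete, here is a minimal sketch of evaluating shared sub expressions once per row. It is not Presto's implementation, which the talk covers; the tuple-based expression format and the `evaluate` and `project` helpers are assumptions made up for this illustration. Two projections extract different fields from the same parsed JSON column, and a per-row cache lets the parse run once instead of twice.

```python
import json

# Expressions are nested tuples: ("col", name), ("json_parse", expr),
# ("get", expr, key). Tuples are hashable, so structurally identical
# sub expressions compare equal and can share one cache slot.


def evaluate(expr, row, cache, stats):
    """Evaluate expr against row, reusing any result already in cache."""
    if expr in cache:
        return cache[expr]
    op = expr[0]
    if op == "col":
        value = row[expr[1]]
    elif op == "json_parse":
        stats["json_parse_calls"] += 1  # count the expensive operation
        value = json.loads(evaluate(expr[1], row, cache, stats))
    elif op == "get":
        value = evaluate(expr[1], row, cache, stats)[expr[2]]
    else:
        raise ValueError(f"unknown operator: {op}")
    cache[expr] = value
    return value


def project(projections, row, share_common=True):
    """Evaluate all projections for one row; return (values, parse count)."""
    stats = {"json_parse_calls": 0}
    shared = {}
    values = []
    for expr in projections:
        # Without sharing, every projection starts from an empty cache.
        cache = shared if share_common else {}
        values.append(evaluate(expr, row, cache, stats))
    return values, stats["json_parse_calls"]


parsed = ("json_parse", ("col", "payload"))
projections = [("get", parsed, "user"), ("get", parsed, "country")]
row = {"payload": '{"user": "alice", "country": "US"}'}

values_shared, calls_shared = project(projections, row, share_common=True)
values_naive, calls_naive = project(projections, row, share_common=False)
print(values_shared, calls_shared, calls_naive)
```

The cache lives only for one row within one `project` call, which mirrors the scoping described above: sharing happens within a single project or filter operator, not across operators.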

Leave a message for this meetup if you have any specific questions for me.

Thanks
Amit Chopra
on behalf of the Presto Foundation Outreach Team
https://prestodb.io/join.html