Past Meetup

Deep Dive into Spark SQL

70 people went


Hi all, we're happy to announce another Spark meetup. This time we'll do a deep dive into Spark SQL.
Databricks is sponsoring this event and is providing the food and location.
18:00: Arrive, mingle, food (pizza), drinks etc.
18:45: Correctness and Performance of Apache Spark SQL by Nico Poggi and Bogdan Ghitt
In this talk, we present a comprehensive framework we developed at Databricks for assessing the correctness, stability, and performance of our Spark SQL engine. Apache Spark is one of the most actively developed open source projects, with more than 1200 contributors from all over the world. At this scale and pace of development, mistakes are bound to happen. We will discuss the various approaches we take, including random query generation, random data generation, random fault injection, and longevity stress tests. We will demonstrate the effectiveness of the framework by highlighting several correctness issues we have found through random query generation, as well as critical performance regressions we were able to diagnose within hours thanks to our automated benchmarking tools.
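To give a rough idea of what random query generation for correctness testing can look like, here is a minimal sketch. It is not the Databricks framework: it assumes an in-memory SQLite database as a stand-in for the engine under test and a small pure-Python evaluator as the reference. The table name `t`, the columns `a` and `b`, and all function names are made up for illustration.

```python
import random
import sqlite3

# Hypothetical schema: one table "t" with two integer columns.
COLUMNS = ["a", "b"]
OPS = {
    "<": lambda x, y: x < y,
    ">": lambda x, y: x > y,
    "=": lambda x, y: x == y,
}


def random_rows(rng, n):
    """Random data generation: n rows of small integers."""
    return [(rng.randint(-5, 5), rng.randint(-5, 5)) for _ in range(n)]


def random_predicate(rng):
    """Random query generation: pick a column, an operator and a constant."""
    return rng.choice(COLUMNS), rng.choice(sorted(OPS)), rng.randint(-5, 5)


def run_on_engine(conn, pred):
    """Run the generated query on the engine under test (SQLite here)."""
    col, op, const = pred
    sql = f"SELECT a, b FROM t WHERE {col} {op} {const}"
    return sorted(conn.execute(sql).fetchall())


def run_on_reference(rows, pred):
    """Evaluate the same predicate with plain Python as the reference."""
    col, op, const = pred
    idx = COLUMNS.index(col)
    return sorted(r for r in rows if OPS[op](r[idx], const))


def find_mismatches(seed=0, n_queries=200):
    """Return every generated predicate on which the two results differ."""
    rng = random.Random(seed)
    rows = random_rows(rng, 50)
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (a INTEGER, b INTEGER)")
    conn.executemany("INSERT INTO t VALUES (?, ?)", rows)
    mismatches = []
    for _ in range(n_queries):
        pred = random_predicate(rng)
        if run_on_engine(conn, pred) != run_on_reference(rows, pred):
            mismatches.append(pred)
    conn.close()
    return mismatches


if __name__ == "__main__":
    print("mismatching predicates:", find_mismatches())
```

Each predicate returned by `find_mismatches` would be a candidate correctness bug to investigate. A real setup would presumably generate far richer queries (joins, aggregates, nested expressions) and compare engine versions or configurations rather than a hand-written reference.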
19:45: An Introduction to Higher Order Functions in Spark SQL by Herman van Hovell
Nested data types offer Apache Spark users powerful ways to manipulate structured data. In particular, they allow you to put complex objects like arrays, maps and structures inside of columns. This can help you model your data in a more natural way. While this feature is certainly useful, it can be quite cumbersome to manipulate data inside complex objects because SQL (and Spark) does not have primitives for working with such data. In addition, doing so is time-consuming, non-performant, and non-trivial. During this talk we will discuss some of the commonly used techniques for working with complex objects, and we will introduce new ones based on higher-order functions. Higher-order functions will be part of Spark 2.4 and are a simple and performant extension to SQL that allows a user to manipulate complex data such as arrays.
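As a rough illustration of the difference, the sketch below models both styles in plain Python. It is not Spark code, and the rows and the column names `key` and `nums` are invented for the example. The first function mimics the explode, modify, regroup pattern; the second mimics a higher-order function, which in Spark 2.4 SQL would look something like `SELECT key, transform(nums, x -> x + 1) AS nums FROM nested_data` (the table name is hypothetical).

```python
# Made-up rows with an array column "nums".
rows = [
    {"key": 1, "nums": [1, 2, 3]},
    {"key": 2, "nums": [40, 50]},
    {"key": 3, "nums": []},
]


def explode_and_collect(rows):
    """Model of the older technique: flatten every array into one record
    per element, modify the elements, then group them back by key."""
    exploded = [
        (row["key"], pos, value)
        for row in rows
        for pos, value in enumerate(row["nums"])
    ]
    updated = [(key, pos, value + 1) for key, pos, value in exploded]
    regrouped = {}
    for key, pos, value in sorted(updated):  # sort to restore element order
        regrouped.setdefault(key, []).append(value)
    # In this model, rows with empty arrays produce no exploded records,
    # so they have to be restored explicitly.
    return [
        {"key": row["key"], "nums": regrouped.get(row["key"], [])}
        for row in rows
    ]


def transform(array, fn):
    """Model of a higher-order function: apply fn to each array element."""
    return [fn(x) for x in array]


def with_higher_order_function(rows):
    """Modify each array in place, row by row, with no flatten or regroup."""
    return [
        {"key": row["key"], "nums": transform(row["nums"], lambda x: x + 1)}
        for row in rows
    ]


if __name__ == "__main__":
    print(explode_and_collect(rows))
    print(with_higher_order_function(rows))
```

Both functions return the same rows, but the second one never leaves the row: there is no flattening, no ordering to restore, and no special handling of empty arrays.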
21:00: End of the meetup/everybody out
Hope to see you there, Niels