Extending Spark for Qbeast's SQL DataSource, with Paola & Cesare


Details
Hi!
Buckle up for the next meetup!
This time we will hear from the guys (and girls) of Qbeast, a recent spin-off of the Barcelona Supercomputing Center.
Are you interested in Spark's internals and how to take advantage of the latest storage technology? Then see you on the 24th of October at 19:00 at the LifullConnect (Trovit) offices! Don't miss it!
Title:
Extending Spark for Qbeast's SQL DataSource
Abstract:
One of the key strengths of Spark is its flexibility: it integrates with dozens of different storage systems and file formats. However, reading from a CSV file is not the same as reading from a SQL database or from an exotic, stratified-sampled multidimensional database. And finding the right balance between modularity and flexibility is not easy!
In this presentation, we will talk about the evolution of Spark's DataSource API and how it integrates with the SQL optimizer, highlighting how we can make queries much faster with logical and physical plans that integrate better with the storage. Moving from theory to practice, we will then discuss how we extended Spark's internals and built a new source integration that allows the push-down of both sampling and multidimensional filtering.
About the speakers:
Paola Pardo is a Computer Engineer from Barcelona. She graduated in Computer Engineering this last summer at the Technical University of Catalunya with a thesis focused on data storage push-down optimization based on Apache Spark. She is currently working at the Barcelona Supercomputing Center and in its spin-off Qbeast, developing a Qbeast-Spark connector.
Cesare Cugnasco holds a PhD in Computer Architecture and is a researcher at the Barcelona Supercomputing Center. His research focuses on NoSQL databases, distributed computing and high-performance storage. He invented and patented a new database architecture for Big Data, and he is building a spin-off for its commercialization.
