Experience building an index in a Data Lakehouse, with Paola Pardo


Details
Happy new year, Sparklers!
We hope you all had an excellent start to 2022 and are ready for new challenges. Two months have gone by since our last meeting, where we promised to be more active... So here's the first step forward!
This February we are inviting you to our first in-person event in almost two years! This time, our co-organizer Paola Pardo will once again share her knowledge, in a talk titled "Experience building an index in a data lakehouse".
See you on Wednesday, 23rd of February, at 19:00 @ Attico Verdaguer!
We want to thank our two sponsors for this event:
- Attico Workspaces, for offering their amazing venue in support of an innovative, entrepreneurial and creative community!
- Qbeast will delight us with drinks and pizzas!
Don't miss it!
Abstract:
The Big Data ecosystem is moving towards a Data Lakehouse architecture, which combines the best of Data Lakes and Data Warehouses by adding a much-needed metadata management layer on top of storage. At Qbeast, we built an extension that brings functionality such as multi-column indexing and efficient sampling to your data lakehouse.
In this talk, we will take a deep dive into the internals of Qbeast-spark, the open-source implementation based on Apache Spark and Delta Lake. We will explain how the Qbeast Format organizes the data and answers a query using only the metadata. And, of course, we will talk about the optimization problems we have faced during development!
Bio:
Paola Pardo is one of the co-founders of Qbeast, a spin-off of the Barcelona Supercomputing Center (BSC) that uses a patented indexing technology to store and query big data more efficiently. She developed big data software at the BSC before joining the Qbeast team, and graduated from the UPC with a thesis on data storage push-down optimization for Apache Spark. She is currently developing Qbeast-spark and advocating for open-source technologies that help the growth of data analytics, data science, and data engineering.
