
1. Asurion's "GPS" of their data lake; 2. Intro to Apache Iceberg views

Hosted By
Jason H. and Dremio

Details

This event is part of a hybrid virtual and in-person meetup: it runs at the same time as the virtual session and another in-person watch party in Chicago. The two talks will be presented virtually and broadcast to a screen at the in-person events, with live Q&A from both virtual and in-person attendees.

The in-person attendees will have the opportunity to socialize and grab free food and free drinks before and after the talks.

Meetup event links for the in-person watch parties:

Make sure to register for the virtual event via the Zoom link: [https://us02web.zoom.us/webinar/register/WN_kQsO5iviSBeuo63pGGDang](https://us02web.zoom.us/webinar/register/WN_kQsO5iviSBeuo63pGGDang)

Agenda (times in EDT):

4:00pm - 4:30pm - Check in and networking for Chicago and New York
4:30pm - 5:00pm - DPS (Data Positioning System): The GPS for your data lake - Rajesh Gundugollu, Principal Architect, Asurion
5:00pm - 5:30pm - An Intro to Apache Iceberg views - Eduard Tudenhoefner, OSS Developer, Dremio
5:30pm - 7:00pm - Socializing and networking for Chicago and New York

This meetup will consist of two talks:

DPS (Data Positioning System): The GPS for your data lake - Rajesh Gundugollu, Principal Architect, Asurion

This presentation is about our internal product, DPS. We intentionally do not call it a data catalog, because it is much more than that. It gives users and platform owners everything they need to know about the data in the Data Platform, all in one place, via a simple search-driven UI.

We brought together Data Assets, Columns, Data Movement Jobs, Users, Infrastructure, operational data, and even documentation and help about the Data Platform into a single pane of glass. All of this is presented via a simplified, interactive, and easy-to-understand interface. A lot of information about Data Assets, such as lineage, impact analysis, operational metrics, quality metrics, and regulatory metrics, comes together in one place.
In this presentation, we also want to share how we overcame the metadata culture hurdle, how we built DPS ourselves, and how we innovated by using a graph-type data model without a graph database.
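
The abstract does not say how DPS stores its graph, so the following is only a minimal sketch of one common way to get a graph-type model without a graph database: nodes and edges kept in ordinary relational tables, with lineage and impact analysis done by a recursive query. The table names, column names, and sample assets are hypothetical and are not taken from DPS.

```python
import sqlite3


def build_demo_graph() -> sqlite3.Connection:
    """Create an in-memory relational store holding a tiny lineage graph."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema: one table for nodes, one for directed edges.
    conn.executescript(
        """
        CREATE TABLE node (id TEXT PRIMARY KEY, kind TEXT NOT NULL);
        CREATE TABLE edge (src TEXT NOT NULL, dst TEXT NOT NULL, relation TEXT NOT NULL);
        """
    )
    conn.executemany(
        "INSERT INTO node VALUES (?, ?)",
        [
            ("raw.claims", "data_asset"),
            ("job.clean_claims", "job"),
            ("curated.claims", "data_asset"),
            ("job.build_report", "job"),
            ("report.claims_daily", "data_asset"),
        ],
    )
    conn.executemany(
        "INSERT INTO edge VALUES (?, ?, ?)",
        [
            ("raw.claims", "job.clean_claims", "read_by"),
            ("job.clean_claims", "curated.claims", "writes"),
            ("curated.claims", "job.build_report", "read_by"),
            ("job.build_report", "report.claims_daily", "writes"),
        ],
    )
    return conn


def downstream(conn: sqlite3.Connection, start: str) -> list:
    """Impact analysis: every node reachable from `start`, sorted by id."""
    rows = conn.execute(
        """
        WITH RECURSIVE reach(id) AS (
            SELECT dst FROM edge WHERE src = ?
            UNION  -- UNION (not UNION ALL) de-duplicates, so cycles terminate
            SELECT e.dst FROM edge e JOIN reach r ON e.src = r.id
        )
        SELECT id FROM reach ORDER BY id
        """,
        (start,),
    ).fetchall()
    return [row[0] for row in rows]


conn = build_demo_graph()
print(downstream(conn, "curated.claims"))
```

Reversing the join direction (walking from `dst` to `src`) would give upstream lineage from the same two tables.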

--------------------------------------------

Intro to Apache Iceberg views - Eduard Tudenhoefner, OSS Developer, Dremio

In open architectures, different engines are used for the workloads they were designed for and handle best. When multiple engines work on the same datasets, they all need to agree on what each dataset is. Apache Iceberg provides that capability, and it works well when you primarily have one engine doing the writing and one engine doing the downstream analytics. However, when multiple engines serve downstream user-facing analytics, each engine also needs to apply business logic to give end users the answers they are looking for.

When using multiple engines for downstream analytics, there are generally three options:

  1. Each engine keeps its own definition of the business logic on top of the shared tables.
  2. Other engines' access is routed through a single engine, which technologies like Apache Arrow Flight make more feasible.
  3. The business logic is defined centrally, in a way all engines can use. In the past this has not been possible for the vast majority of organizations. This is the approach Apache Iceberg views aim to enable.
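
As a rough illustration of the third option, the sketch below keeps one shared view definition that carries a SQL text per engine dialect, and each engine asks for the text it understands. This is a toy model only: `ViewCatalog`, `ViewDefinition`, `SqlRepresentation`, the view name, and the queries are all hypothetical names invented for this example and are not the Apache Iceberg API.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class SqlRepresentation:
    """One SQL text for the view, written for a specific engine dialect."""
    dialect: str
    sql: str


@dataclass
class ViewDefinition:
    """A centrally stored view: one name, possibly several dialect-specific SQL texts."""
    name: str
    representations: List[SqlRepresentation] = field(default_factory=list)

    def sql_for(self, dialect: str) -> str:
        """Return the SQL for the requesting engine's dialect, or fail loudly."""
        for rep in self.representations:
            if rep.dialect == dialect:
                return rep.sql
        raise LookupError(f"view {self.name!r} has no SQL for dialect {dialect!r}")


class ViewCatalog:
    """Hypothetical shared catalog that every engine reads view definitions from."""

    def __init__(self) -> None:
        self._views: Dict[str, ViewDefinition] = {}

    def register(self, view: ViewDefinition) -> None:
        self._views[view.name] = view

    def load(self, name: str) -> ViewDefinition:
        return self._views[name]


# The business logic ("what is an active customer?") is defined once, centrally.
catalog = ViewCatalog()
catalog.register(
    ViewDefinition(
        name="analytics.active_customers",
        representations=[
            SqlRepresentation("spark", "SELECT id FROM db.customers WHERE status = 'active'"),
            SqlRepresentation("trino", "SELECT id FROM db.customers WHERE status = 'active'"),
        ],
    )
)

# Each engine resolves the same view name to SQL in its own dialect.
print(catalog.load("analytics.active_customers").sql_for("trino"))
```

An engine whose dialect is missing gets a `LookupError` rather than silently running another engine's SQL, which keeps the shared definition the single source of truth.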

In this talk, we'll provide an intro to Iceberg views and how they can be useful to you.

Open Data Lakehouse Meetups - Global