The Ins & Outs of Data Lakehouse Versioning at the File, Table, and Catalog Level
Details
Data Engineer's Lunch: The Ins & Outs of Data Lakehouse Versioning at the File, Table, and Catalog Level
Unraveling the benefits of Data Lakehouse Versioning and its significance in enhancing data accuracy and reliability.
Data lakehouse versioning is a critical technique for ensuring the accuracy and reliability of data in a data lakehouse. It allows you to track changes to data over time, which can be helpful for troubleshooting problems, auditing data, and reproducing experiments.
This presentation will explore the ins and outs of data lakehouse versioning. We will discuss the three levels of versioning: file-level, table-level, and catalog-level. We will also discuss the benefits of data lakehouse versioning and the pros and cons of each level.
By the end of this presentation, you will have a better understanding of data lakehouse versioning and how it can be used to improve the accuracy and reliability of your data.
Key takeaways:
- Data lakehouse versioning is a critical technique for ensuring the accuracy and reliability of data in a data lakehouse.
- There are three levels of data lakehouse versioning: catalog-level, file-level, and table-level.
- Each type of versioning has its own benefits and drawbacks.
- Data lakehouse versioning can be used to troubleshoot problems, audit data, and reproduce experiments.
Bring your lunch and join in; you don't have to leave your desk. You can come as early as 11:45 AM ET/10:45 AM CT to network and catch up.
5-10m Wait for people to get in.
10-15m Volunteer presents/talks about something they are working on or other cool stuff
10-15m Q&A/Commentary




