Using data engineering best practices to overcome challenges in data management


Event link:
https://us02web.zoom.us/j/85267637965?pwd=cmJyQXFSeHg2Z3VzTWlFOGNUTUppQT09
# AGENDA
18:00 - 18:30 - Mingling and food :)
18:30 - 19:15 - CI/CD for Data - how to enhance data quality while increasing data engineering velocity - Tal Sofer, R&D Team Lead @ lakeFS
19:20 - 20:10 - lakeFS internals: A look under the hood of a versioning engine that scales to billions of objects - Barak Amar, Principal Engineer @ lakeFS
*********************** Note: ***********************
- The event will also be streamed live
- All sessions will be delivered in English
*****************************************************
CI/CD for Data - how to enhance data quality while increasing data engineering velocity
Most data teams have difficulty incorporating best practices like testing changes in isolation, identifying data issues before they reach production, or rolling back when quality issues arise. Data engineering still struggles with these problems, but software engineering has largely solved them: software engineers have a clear set of best practices and tools that support them throughout the software lifecycle. The good news is that we can apply these best practices to data engineering and overcome the challenges of day-to-day work with data. In this talk, you will learn about industry best practices for data lifecycle management and the open-source tools (such as lakeFS) that help implement them.
lakeFS internals: A look under the hood of a versioning engine that scales to billions of objects
lakeFS is an open-source data version control system designed for data lakes. It provides atomicity, rollbacks, and reproducibility - capabilities that modern data lake architectures built on object storage require. In this session, you'll learn how lakeFS scales its Git-like data model to petabytes of data, across billions of objects - without affecting throughput or performance. We will talk about how lakeFS is built under the hood and the data structures and techniques it relies on.
