Kafka and lakeFS: Deep Dive into Data Infrastructure

lakeFS is teaming up with Apache Kafka® to host another awesome event! Join us on Feb 16th at 6pm PST to learn how you can improve your data pipelines!
-----
Agenda in Pacific Standard Time:
6:00 pm - 6:20 pm: Networking
6:20 pm - 6:50 pm: Vinodhini SD, Developer Advocate at lakeFS
6:50 pm - 7:20 pm: Lucia Cerchie, Developer Advocate at Confluent
7:20 pm - 7:45 pm: Additional Q&A & Networking
-----
Talk 1:
Forget ETL or ELT or EtLT. Are you doing this T(esting), right?
Speaker:
Vinodhini SD, Developer Advocate at lakeFS
Abstract:
One property of ETL, ELT, or EtLT pipelines is that they rarely stay still: there are near-constant updates to the infrastructure they run on or to the logic they use to transform data. Applying those changes efficiently requires running the pipeline in parallel with production to test the effect of each change, and most data engineers would agree that the best way to do this is far from a solved problem.
ETL testing calls for a data-centric approach, as opposed to the code-centric approach of software application testing. To run ETL tests effectively, with high test coverage, we need production-like data. However, copying part or all of the prod data outside of the production environment is a high-risk play.
Enter lakeFS! You can build a comprehensive ETL testing strategy using the open-source data versioning engine lakeFS, which allows zero-copy cloning of prod data into a test environment through a Git-like interface.
In this session, you will learn how to use lakeFS to quickly set up a dev/test data environment and to build a bulletproof ETL/ELT/EtLT testing strategy.
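The "zero-copy cloning" the abstract mentions can be illustrated with a toy model (this is an illustrative sketch of the idea, not the lakeFS API): a branch is just a new table of metadata pointers, so creating a test environment from prod duplicates references, not the underlying data objects.

```python
# Toy model of zero-copy branching (illustrative only, not the lakeFS API).
# Creating a branch copies only the pointer table; data objects are shared
# until a write on the branch produces a new object (copy-on-write).

class Repo:
    def __init__(self):
        self.objects = {}             # content-addressed data objects (the heavy bytes)
        self.branches = {"main": {}}  # branch name -> {path: object_id}

    def put(self, branch, path, data):
        obj_id = f"obj{len(self.objects)}"
        self.objects[obj_id] = data
        self.branches[branch][path] = obj_id

    def branch(self, name, source="main"):
        # Zero-copy: duplicate only the pointer table, never the objects.
        self.branches[name] = dict(self.branches[source])

repo = Repo()
repo.put("main", "sales.parquet", b"...prod data...")
repo.branch("test-etl")                                       # instant, no data copied
repo.put("test-etl", "sales.parquet", b"...transformed...")   # copy-on-write
assert repo.branches["main"]["sales.parquet"] == "obj0"       # prod untouched
assert repo.branches["test-etl"]["sales.parquet"] == "obj1"   # branch diverged
assert len(repo.objects) == 2                                 # only two objects stored
```

Because only pointers are copied, the test branch is created instantly regardless of how large the prod dataset is, and deleting it discards nothing from production.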
-----
Talk 2:
Let's Get Started With Apache Kafka
Speaker:
Lucia Cerchie, Developer Advocate at Confluent
Abstract:
Over a third of respondents to a Stack Overflow survey professed a dread of learning Apache Kafka. Nevertheless, with a curious mindset and the right resources, we can succeed in learning it. Take the plunge with me; together we will conquer concepts like events and topics, producers, and consumers.
We'll build confidence by learning about partitions and brokers and how to use Kafka in the cloud. We'll go over the different configurations for producers and consumers, and how these configurations affect application behavior. Then, accompany me on a code walkthrough and see how to build and run producers and consumers in Python with the kafka-python client.
You'll leave with a spark of excitement, knowing you have neutralized your dread and are now firmly within the two-thirds of Stack Overflow respondents who are comfortable learning Apache Kafka.
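A hedged sketch of the kind of producer and consumer the abstract describes, using the kafka-python client. The broker address (localhost:9092) and topic name ("demo") are made-up placeholders for illustration, and this is not the talk's actual walkthrough code.

```python
# Sketch of a JSON producer/consumer pair with kafka-python.
# Assumes (for illustration) a broker at localhost:9092 and a topic "demo".
import json

def serialize(event: dict) -> bytes:
    """JSON-encode an event for publishing to a topic."""
    return json.dumps(event).encode("utf-8")

def deserialize(raw: bytes) -> dict:
    """Decode a consumed message value back into a dict."""
    return json.loads(raw.decode("utf-8"))

def run_demo():
    # Imported here so the helpers above work even without kafka-python installed.
    from kafka import KafkaConsumer, KafkaProducer  # pip install kafka-python

    producer = KafkaProducer(
        bootstrap_servers="localhost:9092",
        value_serializer=serialize,
        acks="all",  # wait until the write is fully replicated before success
    )
    producer.send("demo", {"user": "lucia", "action": "signup"})
    producer.flush()

    consumer = KafkaConsumer(
        "demo",
        bootstrap_servers="localhost:9092",
        auto_offset_reset="earliest",   # read the topic from the beginning
        value_deserializer=deserialize,
        consumer_timeout_ms=5000,       # stop iterating when the topic goes idle
    )
    for msg in consumer:
        print(msg.partition, msg.offset, msg.value)

# run_demo()  # uncomment with a broker running
```

Settings like `acks` and `auto_offset_reset` are examples of the producer/consumer configurations the talk covers, each trading latency or replay behavior against delivery guarantees.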
-----
➡️ Join the lakeFS Slack community: https://lakefs.io/slack
- lakeFS is an open-source data version control system for data lakes.
- It enables zero-copy, isolated dev/test environments, continuous quality validation, atomic rollback on bad data, reproducibility, and more.
- Learn more: https://lakefs.io/