Kafka and lakeFS: Deep Dive into Data Infrastructure


Details
Join us for an Apache Kafka® meetup featuring our friends from lakeFS on February 16th at 6 pm PST. lakeFS is an open-source data version control tool that transforms object storage into Git-like repositories, offering teams a way to use the same workflows for code and data. To find out when the event is in your time zone, click the WorldTimeBuddy link.
Find the agenda and speaker information below. Join the Community Slack and Forum to ask any follow-up questions!
-----
Agenda in Pacific Standard Time:
6:00 pm - 6:20 pm: Networking
6:20 pm - 6:50 pm: Vinodhini SD, Developer Advocate at lakeFS
6:50 pm - 7:20 pm: Lucia Cerchie, Developer Advocate at Confluent
7:20 pm - 7:45 pm: Additional Q&A & Networking
-----
Talk 1:
Forget ETL or ELT or EtLT. Are you doing this T(esting), right?
Speaker:
Vinodhini SD, Developer Advocate at lakeFS
Abstract:
A property of ETL, ELT, or EtLT pipelines one might observe is that they rarely stay still. Instead, there are near-constant updates to some aspect of the infrastructure they run on or to the logic they use to transform data. Efficiently applying the necessary changes to a pipeline requires running it in parallel with production to test the effect of a change. Most data engineers would agree that the best way to do this is far from a solved problem.
ETL testing requires a data-centric testing approach, as opposed to the code-centric approach used for software applications. To effectively run ETL tests with high test coverage, we need production-like data. However, copying part or all of the prod data outside of the production environment is a high-risk play.
Enter lakeFS! You can build a comprehensive ETL testing strategy using this open-source data versioning engine, which allows zero-copy cloning of prod data into a test data environment through a Git-like interface.
In this session, you will learn how to use lakeFS to quickly set up a dev/test data environment and to build a bulletproof ETL/ELT/EtLT testing strategy.
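To give a flavor of that workflow, here is a minimal sketch using the high-level lakeFS Python SDK (the lakefs package). It assumes a running lakeFS installation; the repository and branch names are placeholders, and the exact client calls are our reading of the SDK rather than material from the talk.

# Sketch: zero-copy "clone" of prod data for ETL testing with lakeFS.
# Assumes `pip install lakefs`, a reachable lakeFS server, and an existing
# repository named "prod-data"; all names below are illustrative.
import lakefs

repo = lakefs.repository("prod-data")
main = repo.branch("main")  # production branch

# Branching is a metadata operation, so no objects are copied.
test = repo.branch("etl-test").create(source_reference="main")

# ... point the ETL/ELT/EtLT pipeline at lakefs://prod-data/etl-test and run it ...

# Promote the results if they pass, or simply discard the test environment.
test.merge_into(main)
# test.delete()

Because the test branch only references existing objects, spinning up and tearing down such environments stays cheap regardless of data volume.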
-----
Talk 2:
Let's Get Started With Apache Kafka
Speaker:
Lucia Cerchie, Developer Advocate at Confluent
Abstract:
Over a third of respondents to a Stack Overflow survey professed a dread of learning Apache Kafka. Nevertheless, with a curious mindset and the right resources, we have the tools to succeed in learning it. Take the plunge with me; together, we will conquer concepts like events, topics, producers, and consumers.
We'll gain confidence by learning about partitions and brokers and how to use Kafka in the cloud. We'll go over the different configurations for producers and consumers, and how these configurations affect application behavior. Then, accompany me on a code walkthrough and see how we build and run producers and consumers in Python with the kafka-python client.
You'll leave with a spark of excitement, knowing you have neutralized your dread and are now firmly within the two-thirds of Stack Overflow respondents who are comfortable learning Apache Kafka.
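As a preview of that code walkthrough, here is a minimal sketch of a producer and a consumer written with kafka-python; the broker address, topic, and consumer group below are placeholders, not the exact code from the talk.

# Sketch: a minimal kafka-python producer and consumer.
# Assumes a broker at localhost:9092 and a topic named "demo-topic".
from kafka import KafkaProducer, KafkaConsumer

# Producer: acks controls how many replicas must confirm a write before it counts as sent.
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: v.encode("utf-8"),
    acks="all",
)
producer.send("demo-topic", "hello, kafka")
producer.flush()

# Consumer: group_id and auto_offset_reset shape where reading starts and how offsets are tracked.
consumer = KafkaConsumer(
    "demo-topic",
    bootstrap_servers="localhost:9092",
    group_id="demo-group",
    auto_offset_reset="earliest",
    consumer_timeout_ms=5000,  # stop iterating if nothing arrives for 5 seconds
)
for message in consumer:
    print(message.topic, message.partition, message.offset, message.value.decode("utf-8"))

Swapping settings such as acks or auto_offset_reset is an easy way to see how producer and consumer configuration changes application behavior.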
-----
Listen to the latest episode of Streaming Audio to hear how lakeFS can be used together with Apache Kafka - https://developer.confluent.io/podcast/git-for-data-managing-data-like-code-with-lakefs/.
If you are interested in speaking or hosting our next event, please notify us by completing this short form: https://rb.gy/kx7pqh.