OpenLineage x Apache Kafka® Meetup

Hosted By
Michael R.

Details

Join us at Confluent's offices in London on January 31st for an OpenLineage x Kafka meetup. We'll discuss the current state of lineage in general and support for streams in particular. More details will be added soon!

Tentative agenda:

  • 6:00pm: Doors open
  • 6:00pm - 6:30pm: Pizza, Drinks and Networking, Introductions
  • 6:30pm - 7:15pm: OpenLineage in Action: A Catalyst for Data Observability by Abdallah Terrab, Staff Data Engineer, Decathlon Digital
  • 7:15pm - 8:00pm: Under the Covers: Segments of Apache Kafka by Kirill Kulikov, Sr. Engineering Consultant, Confluent
  • 8:00pm - 8:45pm: Why should Kafka users be aware of OpenLineage? by Paweł Leszczyński, OpenLineage Committer and Data Engineer, GetInData/Astronomer
  • 8:45pm - 9:00pm: Additional Q&A & Networking

Abstracts:

  • OpenLineage in Action: A Catalyst for Data Observability by Abdallah Terrab, Staff Data Engineer, Decathlon Digital
  • "OpenLineage in Action: A Catalyst for Data Observability" is all about understanding and keeping a close eye on your data, especially in big companies where it's a big deal. First up, we're talking about what Data Observability is and why it's a game-changer in handling massive amounts of data.
  • Under the Covers: Segments of Apache Kafka by Kirill Kulikov, Sr. Engineering Consultant, Confluent
  • You might be acquainted with the fundamental components of Apache Kafka®, such as topics and partitions. However, what lies beneath the surface? The commit-log stands out as the pivotal foundation of Kafka. Functioning as a logical sequence of records, it comprises segments (files) responsible for record storage and facilitating data replication among nodes. Gaining insight into the inner workings of Apache Kafka® proves advantageous for developers, enabling them to reason more effectively about topics, partitions, log retention, and more. This understanding also proves valuable for operators as they determine cluster size and comprehend its capabilities.
    In this presentation, we will dive deep into the internals of Kafka's log mechanisms. We will look in detail at the structure of the commit-log and its segments, the arrangement of topic partitions on disk, and log retention under the compact and delete policies. Attendees will take home knowledge of the commit-log structure and code examples showing how to analyse and debug the commit-log.
  • Why should Kafka users be aware of OpenLineage? by Paweł Leszczyński, OpenLineage Committer and Data Engineer, Astronomer
  • Message brokers often serve as the input to the whole analytical world. It is a common scenario to dump topic content into tables. This can cause a lineage gap: data producers see only the topic's dumping process as a consumer, rather than the dozens of data processing jobs that read the actual topic data from a table. This is where cross-platform lineage becomes important.

Speaker bios:
Abdallah Terrab, Decathlon

  • Originally from Morocco and now based in Paris, I am a Data Architect Consultant at Decathlon Technology, where I contribute to the OpenLineage project for enhancing data observability. My diverse background includes roles as a Data Engineer and Machine Learning Engineer at TotalEnergies, and founding TID Consultancy, a firm dedicated to maximizing data potential for businesses of all sizes. I also co-founded Realift, spearheading machine learning innovations in online apparel fitting. Complementing my professional endeavors is a Master's degree in Applied Mathematics, and a passion for educating the next generation in Machine Learning and Big Data.

Kirill Kulikov, Confluent

  • Kirill is a Senior Consulting Engineer on the Professional Services team at Confluent. He has been working with data and distributed systems for more than 15 years. His career began in programming across a range of domains. Over the last decade, he has worked for various companies as a senior software engineer and consultant. Currently, at Confluent, Kirill is helping companies build scalable and reliable data architectures with Apache Kafka and Confluent Platform.

Paweł Leszczyński, Astronomer

  • A data practitioner with a decade of experience and a PhD in distributed databases. Currently fully involved in OpenLineage, an open platform for the collection and analysis of data lineage with a special focus on automatic lineage extraction.
London OpenLineage Meetup Group