India Open Source Data Infrastructure Meetup - May 2024

Hosted by Floor Drees

Details

  • Are you interested in learning more about open-source data technologies? ✅
  • Do you want to network with local and international tech professionals in a fun, relaxed environment? ✅

Then join us at the Microsoft Reactor on May 4 for a meetup co-hosted by Aiven and Decodable, right after Kafka Summit.

Agenda:

  • 11:00 - 11:30 Welcome: Networking & refreshments
  • 11:30 - 11:40 Kickoff: Welcome from Aiven & Decodable
  • 11:40 - 12:00 Beginners guide to balance your data across Apache Kafka partitions - Olena Kutsenko, Senior Developer Advocate at Aiven
  • 12:00 - 12:20 From Postgres to OpenSearch in No Time - Gunnar Morling, Decodable
  • 12:20 - 12:40 Stream Processing using PyFlink Table API and Flink SQL - Diptiman Raichaudhuri, Developer Advocate, Confluent
  • 12:40 - 14:00 Lunch & networking

Beginners guide to balance your data across Apache Kafka partitions
Apache Kafka is a distributed system. At the heart of Apache Kafka is a set of brokers that contain topics. Topics are split into partitions. Dividing topics into smaller pieces allows us to work with data in parallel and achieve higher data throughput.

Such parallelization is the key to a performant cluster, but it comes at a price. First, reading from multiple partitions does not preserve the overall order of records, so consumers may see them in a different order from the one in which they were pushed into the cluster. Another big challenge is uneven distribution of data across partitions.

Overloaded partitions are a serious performance problem for everyone involved, but especially for brokers and consumers. Therefore, when designing our product architecture, we should carefully weigh how many partitions we need, how to ensure proper message ordering, and how to balance records across partitions, without forgetting how data load is distributed over time. And we need to do all of this while still maintaining good cluster performance.
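To make the skew problem concrete, here is a minimal sketch of how keyed records can pile up on one partition. It is not taken from the talk: the hash is CRC32 as a stand-in (not the hash Kafka clients actually use), and the workload, key names, and helper functions are all hypothetical.

```python
import zlib
from collections import Counter


def assign_partition(key: str, num_partitions: int) -> int:
    # Stand-in for a key-based partitioner: hash the key, take it modulo
    # the partition count. CRC32 is an assumption chosen for illustration.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def partition_load(keys, num_partitions):
    # Count how many records land on each partition.
    counts = Counter(assign_partition(k, num_partitions) for k in keys)
    return [counts.get(p, 0) for p in range(num_partitions)]


def skew_ratio(load):
    # Busiest partition relative to the mean; 1.0 means perfectly even.
    mean = sum(load) / len(load)
    return max(load) / mean if mean else 0.0


# Hypothetical workload: one "hot" customer produces 900 of 1000 records.
keys = ["customer-1"] * 900 + [f"customer-{i}" for i in range(2, 102)]
load = partition_load(keys, num_partitions=4)
print("records per partition:", load)
print("skew ratio:", round(skew_ratio(load), 2))
```

Because every record with the same key maps to the same partition, per-key ordering is kept, but the hot key forces at least 900 records onto a single partition no matter how many partitions the topic has.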

If you're fresh to Apache Kafka, or looking for good practices to design your partitions and avoid common pitfalls, you'll find this session useful!

Olena is a seasoned expert in data, sustainable software development, and teamwork. With a background in software engineering, she's led teams and developed mission-critical applications at Nokia, HERE Technologies, and AWS. Currently, she works at Aiven where she supports developers and customers in using open-source data technologies such as Apache Kafka, ClickHouse, and OpenSearch. She is also an international public speaker and regularly presents at conferences around the world. She holds AWS Developer and Solutions Architect certifications, and is also a Confluent Catalyst.

From Postgres to OpenSearch in No Time
You've been tasked with implementing a data streaming pipeline for propagating data changes from your operational Postgres database to a search index in OpenSearch. Data views in OpenSearch should be denormalized for fast querying, and of course there should be no noticeable impact on the production database.

In this session we'll discuss how to build this data pipeline using two popular open-source projects: Debezium for log-based change data capture (CDC) and Apache Flink for stream processing. Join us for this talk and learn about:
  • Setting up change data streams with Debezium
  • Efficiently building nested data structures from 1:n joins
  • Deployment options: Kafka Connect vs. Flink CDC
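As a rough illustration of the second topic, the sketch below folds change events from a parent table and a child table into one nested document per parent, which is the shape a search index typically wants. It is plain Python, not Debezium or Flink code: the event layout is a simplified assumption (a hypothetical `table` field plus `op`, `before`, and `after`), and the `orders` / `order_lines` tables and the `OrderDocumentBuilder` class are invented for the example.

```python
from collections import defaultdict


class OrderDocumentBuilder:
    """Keeps the latest rows in memory and rebuilds one nested document per order."""

    def __init__(self):
        self.orders = {}                # order id -> order row
        self.lines = defaultdict(dict)  # order id -> {line id: line row}

    def apply(self, event):
        # Deletes carry the old row in "before"; creates and updates use "after".
        op = event["op"]
        row = event["before"] if op == "d" else event["after"]
        if event["table"] == "orders":
            order_id = row["id"]
            if op == "d":
                self.orders.pop(order_id, None)
                self.lines.pop(order_id, None)
                return order_id, None  # caller would delete the indexed document
            self.orders[order_id] = row
        else:
            order_id = row["order_id"]
            if op == "d":
                self.lines[order_id].pop(row["id"], None)
            else:
                self.lines[order_id][row["id"]] = row
        return order_id, self.document(order_id)

    def document(self, order_id):
        order = self.orders.get(order_id)
        if order is None:
            return None  # child arrived before its parent; nothing to index yet
        lines = sorted(self.lines[order_id].values(), key=lambda r: r["id"])
        return {**order, "lines": lines}


builder = OrderDocumentBuilder()
builder.apply({"table": "orders", "op": "c", "before": None,
               "after": {"id": 1, "customer": "alice"}})
builder.apply({"table": "order_lines", "op": "c", "before": None,
               "after": {"id": 10, "order_id": 1, "item": "pen"}})
_, doc = builder.apply({"table": "order_lines", "op": "c", "before": None,
                        "after": {"id": 11, "order_id": 1, "item": "ink"}})
print(doc)
```

A real pipeline would keep this state in a stream processor rather than in process memory, and would also have to handle a child row moving to a different parent, which this sketch ignores.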

Gunnar Morling is a software engineer and open-source enthusiast at heart, currently working at Decodable on stream processing based on Apache Flink. In his prior role as a software engineer at Red Hat, he led the Debezium project, a distributed platform for change data capture. He is a Java Champion and has founded multiple open-source projects such as JfrUnit, kcctl, and MapStruct. Gunnar is an avid blogger (morling.dev) and has spoken at various conferences like QCon, JavaOne, and Devoxx. He lives in Hamburg, Germany.

Bengaluru Open Source Data Infrastructure Meetup
FREE