Data Streaming Meetup
Details
Join us for a Data Streaming meetup on April 30th from 6:00pm in Kraków, hosted by VirtusLab!
The talks will be presented in English.
📍Venue:
HEVRE, Beera Meiselsa 18, 31-058 Kraków
🗓 Agenda:
- 6:00pm: Doors open/Welcome
- 6:00pm - 6:10pm: Drinks & Networking
- 6:10pm - 6:40pm: Olena Kutsenko, Staff Developer Advocate, Confluent
- 6:40pm - 7:10pm: Krzysztof Grajek, Software Engineer, SoftwareMill
- 7:10pm - 7:40pm: Hartmut Armbruster, Freelance Software Architect & Developer, Distributed & Event-Driven Systems
- 7:40pm - 8:10pm: Grzegorz Kocur, Head of DevOps, SoftwareMill
- 8:10pm - 9:00pm: Additional Q&A, Networking
💡 Speaker One:
Olena Kutsenko, Staff Developer Advocate, Confluent
Title of Talk:
Keeping data private in real-time pipelines
Abstract:
We all love real-time data — clicks, payments, rides, messages — but most of it comes with a catch: it contains personal information we're not supposed to leak, such as names, emails, locations, or even small clues that can identify someone. The challenge: how do we keep streaming data useful and safe at the same time?
In this talk, we'll explore practical ways to protect privacy in streaming systems using Apache Kafka, Apache Flink, and Apache Iceberg. We'll cover:
- simple tricks like masking and tokenizing PII;
- why "anonymous" data often isn't anonymous (the re-identification problem);
- techniques like bucketing, k-anonymity, and adding noise;
- how to balance privacy with data utility (too much hiding makes data useless).
Along the way, we'll look at real-world stories: from public data leaks to surprising deanonymization attacks, and show live demos of pipelines that anonymize data before it's written to storage.
If you've ever wondered how to build privacy-aware pipelines, this talk will give you practical patterns you can use right away.
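To give a flavour of the first bullet, here is a minimal Python sketch of masking and tokenizing PII in an event before it is written downstream. This is a generic illustration, not code from the talk; the field names and the HMAC-based tokenizer are illustrative assumptions.

```python
import hashlib
import hmac

SECRET_KEY = b"rotate-me-regularly"  # hypothetical tokenization key, kept out of the data store

def mask_email(email: str) -> str:
    """Masking: keep the domain for analytics, hide the local part."""
    local, _, domain = email.partition("@")
    return local[0] + "***@" + domain

def tokenize(value: str) -> str:
    """Tokenization: deterministic pseudonym — same input yields the same token,
    so joins still work, but the raw value is unrecoverable without the key."""
    return hmac.new(SECRET_KEY, value.encode(), hashlib.sha256).hexdigest()[:16]

def anonymize(event: dict) -> dict:
    """Apply per-field rules before the event leaves the pipeline."""
    out = dict(event)
    out["email"] = mask_email(event["email"])
    out["user_id"] = tokenize(event["user_id"])
    return out
```

The same per-field rules could run inside a Flink map function or a Kafka Streams processor before the sink; the talk's re-identification discussion explains why masking alone is often not enough.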
Bio:
Olena is a Staff Developer Advocate at Confluent and a recognized expert in data streaming and analytics. With two decades of experience in software engineering, she has built mission-critical applications, led high-performing teams, and driven large-scale technology adoption at industry leaders like Nokia, HERE Technologies, AWS, and Aiven.
A passionate advocate for real-time data processing and AI-driven applications, Olena empowers developers and organizations to harness the power of streaming data. She is an AWS Community Builder, a dedicated mentor, and a volunteer instructor at a nonprofit tech school, helping to shape the next generation of engineers.
As an international speaker and thought leader, Olena regularly presents at top global conferences, sharing deep technical insights and hands-on expertise. Whether through her talks, workshops, or content, she is committed to making complex technologies accessible and inspiring innovation in the developer community.
💡 Speaker Two:
Krzysztof Grajek, Software Engineer, SoftwareMill
Title of Talk:
Building a Kafka Lag Exporter in Rust — Lessons from the Trenches
Abstract:
Consumer lag is one of the most important signals in any Kafka-based system — but measuring it accurately is harder than it looks. Existing exporters struggle with memory bloat, blocking scrapes, and time lag calculations that silently break when producers go idle.
In this talk, I'll share what I learned building klag-exporter, a Kafka consumer group lag exporter written in Rust. We'll cover:
- why offset lag alone isn't enough and how to compute time lag (seconds behind) reliably;
- the difference between reading actual message timestamps and estimating from production rates — and when each approach breaks;
- how librdkafka's internal metadata cache causes unbounded memory growth in long-running processes, and how we solved it;
- batching Kafka Admin API calls to go from O(partitions) to O(brokers) RPCs per collection cycle;
- detecting silent data loss from log compaction and retention before your consumers notice.
Whether you run 50 partitions or 50,000, this talk will give you practical insights into Kafka internals, Rust for infrastructure tooling, and the trade-offs behind building production-grade observability components.
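To illustrate the second bullet, here is a toy Python sketch of estimating time lag from the production rate, including the failure mode the abstract mentions when producers go idle. The function and parameter names are hypothetical (klag-exporter itself is written in Rust); this only shows the arithmetic and its breaking point.

```python
def estimate_time_lag(consumer_offset: int, end_offset: int,
                      produce_rate_per_sec: float) -> float:
    """Estimate seconds-behind from offset lag and an observed production rate.

    This estimation approach breaks when the producer goes idle (rate tends
    to zero), which is why reading actual message timestamps is the more
    reliable way to compute time lag.
    """
    lag = max(end_offset - consumer_offset, 0)
    if lag == 0:
        return 0.0  # caught up: no time lag regardless of rate
    if produce_rate_per_sec <= 0:
        return float("inf")  # idle producer: rate-based estimate is undefined
    return lag / produce_rate_per_sec
```

For example, 10 messages behind at 5 messages/second estimates 2 seconds of lag, but the same 10-message lag with an idle producer yields no meaningful estimate at all.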
Bio:
💡 Speaker Three:
Hartmut Armbruster, Freelance Software Architect & Developer, Distributed & Event-Driven Systems
Title of Talk:
What If We've Been Scaling Stream Processing Wrong All Along?
Abstract:
Your Kafka Streams application just rebalanced. Again. Your Flink checkpoint is timing out. Again.
Here's an uncomfortable truth: most stream processing applications don't operate at Uber scale. They handle thousands of events per second—complex joins, stateful aggregations, valid use cases—but nowhere near the volumes that justify the operational complexity we've accepted as normal.
Yet we pay the full distributed systems tax anyway. Repartition topics doubling network I/O. Repeated serialization burning CPU cycles.
Standby replicas sitting idle. State migration or restoration during deployments. And the human cost: specialized expertise that takes years to develop, expert teams that are expensive to build and painful to lose.
We've normalized extraordinary inefficiency in the name of horizontal scalability that many applications will never need.
But rethinking stream processing in 2026 doesn't mean "just use Postgres."
In this talk, I'll share an early-stage exploration of a different approach. A framework that preserves the Kafka Streams DSL, borrows Flink's approach to exactly-once semantics, leverages Project Loom for high concurrency—and challenges a fundamental assumption that both frameworks share.
This isn't a production-ready announcement. It's an invitation to question conventional wisdom and explore what stream processing could look like when we stop distributing by default.
Bio:
Hartmut Armbruster is a freelance Software Architect and Developer specializing in Distributed & Event-Driven Systems who is passionate about elevating engineering practices, blending technical depth with a focus on clarity, communication, and impact. He has worked on real-time data processing for mission-critical platforms at HSBC, NEX Group plc, eu-LISA, Raiffeisen Switzerland, and Deutsche Bahn.
💡 Speaker Four:
Grzegorz Kocur, Head of DevOps, SoftwareMill
Title of Talk:
Kafka High Availability: The 2.5 DC Topology with Observers
Abstract:
Running Apache Kafka as the backbone of a mission-critical system means a single data center failure cannot stop the business. But achieving true high availability — zero data loss, recovery in seconds, and no one woken up at 3 AM — is harder than it looks.
Active/Passive deployments with MirrorMaker leave you with lost messages, mismatched offsets, and hours of RTO. Stretched clusters across two data centers bring synchronous replication, but stop accepting writes the moment one DC goes down. Three full data centers solve the problem — at triple the cost.
In this talk we'll walk through the trade-offs of each approach and then build up to the solution: the 2.5 DC topology with observer replicas. You'll learn how Confluent Platform's observers, replica placement policies, and automatic observer promotion work together to deliver RPO=0 and near-zero RTO — fully automated, with no manual intervention.
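For context, Confluent Platform expresses replica placement as a JSON policy attached to a topic. A 2.5 DC layout might look roughly like the sketch below; the rack names, counts, and promotion policy value are illustrative assumptions, not material from the talk.

```json
{
  "version": 2,
  "replicas": [
    { "count": 2, "constraints": { "rack": "dc-1" } },
    { "count": 2, "constraints": { "rack": "dc-2" } }
  ],
  "observers": [
    { "count": 1, "constraints": { "rack": "dc-3" } }
  ],
  "observerPromotionPolicy": "under-min-isr"
}
```

Here the two full data centers hold the synchronous replicas, while the "half" DC hosts an observer that replicates asynchronously and can be promoted automatically when the ISR shrinks; the talk covers how this delivers RPO=0 without a third full-cost site.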
Bio:
Grzegorz is a seasoned DevOps engineer, proudly leading the DevOps team at SoftwareMill. He has worked with Apache Kafka and Confluent Platform for years, helping deploy and manage it in the biggest Polish banks. He also works as a Kafka and Confluent Platform trainer.
***
DISCLAIMER
NOTE: We are unable to cater for any attendees under the age of 18.
If you wish to speak at and/or host a future meetup, please email community@confluent.io
