Tides of Change: Real-Time Flow with Postgres, Kafka & Flink
Details
Join us on November 13th from 6:00pm for a Data Streaming meetup hosted by Fresha!
📍Venue:
Fresha
The Tower, 207 Old Street
London, EC1V 9NR
7th Floor
PLEASE bring your PHOTO ID and REGISTER with your First and Last Name. Thanks!
🗓 Agenda:
- 6:00pm – 6:30pm: Food/Drinks and Networking
- 6:30pm – 7:00pm: Celeste Horgan, Developer Advocate, Snowflake
- 7:00pm – 7:30pm: Nicoleta Lazar, Sr. Data Engineer, Fresha
- 7:30pm – 8:00pm: Csanád Bakos, Data Engineer, Vinted
- 8:00pm: Q&A and Networking
💡Speaker One:
Celeste Horgan, Developer Advocate, Snowflake
Title of Talk:
The lifetime of a write, 3 ways: in Postgres, Kafka and Flink
Abstract:
Kafka and Flink tend to get lumped together as "data services" in the sense that they process data, but they differ dramatically from traditional databases in functionality and utility. In this talk, we'll run through the lifetime of a write in Postgres to establish a baseline, understanding all the different services that data hits on its way down to the disk. Then we'll walk through writing data to a Kafka topic, and what 'writing' (or really, streaming) data to a Flink workflow looks like from a similar systems perspective. Along the way, we'll understand the key differences between the services and why some are more suited to long-term data storage than others.
Bio:
Celeste Horgan is a Sr. OSS Advocate at Snowflake. She got her start in open source at the Linux Foundation, where she supported the Kubernetes project's documentation. From there, she went on to work at Aiven on open source data platforms, and now continues that work evangelizing data systems for Snowflake. Her work on inclusive language has been featured in the New York Times, and she lives in London.
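To set the scene for the comparison, here is a minimal Java sketch (my illustration, not material from the talk) of the first two write paths: a JDBC INSERT into Postgres and a produce call to Kafka. The connection strings, topic, and table names are hypothetical, and it assumes a local Postgres instance and Kafka broker with the standard PostgreSQL JDBC driver and Kafka client library on the classpath.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class WritePaths {
    public static void main(String[] args) throws Exception {
        // 1) Postgres: the INSERT is recorded in the write-ahead log (WAL)
        //    and flushed before COMMIT returns; the heap page is updated in
        //    shared buffers and written to disk later by the checkpointer.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost:5432/demo", "demo", "demo");
             PreparedStatement stmt = conn.prepareStatement(
                "INSERT INTO events (id, payload) VALUES (?, ?)")) {
            stmt.setLong(1, 42L);
            stmt.setString(2, "hello");
            stmt.executeUpdate(); // durable once the transaction commits
        }

        // 2) Kafka: the record is appended to a partition's log segment on
        //    the leader broker; durability comes from acks/replication
        //    rather than an immediate fsync.
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("acks", "all"); // wait for the full in-sync replica set
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("events", "42", "hello")).get();
        }

        // 3) Flink has no "write" call at all: records stream through
        //    operators, and durability comes from periodic checkpoints of
        //    operator state to an external store.
    }
}
```

The asymmetry is the heart of the talk: Postgres makes a write durable before COMMIT returns, Kafka's durability is a function of replication settings, and Flink relies on checkpointing rather than a write path.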
💡Speaker Two:
Nicoleta Lazar, Sr. Data Engineer, Fresha
Title of Talk:
The Real-Time Data Journey: Connecting Flink, Airflow, and StarRocks - Exploring how modern streaming tools power the next generation of analytics
Abstract:
At Fresha, we pioneered putting StarRocks to the test in production for real-time analytical workloads. One of the first challenges we faced was getting all the data there reliably and efficiently: we had to handle both historical and real-time data, and orchestrate everything so we could move fast without breaking too many things. Our tools of choice: Airflow, StarRocks Pipes, and Apache Flink. In this talk, I’ll share how we built our data pipelines using Apache Flink and Airflow, and what worked and what didn’t for us. Along the way, we’ll explore how Flink helps ensure data consistency, handles failures gracefully, and keeps our real-time workloads running strong.
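For a rough idea of what such a pipeline can look like (a sketch under my own assumptions, not Fresha's actual code), the Java program below uses Flink SQL to declare a Kafka source and a StarRocks sink via the StarRocks Flink connector. Every hostname, credential, and table name is a placeholder, and the connector option names should be verified against the connector's documentation.

```java
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class OrdersToStarRocks {
    public static void main(String[] args) {
        TableEnvironment tEnv = TableEnvironment.create(
                EnvironmentSettings.inStreamingMode());

        // Source: a Kafka-backed stream of order events (placeholder DDL).
        tEnv.executeSql(
            "CREATE TABLE orders_src (" +
            "  order_id BIGINT," +
            "  amount DECIMAL(10, 2)," +
            "  updated_at TIMESTAMP(3)" +
            ") WITH (" +
            "  'connector' = 'kafka'," +
            "  'topic' = 'orders'," +
            "  'properties.bootstrap.servers' = 'localhost:9092'," +
            "  'format' = 'json'," +
            "  'scan.startup.mode' = 'latest-offset'" +
            ")");

        // Sink: StarRocks via its Flink connector, which batches rows and
        // loads them into the target table behind the scenes.
        tEnv.executeSql(
            "CREATE TABLE orders_sink (" +
            "  order_id BIGINT," +
            "  amount DECIMAL(10, 2)," +
            "  updated_at TIMESTAMP(3)" +
            ") WITH (" +
            "  'connector' = 'starrocks'," +
            "  'jdbc-url' = 'jdbc:mysql://starrocks-fe:9030'," +
            "  'load-url' = 'starrocks-fe:8030'," +
            "  'database-name' = 'analytics'," +
            "  'table-name' = 'orders'," +
            "  'username' = 'flink'," +
            "  'password' = '***'" +
            ")");

        // Continuous insert: Flink keeps this running and restarts from
        // checkpoints on failure, which is where the consistency and
        // graceful failure handling mentioned above come in.
        tEnv.executeSql("INSERT INTO orders_sink SELECT * FROM orders_src");
    }
}
```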
💡Speaker Three:
Csanád Bakos, Data Engineer, Vinted
Title of Talk:
On-the-Fly State Migration: Keeping Your Flink Pipelines Streaming
Abstract:
While upgrading Flink to its latest versions to enable more AI-related capabilities, one can easily run into tricky savepoint incompatibilities that render existing state snapshots unusable for recovery. This is especially problematic in the case of pipelines with large state. In such cases, doing a backfill can take too long and using the State Processor API leads to downtime or breaking the exactly-once delivery guarantee.
In this talk, I’ll share a state migration pattern that I applied to one of our Flink jobs using regular streaming mode. It involves creating a new stateful operator that conforms to the new requirements, allowing a compatible savepoint to be created. Leveraging side outputs and custom key traversal, the existing state is forwarded to the new operator while regular processing continues uninterrupted.
We’ll explore the core problem and the pitfalls and trade-offs of existing solutions such as the State Processor API. Then we’ll take a deep dive into the migration pattern itself: ensuring a correct state handoff between operator versions, setting up triggers to migrate all keys, and other technicalities. Lastly, a few words about cleaning up seamlessly. This session will add a handy pattern to your toolbox that you can apply the next time you run into state migration challenges.
Bio:
Csanád Bakos is a Data Engineer at Vinted, focusing on streaming feature engineering with Apache Flink. He holds a Master’s degree in Computer Science from the Delft University of Technology, where he graduated cum laude.
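To make the handoff step of the pattern concrete, here is a heavily simplified, hypothetical Java sketch (my reconstruction from the abstract, not the speaker's code): the legacy operator emits each key's state on a side output when a per-key migration trigger arrives, while regular traffic keeps flowing. The Event and MigratedState types and the trigger mechanism are placeholders.

```java
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.util.Collector;
import org.apache.flink.util.OutputTag;

/** Placeholder input record; a real job would use its own event type. */
class Event {
    public String key;
    public boolean migrationTrigger; // true for injected migration triggers
}

/** A (key, state) pair handed off to the new operator. */
class MigratedState {
    public final String key;
    public final long count;
    MigratedState(String key, long count) {
        this.key = key;
        this.count = count;
    }
}

public class LegacyCounter extends KeyedProcessFunction<String, Event, Long> {

    // Side output carrying migrated state to the new operator version.
    public static final OutputTag<MigratedState> MIGRATION =
            new OutputTag<MigratedState>("state-migration") {};

    private transient ValueState<Long> count;

    @Override
    public void open(Configuration parameters) {
        count = getRuntimeContext().getState(
                new ValueStateDescriptor<>("count", Long.class));
    }

    @Override
    public void processElement(Event event, Context ctx, Collector<Long> out)
            throws Exception {
        if (event.migrationTrigger) {
            // Hand this key's state to the new operator via the side
            // output, then clear it here; regular traffic is never paused.
            Long current = count.value();
            if (current != null) {
                ctx.output(MIGRATION,
                        new MigratedState(ctx.getCurrentKey(), current));
                count.clear();
            }
            return;
        }
        // Normal processing path: keep counting while migration is under way.
        long updated = (count.value() == null ? 0L : count.value()) + 1;
        count.update(updated);
        out.collect(updated);
    }
}
```

Downstream, the new operator would consume this side output alongside its regular input and initialize its state from the MigratedState records; in the full pattern, triggers must be generated so that every key is eventually visited.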
***
If you are interested in hosting/speaking at a meetup, please email community@confluent.io
