Stream Processing with PyFlink
Building Resilient, Real-Time Data Pipelines with Python and Apache Flink - Alexey Grigorev
In this hands-on session, you’ll learn how to bridge the gap between batch processing and real-time stream processing. While many data engineers are comfortable with SQL and Python, streaming introduces complex challenges like "lateness," out-of-order events, and state management.
We’ll show you how PyFlink addresses each of these challenges.
This session walks you through a complete end-to-end flow: producing mock event data into Redpanda, performing real-time windowed aggregations in Flink, and sinking the results into a PostgreSQL database for immediate analysis.
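To make that flow concrete, here is a minimal sketch of such a pipeline in the PyFlink Table API. It assumes a Kafka-compatible Redpanda broker at localhost:9092, a local PostgreSQL database, and the Flink Kafka and JDBC connector JARs on the classpath; the topic, table, and column names are illustrative, not the workshop's actual code.

```python
# Minimal sketch of a Redpanda -> Flink -> Postgres pipeline (illustrative names).
from pyflink.table import EnvironmentSettings, TableEnvironment

t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())

# Source: Redpanda speaks the Kafka protocol, so the Kafka connector is used.
# The WATERMARK clause tolerates events arriving up to 5 seconds late.
t_env.execute_sql("""
    CREATE TABLE events (
        user_id STRING,
        amount DOUBLE,
        event_time TIMESTAMP(3),
        WATERMARK FOR event_time AS event_time - INTERVAL '5' SECOND
    ) WITH (
        'connector' = 'kafka',
        'topic' = 'events',
        'properties.bootstrap.servers' = 'localhost:9092',
        'scan.startup.mode' = 'earliest-offset',
        'format' = 'json'
    )
""")

# Sink: aggregated results land in PostgreSQL via the JDBC connector.
t_env.execute_sql("""
    CREATE TABLE revenue_per_minute (
        window_start TIMESTAMP(3),
        user_id STRING,
        total_amount DOUBLE
    ) WITH (
        'connector' = 'jdbc',
        'url' = 'jdbc:postgresql://localhost:5432/analytics',
        'table-name' = 'revenue_per_minute',
        'username' = 'postgres',
        'password' = 'postgres'
    )
""")

# A one-minute tumbling-window aggregation, written as "SQL-like" code in Python.
t_env.execute_sql("""
    INSERT INTO revenue_per_minute
    SELECT
        TUMBLE_START(event_time, INTERVAL '1' MINUTE) AS window_start,
        user_id,
        SUM(amount) AS total_amount
    FROM events
    GROUP BY TUMBLE(event_time, INTERVAL '1' MINUTE), user_id
""").wait()  # block so the script keeps the streaming job running
```

Because Redpanda implements the Kafka protocol, the standard Flink Kafka connector works against it without any changes.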
What You'll Learn
- The Streaming Mindset: When to use continuous processing vs. micro-batches.
- Architecture 2026: Setting up a modern stack with Redpanda, Flink, and Postgres.
- Watermarks & Windows: How to handle late, out-of-order data from users "in a tunnel" using 2026 best practices.
- Resiliency & Recovery: Configuring Flink checkpointing so you never lose your place during a failure (see the sketch after this list).
- The Table API: Using Python to write "SQL-like" transformations on live data streams.
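As a taste of the resiliency topic above, this is a hypothetical checkpointing setup on the DataStream execution environment; the 30-second interval and local checkpoint directory are placeholder values, and `set_checkpoint_storage_dir` assumes a reasonably recent PyFlink release.

```python
# Hypothetical checkpointing configuration; interval and storage path are
# placeholders, not recommendations from the workshop.
from pyflink.datastream import StreamExecutionEnvironment, CheckpointingMode

env = StreamExecutionEnvironment.get_execution_environment()

# Snapshot all operator state every 30 seconds with exactly-once guarantees.
env.enable_checkpointing(30_000, CheckpointingMode.EXACTLY_ONCE)

# Store checkpoints on a durable path so a restarted job resumes from the
# last completed snapshot instead of reprocessing the whole stream.
# (Assumes a PyFlink release that exposes this setter on CheckpointConfig.)
env.get_checkpoint_config().set_checkpoint_storage_dir("file:///tmp/flink-checkpoints")
```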
The session features a live demo in a fully working 2026 environment, practical troubleshooting tips for common Flink "stumbling blocks," and a chance to ask your questions. This workshop gives you a real feel for how stream processing is implemented in high-scale, real-world environments.
About the Speaker
Alexey Grigorev is the Founder of DataTalks.Club and creator of the Zoomcamp series.
Alexey is a seasoned software and ML engineer with over 10 years of engineering experience and 6+ years in machine learning. He has deployed large-scale ML systems at companies like OLX Group and Simplaex, authored several technical books, including Machine Learning Bookcamp, and is a Kaggle Master with a 1st place finish in the NIPS'17 Criteo Challenge.
**Join our Slack: https://datatalks.club/slack.html**
