
What we’re about
Stream processing and real-time event processing are everywhere. This group’s goal is to showcase some of the cutting-edge developments happening in stream processing across the industry. The meetup focuses on Apache Kafka, Apache Samza, Apache Flink, Change Data Capture, Lambda/Kappa architectures, and related topics. Hosted by LinkedIn.
Past meetup talks are available at https://www.youtube.com/playlist?list=PLZDyxA22zzGx34wdHESUux2_V1qfkQ8zx
Upcoming events (1)
[In-Person + Online] Stream Processing with Apache Kafka, Samza, and Flink
- Venue: Diversify (Meeting Room), 800 E Middlefield Rd, Mountain View, CA 94043
- Zoom: https://linkedin.zoom.us/j/97861912735
 
5:30 - 6:00: Networking [in-person only + catered food]
6:00 - 6:05: Welcome
6:05 - 6:40: Charting New Territory: LinkedIn’s Early Bet on Flink Batch for Large-Scale Workloads
Venkata Krishnan Sowrirajan and Archit Goyal, LinkedIn
As one of the earliest adopters of Flink Batch, LinkedIn has taken a bold step toward redefining large-scale batch processing. This talk shares how we built a production-grade Flink Batch platform from the ground up—covering architectural decisions, platform engineering challenges, and lessons learned while scaling it across mission-critical workflows. If you're considering Flink beyond streaming, this is your inside look at what it takes to run Flink Batch reliably at scale. Key topics include:
- SQL Query Optimizations: We’ll share how implementing enhancements such as nested projection and filter pushdown yielded significant reductions in compute and I/O, accelerating batch processing and lowering hardware costs. (A minimal illustrative sketch follows this list.)
- Remote Shuffle with Celeborn: Learn how integrating Flink Batch with Celeborn’s disaggregated shuffle service helped us overcome scaling bottlenecks, enabling consistent throughput and predictable performance for our largest batch workloads.
- Scalable Workflow Orchestration: We’ll discuss leveraging Apache Airflow to automate, schedule, and monitor Flink Batch pipelines alongside existing data workflows, minimizing operational overhead. (A generic orchestration sketch follows the abstract.)
- Operational Observability: Explore how we utilize the Flink HistoryServer for comprehensive post-job diagnostics and rapid troubleshooting of complex batch workloads.
- Case Study: Get a detailed look at how we optimized Flink Batch for one of LinkedIn’s largest machine learning model training data pipelines—including before-and-after metrics and actionable tuning techniques.
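For readers new to the idea, here is a minimal, illustrative PyFlink sketch of the projection/filter pushdown pattern referenced in the first bullet. The table name, columns, and connector options are hypothetical placeholders rather than LinkedIn's production setup, and whether nested pushdown actually kicks in depends on the connector and Flink version in use.

```python
# Illustrative PyFlink batch sketch (hypothetical table and columns).
# With a source that supports projection/filter pushdown, the planner
# prunes unread (nested) columns and pushes the predicate into the scan,
# reducing I/O before rows ever reach Flink operators.
from pyflink.table import EnvironmentSettings, TableEnvironment

t_env = TableEnvironment.create(EnvironmentSettings.in_batch_mode())

# Hypothetical Parquet-backed table with a nested 'profile' column
# (requires the Flink Parquet format dependency on the classpath).
t_env.execute_sql("""
    CREATE TABLE member_events (
        member_id BIGINT,
        event_time TIMESTAMP(3),
        profile ROW<country STRING, industry STRING>,
        payload STRING
    ) WITH (
        'connector' = 'filesystem',
        'path' = 'file:///tmp/member_events',
        'format' = 'parquet'
    )
""")

# Only member_id and profile.country are needed; the country predicate is
# a candidate for pushdown into the Parquet scan.
result = t_env.execute_sql("""
    SELECT e.member_id, e.profile.country AS country
    FROM member_events AS e
    WHERE e.profile.country = 'US'
""")
result.print()
```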
 
We will cover key technical decisions made, challenges encountered, solutions adopted, and areas identified for future improvement. The talk aims to share concrete approaches and lessons learned that other teams can adapt to their own batch processing environments with Flink.
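As a generic illustration of the orchestration pattern mentioned in the list above, the sketch below shows a minimal Apache Airflow DAG that submits a Flink batch job through the flink CLI. The DAG id, schedule, jar path, and the BashOperator-based submission are assumptions for illustration only, not the pipeline the talk describes.

```python
# Hypothetical Airflow DAG: schedule and monitor a Flink batch job with the
# stock BashOperator and the `flink run` CLI (Airflow 2.4+ style `schedule`).
# Paths and schedule are placeholders.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="flink_batch_daily",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",   # run once per day
    catchup=False,
) as dag:
    run_flink_batch = BashOperator(
        task_id="run_flink_batch",
        # Submits the batch job and blocks until it finishes, so Airflow's
        # retries and alerting cover the whole Flink run. The jar path is a
        # placeholder; batch execution mode is assumed to be set in the job.
        bash_command="flink run /opt/jobs/member-events-batch.jar",
    )
```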
- Venkata is a Staff Software Engineer at LinkedIn specializing in building scalable batch processing infrastructure. Currently, he leads the development of LinkedIn’s Flink Batch platform, enhancing its large-scale data capabilities. Previously, Venkat contributed significantly to Apache Spark, tackling shuffle infrastructure challenges to meet LinkedIn’s massive data scale. With experience at startups like Qubole and MapR, Venkat has a deep passion for distributed systems and has contributed to open-source projects including Flink, Spark, and Iceberg.
- Archit is a Sr. Software Engineer at LinkedIn and one of the founding members of LinkedIn’s Flink Batch team. He brings experience in platformization and productionization, with a strong focus on making Flink Batch a reliable and scalable solution for large-scale data processing.
 
6:40 - 7:15: Step by Step Breakdown and Best Practices: Moving Kafka From On Prem to Cloud
Drew Oetzel, Lenses.io
Migrating Kafka clusters to the cloud sounds simple in theory—until you're knee-deep in offset management, schema registry synchronization, and zero-downtime cutover planning. One wrong move and you're facing data loss, duplicate messages, or angry engineers wondering why their consumers just broke. In this talk, we'll demystify the migration process and walk you through the battle-tested steps for getting your data from on-prem to cloud—exactly once, with schemas intact, offsets preserved, and your sanity mostly intact. We'll highlight the gotchas that catch even experienced teams (spoiler: offset translation is trickier than you think), explore the tools that can save you weeks of pain, and share real-world strategies for executing a seamless cutover.
Whether you're planning your first migration or recovering from your last attempt, you'll leave with a practical playbook for tackling this Herculean task without the drama.
- Drew is a Developer Advocate at Lenses.io with over 25 years of experience in distributed systems, data platforms, and technical education. His background spans notable companies including Splunk, Heptio (now VMware), and Mesosphere, where he has specialized in helping organizations optimize their data infrastructure and cloud-native architectures. At Lenses.io, Drew focuses on enabling organizations to implement effective data streaming strategies across complex, multi-cloud environments. Ask him about his addiction to Civilization 7.
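As a taste of the offset bookkeeping the talk alludes to, here is a minimal, hypothetical pre-cutover sanity check written with the kafka-python client: it reports how far a consumer group's committed offsets lag behind a topic's end offsets on a given cluster. Broker addresses, group, and topic names are placeholders, and a real migration would lean on MirrorMaker 2-style offset translation rather than a naive check like this.

```python
# Hypothetical pre-cutover lag check using kafka-python. Run it against the
# source cluster before cutover, and against the cloud cluster after mirroring,
# to confirm consumers are caught up. Brokers/group/topic are placeholders.
from kafka import KafkaConsumer, TopicPartition

def group_lag(bootstrap_servers, group_id, topic):
    consumer = KafkaConsumer(
        bootstrap_servers=bootstrap_servers,
        group_id=group_id,
        enable_auto_commit=False,
    )
    partitions = [TopicPartition(topic, p)
                  for p in consumer.partitions_for_topic(topic)]
    end_offsets = consumer.end_offsets(partitions)   # latest offset per partition
    lag = {}
    for tp in partitions:
        committed = consumer.committed(tp) or 0      # None if the group never committed
        lag[tp.partition] = end_offsets[tp] - committed
    consumer.close()
    return lag

if __name__ == "__main__":
    print(group_lag("source-kafka:9092", "orders-service", "orders"))
```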
 
7:15 - 7:50: 10X your migration velocity with AI
Abhishek Mendhekar and Jun Guan, LinkedIn
What if AI could turn a months-long migration into a week-long sprint? In this session, dive into the behind-the-scenes story of how we used AI to supercharge a massive Samza-to-Flink migration at LinkedIn, cutting the effort by 10x and unlocking a new way to think about engineering at scale.
Forget the hype: this talk is about real, applied AI. You’ll see how large language models helped generate production-ready code, automate complex validations, and collaborate with engineers in a high-stakes migration scenario. Along the way, we’ll tackle the tough questions: Can I trust AI? Where does human oversight matter? How do I scale this across teams?
Whether you’re leading migrations or just starting to explore AI’s potential, you’ll walk away with frameworks, aha moments, and a radically new view of what’s possible when humans and machines build software together.
- Abhishek is an Engineering Manager at LinkedIn, where he leads the Stream Processing team. His team owns the Apache Flink engine at LinkedIn and all of the company’s stream processing authoring APIs, including Apache Beam and Flink SQL. They are responsible for driving the large-scale migration from Apache Samza to Flink on LinkedIn’s next-generation data infrastructure. Abhishek is passionate about building scalable and resilient real-time data platforms, unlocking customer use cases with stream processing, advancing SQL-first approaches, and exploring AI-powered tooling to accelerate developer productivity.
- Jun is a Staff Engineer at LinkedIn, where she specializes in stream processing. She is currently leading a major project to migrate LinkedIn’s systems from Samza to Flink. With her extensive experience in large-scale migrations, Jun has a deep understanding of the challenges and pain points involved in transitioning stream processing jobs. She is passionate about leveraging AI to solve complex problems and increase productivity.
 
41 attendees
Past events (32)