Stream Processing with Apache Kafka, Samza, and Flink

Mountain View, CA, US

3,598 members · Public group

Organized by Allison N Newman and 10 others

What we’re about

Stream processing, or real-time event processing, is everywhere. This group's goal is to showcase cutting-edge developments in stream processing across the industry. The meetup focuses on Apache Kafka, Apache Samza, Apache Flink, Change Data Capture, Lambda/Kappa Architecture, and related topics. Hosted by LinkedIn.

Past meetup talks are available at https://www.youtube.com/playlist?list=PLZDyxA22zzGx34wdHESUux2_V1qfkQ8zx

Upcoming events (1)

Thu, Jul 17, 2025, 12:30 AM UTC
[In-Person + Online] Stream Processing with Apache Kafka, Samza, and Flink
Venue: Together (Meeting Room) -- 700 E Middlefield Rd, Mountain View, Building 4, 1st Floor

Zoom: https://linkedin.zoom.us/j/99268619479

5:30 - 6:00: Networking [in-person only + catered food]
6:00 - 6:05: Welcome
6:05 - 6:40: Scaling Kafka for Netflix's Record-Breaking Live Events
Harshit Mittal, Netflix
In 2024, Netflix shattered streaming records with the Paul vs. Tyson boxing match and the NFL Christmas Day games. These unprecedented live events presented significant challenges for our Kafka infrastructure and yielded critical operational insights. This presentation will detail how we rapidly enhanced the resilience of Netflix's Kafka fleet to support massive-scale live streaming. We'll focus on Cruise Control, an open-source tool for Kafka operations, and in particular on a new Topic Aware Rebalance goal that addresses "false" data skews and "outlier" broker behaviors observed during peak streaming periods. We'll also share the technical considerations involved in deploying these solutions across our entire Kafka ecosystem to support Netflix's growing portfolio of high-demand live events.

Harshit is a Senior Software Engineer on the Data Movement Engines team at Netflix. He leads the Schema Platform for the Data Platform and works on Apache Kafka at Netflix, building and operating the fleet of Kafka clusters and associated libraries. Before that, he worked on real-time data processing platforms (Apache Flink and Mantis). Previously, he worked at Uber, where he built an exception logging platform for mobile and backend services.
Outside of work, he is the proud father of a 4-year-old, with whom he loves biking. He also loves to travel and hike around the SF Bay Area, though the latter has been difficult these past few years.

6:40 - 7:15: Kafka-less, Cloud-Native Streaming Processing with Apache Beam and Iceberg
Talat Uyarer, Google
Evolve your stream processing beyond the traditional event bus by using Apache Beam and Apache Iceberg. This talk introduces a novel approach: streaming directly from Apache Iceberg using Apache Beam, enabling efficient stream processing without Apache Kafka for many use cases.
This presentation will detail the high-level design of the Beam Iceberg streaming source, which leverages Iceberg's incremental scan API to consume new data as it's committed. We'll highlight key benefits, such as simplified operations via managed cloud storage, unified storage for both streaming and batch workloads, elimination of certain cross-AZ network costs, advanced data pruning, and dynamic, pull-based split assignment for improved resource utilization and autoscaling. This approach directly addresses critical pain points of traditional streaming architectures.
We'll highlight how Apache Beam's robust event-time processing and watermark capabilities ensure reliable stream processing directly from Iceberg. Evaluation results will be shared, confirming the source's strong performance and efficiency. Join us to learn how this innovation enables next-generation, cloud-native data pipelines that are both highly cost-effective and operationally streamlined.

Talat is a Senior Staff Software Engineer at Google.

7:15 - 7:50: How Metronome Scaled Their Real-time Usage Billing Pipeline to Billions of Events per Day
Casey Crites and Nick Dellamaggiore, Metronome
Metronome powers real-time usage-based revenue for some of the world's fastest-growing companies, such as OpenAI, Anthropic, Databricks, and Confluent. Real-time billing pipelines operate at a volume and scale similar to observability pipelines, but with much higher stakes. Each event needs exactly-once processing, and the pipeline can never drop data. What’s more, a lag in billing can be catastrophic: spend alerts and invoices depend on speed and accuracy.
Learn how Metronome used Kafka, Kafka Streams, and Responsive to scale their billing pipeline to billions of events per day.

Casey is a founding engineer at Metronome with over eighteen years of experience in the engineering field. Casey has previously worked as a lead engineer and senior software engineering manager at New Relic, Inc., lead engineer and director of engineering at Shyp, and has been an engineering advisor to multiple startups.

Nick is the Infrastructure Tech Lead at Metronome, where he focuses on reliability, scalability and cloud cost optimization. Prior to Metronome, he helped scale Kafka and core infrastructure at Robinhood, and previously worked on backend platforms at Coursera and LinkedIn.
47 attendees

Past events (31)

Thu, Apr 17, 2025, 12:30 AM UTC
[In-Person + Online] Stream Processing with Apache Kafka, Samza, and Flink
This event has passed
147 attendees
