
Webinar: Distributed Stream Processing in Practice [Scalable, Real-time Data Pipelines]

Hosted By
Sandeep Devarapalli

Details

About the Event
This technical session examines real-world challenges and patterns in building distributed stream processing systems. We focus on scalability, fault tolerance, and latency trade-offs through a concrete case study, using frameworks such as Apache Storm to illustrate production concepts.

Why Should You Attend
Learn practical patterns for distributed stream processing at scale:

  • Master real-world challenges - Understand scalability, fault tolerance, and latency trade-offs in production
  • See architectural patterns - Stateless vs. stateful processing, event time vs. processing time decisions
  • Handle scale bottlenecks - Partitioning strategies, backpressure handling, and scheduling challenges
  • Learn from concrete examples - A real ML feature-generation pipeline using Storm and Kafka

Perfect for: Data engineers building distributed streaming systems who need production-proven patterns.
------------------------------------------------------------
Agenda (30 minutes)

1. Stream Processing: Past and Now (4 minutes)

  • Rise of real-time data needs in ML, analytics, and user-facing apps
  • Shift from batch-first to event-first architectures

2. Distributed Stream Processing Fundamentals (5 minutes)

  • Definition and core concepts
  • Delivery guarantees: at-most-once, at-least-once, exactly-once
  • Batch vs. micro-batch vs. true streaming
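The three delivery guarantees above differ in what reaches the sink after a failure. A minimal, framework-free sketch (in-memory lists stand in for a real sink; the event ids are made up for illustration):

```python
# Illustrative only: a redelivered event ("e2") appears twice in the input,
# as it would under at-least-once delivery after a worker failure.
events = [("e1", 10), ("e2", 20), ("e2", 20), ("e3", 30)]

# At-least-once: every event lands, but duplicates inflate the result.
alo_sink = [value for _, value in events]
assert sum(alo_sink) == 80  # "e2" is counted twice

# Exactly-once *effect* via an idempotent sink keyed by event id:
# duplicates from redelivery are detected and skipped.
seen_ids, eo_sink = set(), []
for event_id, value in events:
    if event_id in seen_ids:
        continue  # duplicate from redelivery; drop it
    seen_ids.add(event_id)
    eo_sink.append(value)
assert sum(eo_sink) == 60
```

At-most-once would simply never redeliver, trading completeness for simplicity.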

3. Architectural Patterns (6 minutes)

  • Stateless vs. stateful processing
  • Event time vs. processing time
  • Schedulers

Common architecture: Kafka → Stream Processor → Sink (DB, Lake, Dashboard)
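The event-time vs. processing-time distinction above is easiest to see with windowing. A minimal sketch of tumbling windows keyed by event time (the window width and timestamps here are invented for illustration):

```python
from collections import defaultdict

def tumbling_window(events, width_s):
    """Assign each (event_time_s, value) pair to a fixed-width window
    keyed by the window's start time, using the event's own timestamp."""
    windows = defaultdict(list)
    for ts, value in events:
        start = (ts // width_s) * width_s
        windows[start].append(value)
    return dict(windows)

# A late-arriving event with ts=95 still lands in the correct 60-120
# window when keyed by event time; keyed by arrival (processing) time,
# it would be assigned to whatever window was open when it showed up.
events = [(10, 1), (65, 2), (95, 3), (130, 4)]
print(tumbling_window(events, 60))  # {0: [1], 60: [2, 3], 120: [4]}
```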

4. Designing for Scale (6 minutes)

  • Partitioning strategies and operator parallelism
  • Handling backpressure and traffic spikes
  • Scheduling challenges and system bottlenecks
  • Fault tolerance and availability
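Key-based partitioning, the first bullet above, is what lets operators run in parallel while preserving per-key ordering. A minimal sketch assuming a stable hash (real brokers such as Kafka use murmur2 internally; md5 here is just a deterministic stand-in):

```python
import hashlib

def partition_for(key: str, num_partitions: int) -> int:
    # A stable hash means the same key always lands on the same
    # partition, so per-key event ordering survives parallel consumers.
    digest = hashlib.md5(key.encode()).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions

keys = ["user-1", "user-2", "user-3", "user-1"]
assignments = [partition_for(k, 3) for k in keys]
assert assignments[0] == assignments[3]  # same key, same partition
```

Skewed keys (one very hot user) then become the scale bottleneck: that partition's operator does all the work while its peers idle.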

5. Case Study: Real-Time ML Feature Generation (10 minutes)

  • Event Source (Kafka): Collects user events
  • Stream Engine (Apache Storm): Processes and transforms streams
  • Storage (S3): Stores aggregated feature datasets
  • Setup: 1 Nimbus + 3 Workers distributed topology
  • Model Training: Python jobs consume features
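As a rough sketch of the kind of per-user aggregation the Storm stage in such a pipeline performs before features are flushed to storage. The class and field names (FeatureAggregator, user_id, event_type) are illustrative assumptions, not taken from the webinar:

```python
from collections import defaultdict

class FeatureAggregator:
    """Accumulates per-user event counts, the sort of rolling state a
    stateful stream operator would hold between sink flushes."""

    def __init__(self):
        self.counts = defaultdict(lambda: defaultdict(int))

    def process(self, event):
        # One incoming record from the event stream.
        self.counts[event["user_id"]][event["event_type"]] += 1

    def snapshot(self, user_id):
        # What a downstream training job would consume as a feature row.
        return dict(self.counts[user_id])

agg = FeatureAggregator()
for e in [{"user_id": "u1", "event_type": "click"},
          {"user_id": "u1", "event_type": "click"},
          {"user_id": "u1", "event_type": "view"}]:
    agg.process(e)
print(agg.snapshot("u1"))  # {'click': 2, 'view': 1}
```

In the real topology this state lives inside a bolt distributed across the workers, with the aggregated rows periodically written to S3.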
Bangalore Apache Iceberg™ Meetups