Spark on Kubernetes, and State Management in Structured Streaming

Hosted By
Shashank L. and 2 others

Details

In this meetup, we will cover:
a) Kubernetes Native Integration with Spark introduced in Spark 2.3.0
b) Deep Dive into State Management in Structured Streaming

Agenda:
09:00 AM - 10:00 AM - Registration, Welcome Note
10:00 AM - 12:15 PM - Spark on Kubernetes, by Madhukara Phatak, Director of Engineering at Tellius (includes a 15-minute break from 11:00 AM - 11:15 AM)
12:15 PM - 12:30 PM - Short Break
12:30 PM - 01:30 PM - Understanding State Management in Structured Streaming, by Chandan Prakash, Data Engineer at Qubole
01:30 PM - 02:30 PM - Lunch and Networking

Abstracts:
a) Kubernetes Native Integration with Spark introduced in Spark 2.3.0

If you are new to Kubernetes, we recommend watching the following video from the previous meetup: https://www.youtube.com/watch?v=Q0miRvKA4yk. It serves as a pre-read for this session, which covers the native integration of Kubernetes with Spark.

b) Deep Dive into State Management in Structured Streaming

Stateful stream processing must manage intermediate state for operations such as aggregation, group-by, and de-duplication. Structured Streaming, the newer SQL-based stream processing engine in Spark, takes a different and more efficient approach to managing state than the older DStream-based Spark Streaming. In this talk we will discuss in detail:
Architecture of the new state management in Structured Streaming
Comparison with the older DStream-based Spark Streaming in managing state
Deep dive into the streaming code to understand how state management works in Structured Streaming, with a demo example
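
For readers new to the topic, here is a rough sketch of why aggregation and de-duplication need state carried from one micro-batch to the next. It is a toy model in plain Python, not Spark code and not how Structured Streaming implements its state store. The names process_batch, state, and seen_ids are made up for illustration.

```python
from collections import defaultdict


def process_batch(state, seen_ids, batch):
    """Fold one micro-batch of (event_id, key) records into the running state."""
    for event_id, key in batch:
        # De-duplication needs state: the set of event ids seen so far.
        if event_id in seen_ids:
            continue
        seen_ids.add(event_id)
        # Aggregation needs state: the running count per key.
        state[key] += 1
    # Return a snapshot of the counts after this batch.
    return dict(state)


state = defaultdict(int)   # survives across batches
seen_ids = set()           # survives across batches

batches = [
    [(1, "a"), (2, "b"), (3, "a")],
    [(3, "a"), (4, "b")],  # event 3 is replayed and should be ignored
]

results = [process_batch(state, seen_ids, batch) for batch in batches]
for snapshot in results:
    print(snapshot)
```

In this toy version the state lives in ordinary in-memory objects. A real streaming engine also has to make that state fault tolerant and bound its growth, which is the area the talk covers.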

AI Performance Engineering Meetup (Bangalore)
Qubole India
256/A, 17th Cross, 5th Main, HSR Layout, 6th Sector · Bangalore