One of the first steps in adopting stream processing is understanding that little, if any, data should be kept around during processing. Yet making transformations completely stateless is often difficult. We'll take two examples of stream processing tasks where state might make sense, a simple aggregating ETL job and an anomaly detection task, and walk them through the features Spark Streaming offers for transforming DStreams with memory.
Attendees should leave this talk with a clearer view of when and where it is appropriate to keep state in stream processing, and of the facilities available in Spark Streaming, now and in the future, to do so.
François is a Big Data Scientist at Swisscom and was previously part of the Typesafe (now Lightbend) crew.