One of the first steps in adopting stream processing is understanding that little, if any, data should be kept around during processing. Yet completely stateless transformations are often hard to achieve. We'll take a couple of examples of stream processing tasks where state might make sense (a simple aggregating ETL job and an anomaly detection task) and walk them through the features Spark Streaming offers for transforming DStreams while retaining state.
Attendees should come away from this talk with a better view of when and where it's appropriate to collect some state in stream processing, and of the facilities available in Spark Streaming, now and in the future, to do so.
François is a Big Data Scientist at Swisscom and was previously part of the Typesafe (now Lightbend) crew.
NB: our friends at Scala Romandie are hosting a Spark meetup on April 19th in Geneva (http://www.meetup.com/Scala-Romandie/events/229599508/), and the two talks should complement each other nicely.
Many thanks to Lightbend (www.lightbend.com) and OCTO Technology (www.octo.ch) for hosting this meetup.