Preview of Spark Streaming


Details
This meetup will feature the first preview of Spark Streaming, the extension to the Spark cluster computing framework that supports near-real-time stream processing. Spark Streaming is under active development at Berkeley with help from Conviva, and will likely be released as an alpha later this summer. When finished, it will let users combine streaming, batch and interactive queries behind the same rich API and fast, in-memory computing engine.
In addition, there will be an overview of improvements to the Spark engine currently in the "dev" branch, and future development plans. We also plan to solicit feedback from users on which features they want us to prioritize.
The meetup will be hosted at Yelp in San Francisco. Food will be provided. Doors open at 6:15 PM, and talks start at 7 PM.
Important: Please register by Monday, June 18th, with both your first and last names. The organizers need a list of attendees in advance. (If you'd prefer not to list your real name online, you can email matei@eecs.berkeley.edu.)
More about Spark Streaming
Spark Streaming lets users run fault-tolerant continuous queries with 1-2 second latency on large data streams. It offers a rich functional interface similar to Spark's, where users can map, filter, join, and reduce streams (among other operations) using functions in the Scala programming language. The system automatically distributes the work across machines and recovers from failures and stragglers, even for operators with state, such as a reduce over a sliding window. In addition, users can combine streams with historical data computed through batch jobs, or run ad-hoc queries on stream state from the Scala interpreter, providing a powerful real-time analytics environment. While Spark Streaming is still in development, early results show that it performs as well as, and often significantly better than, current open source stream processing frameworks, while offering a richer programming model and stronger fault tolerance guarantees. A short paper on the system is available at http://www.cs.berkeley.edu/~matei/papers/2012/hotcloud_spark_streaming.pdf.
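To make the idea of "a reduce over a sliding window" concrete, here is a minimal single-machine sketch in plain Python. It is not Spark Streaming's API and it ignores distribution and fault recovery entirely; the function name windowed_counts and its parameters are hypothetical, chosen only for illustration. It assumes the stream arrives as a sequence of small batches, filters and maps each event, and keeps a running per-key count over the most recent batches.

```python
from collections import Counter, deque


def windowed_counts(batches, window_size, keep=lambda e: True, key=lambda e: e):
    """Yield per-key counts over the last `window_size` batches.

    Hypothetical helper for illustration only; not part of any Spark API.
    """
    window = deque()    # per-batch counts currently inside the window
    totals = Counter()  # running reduce over everything in the window
    for batch in batches:
        # filter, then map each event to a key, then reduce within the batch
        counts = Counter(key(e) for e in batch if keep(e))
        window.append(counts)
        totals.update(counts)
        if len(window) > window_size:
            # slide the window: subtract the batch that just fell out
            totals.subtract(window.popleft())
            totals = +totals  # unary plus drops keys whose count reached zero
        yield dict(totals)


# Example: count GET requests per path over a window of two batches.
sample_batches = [
    ["GET /a", "GET /b", "POST /a"],
    ["GET /a"],
    ["GET /b", "GET /b"],
]
results = list(
    windowed_counts(
        sample_batches,
        window_size=2,
        keep=lambda e: e.startswith("GET"),
        key=lambda e: e.split()[1],
    )
)
```

The sketch updates the window incrementally, adding the newest batch and subtracting the oldest, rather than recomputing the whole window each time. Whether Spark Streaming uses the same strategy internally is not stated in this announcement.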
The project will be presented by Tathagata Das, Haoyuan Li and Matei Zaharia, the team behind the research effort.
