
Preview of Spark Streaming

  • Jun 20, 2012 · 6:15 PM

This meetup will feature the first preview of Spark Streaming, the extension to the Spark cluster computing framework that supports near-real-time stream processing. Spark Streaming is under active development at Berkeley with help from Conviva, and will likely be released as an alpha later this summer. When finished, it will let users combine streaming, batch and interactive queries behind the same rich API and fast, in-memory computing engine.

In addition, there will be an overview of improvements to the Spark engine currently in the "dev" branch, and future development plans. We also plan to solicit feedback from users on which features they want us to prioritize.

The meetup will be hosted at Yelp in San Francisco. Food will be provided. Doors open at 6:15, with talks starting at 7 PM.

Important: Please register by Monday June 18th, with both your first and last names. The organizers need to have a list of attendees in advance. (If you'd prefer not to list your real name online, you can email [masked]).


More about Spark Streaming

Spark Streaming lets users run fault-tolerant continuous queries with 1-2 second latency on large data streams. It offers a rich functional interface similar to Spark's, in which users can map, filter, join, and reduce streams (among other operations) using functions in the Scala programming language. The system automatically distributes the work across machines and recovers from failures and stragglers, even for operators with state, such as a reduce over a sliding window. In addition, users can combine streams with historical data computed through batch jobs, or run ad-hoc queries on stream state from the Scala interpreter, providing a powerful real-time analytics environment.

While Spark Streaming is still in development, early results show that it performs similarly to, and often significantly better than, current open source stream processing frameworks, while offering a richer programming model and stronger fault tolerance guarantees. A short paper on the system is available at


The project will be presented by Tathagata Das, Haoyuan Li and Matei Zaharia, the team behind the research effort.
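
To make the "reduce over a sliding window" idea above concrete, here is a minimal sketch in plain Scala. It is not the Spark Streaming API, which this announcement does not show: the stream is modeled as an in-memory sequence of small batches, and every name in it (WindowedCountSketch, countBatch, merge, windowedCounts) is hypothetical and chosen only for illustration. It leaves out distribution and fault recovery entirely.

```scala
// Illustrative sketch only: plain Scala collections stand in for a stream.
// None of these names come from Spark Streaming itself.
object WindowedCountSketch {
  // One batch: the records that arrived during one short interval.
  type Batch = Seq[String]

  // Count words in a single batch using map, filter, and a keyed reduce.
  def countBatch(batch: Batch): Map[String, Int] =
    batch
      .flatMap(_.split("\\s+").toSeq)
      .map(_.toLowerCase)
      .filter(_.nonEmpty)
      .groupBy(identity)
      .map { case (word, occurrences) => word -> occurrences.size }

  // Merge two count maps by summing the counts of shared keys.
  def merge(a: Map[String, Int], b: Map[String, Int]): Map[String, Int] =
    b.foldLeft(a) { case (acc, (word, n)) =>
      acc.updated(word, acc.getOrElse(word, 0) + n)
    }

  // Reduce over a sliding window: for each batch, combine the counts of the
  // most recent `windowLength` batches (fewer at the start of the stream).
  def windowedCounts(batches: Seq[Batch], windowLength: Int): Seq[Map[String, Int]] = {
    require(windowLength > 0, "windowLength must be positive")
    val perBatch = batches.map(countBatch)
    perBatch.indices.map { i =>
      val from = math.max(0, i - windowLength + 1)
      perBatch.slice(from, i + 1).foldLeft(Map.empty[String, Int])(merge)
    }
  }

  def main(args: Array[String]): Unit = {
    val batches = Seq(Seq("spark streaming"), Seq("spark spark"), Seq("shark"))
    windowedCounts(batches, windowLength = 2).foreach(println)
  }
}
```

The sketch recomputes each window from per-batch counts, which keeps the logic easy to follow. A real system would more likely update window state incrementally and would need to recover that state after a failure, which is the harder problem the talk addresses.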


  • Joakim S.

    Do you plan TCP support for Spark streaming?

    September 10, 2012

  • A former member

    Spark Streaming was even more impressive than the demo of Spark and Shark at the Hadoop Summit. Loved it.

    June 21, 2012

  • Ryan H.

    Nice presentation and interesting people to chat with

    June 21, 2012

  • Matei Z.

    Thanks everyone for coming by! I've uploaded today's slides at

    June 20, 2012

  • A former member

    No, I believe this one is focused on the streaming portion which is work in progress at this time. Summit talk covered Spark and Shark mostly and just briefly mentioned streaming...

    June 16, 2012
