Introduction to Spark Structured Streaming




I’m excited to announce some additions to the Meetup agenda.

First, we’ll be joined by Jonathan Gray, CEO of Cask. Prior to founding Cask, Jon was a software engineer at Facebook, where he was responsible for HBase engineering efforts, including Facebook Messenger and several other large-scale projects. Jon is a core contributor and active committer in the Big Data community. He will give a short talk on the evolution of the Big Data ecosystem and the projects he’s worked on, and share his perspective on what lies ahead for Hadoop, Spark and Big Data.

Second, Scott Nichols, a Boston-based singer-songwriter who plays regularly in clubs and venues in Boston and the New England area, will perform live before the technical content begins.

Here is the revised agenda for the Meetup.

5:45 – 6:15 PM Rock with Spark - food and socializing with music by Scott Nichols

6:15 – 6:45 PM Big Data talk by Jonathan Gray, founder and CEO of Cask

6:45 – 8:00 PM Intro to Spark Structured Streaming

I look forward to seeing you there.

Spark Structured Streaming
Structured Streaming is a new scalable and fault-tolerant stream processing engine built on the Spark SQL engine. It allows you to express a streaming computation the same way you would express a computation on static data. This has two benefits. First, code can be reused, as essentially the same queries can be run on batch, interactive or streaming data. Second, it simplifies streaming application development, as you can operate on streams of data with DataFrames just as you would on static data. Structured Streaming abstracts away much of the complexity of streaming analytics, so you can perform streaming analytics without having to reason about the mechanics of streaming.

The Spark SQL engine takes care of running Structured Streaming queries, incrementally and continuously updating the result as streaming data continues to arrive. With Spark Structured Streaming, you can express streaming aggregations, event-time windows, and joins between streaming and static data.

In this session, we’ll walk through the basics of Structured Streaming, its programming model and APIs. The concepts will be illustrated using code examples. Then, we’ll walk through a demo of analyzing both static and streaming sensor data to show how the same queries can be used on each, thereby simplifying streaming analytics application development, and how static and streaming data can be leveraged together.