Interactive Real-time Streaming with Spark 2.0 - Structured Streaming

Hosted By
Prasad S.

Details

Abstract:

Apache Spark 2.0 introduces Structured Streaming, a new API for analyzing data from real-time sources. Built on Spark's DataFrame and Dataset APIs, Structured Streaming lets developers write streaming applications that benefit from the Catalyst optimizer and the Tungsten execution engine while retaining the resilience of RDDs.

In this talk, I'll discuss Structured Streaming: why it makes real-time data analysis faster and easier, how it works under the hood, and how to use these new tools in Python applications. In a demo on Databricks Community Edition, I will use the high-level DataFrame and Dataset APIs to show how simple it is to write a Structured Streaming application and interactively analyze real-time data sets.

Speaker Bio:

Miles Yucht is a software engineer at Databricks, where he has been working on the team developing a highly multi-tenant, scalable version of Databricks that lets users from many different organizations use Databricks simultaneously on a single server. It was released as Databricks Community Edition at Spark Summit East 2016.

Miles Yucht graduated from Princeton with an A.B. in Computer Science.

Large Language Models
Princeton University - Lewis Library Rm 138
Washington Road and Ivy Lane, Princeton, NJ 08544