Processing Fast Data with Apache Spark: The Tale of Two APIs

Hosted by George Chow

Details

(NB: This is a joint presentation with the Scala meetup. Please RSVP either here or on the Scala meetup page, but not both: we need an accurate count of attendees to plan for the space. Thank you for your kind attention to this.)

Abstract: Processing Fast Data with Apache Spark: The Tale of Two APIs

(In collaboration with Gérard Maas)

Fast Data architectures answer the growing need of enterprises to process and analyze continuous streams of data, so that they can accelerate decision making and react to the particular characteristics of their markets.

Apache Spark is a popular framework for data analytics. Its capabilities include SQL-based analytics, dataflow processing, graph analytics and a rich library of built-in machine learning algorithms. These libraries can be combined to address a wide range of requirements for large-scale data analytics.

To address Fast Data flows, Spark offers two APIs: the mature Spark Streaming and its younger sibling, Structured Streaming. In this talk, we will introduce both APIs. Through practical examples, you will get a taste of each one and guidance on how to choose the right one for your application.

Bio:
François Garillot is based in Vancouver, where he works on deep learning for the JVM with Skymind. His interests include type systems, leveraging programming languages to make analytic computations simpler to express, and a passion for Scala, Spark and roasted arabica. He received a PhD from École Polytechnique in 2011, and is more recently the co-author of Learning Spark Streaming, to be published really soon now™ by O'Reilly.

Vancouver Apache Spark Meetup
Simba Technologies
938 West 8th Ave · Vancouver, BC