
How Apache Spark fits in the Big Data landscape

Hosted By
Mikael H.
Details

Paco Nathan from Databricks, the company that just beat the Hadoop terabyte sort benchmark record (https://gigaom.com/2014/10/10/databricks-demolishes-big-data-benchmark-to-prove-spark-is-fast-on-disk-too/), is visiting Stockholm in about a month. He is a "player/coach" who has led innovative data teams building large-scale apps for several years, with expertise in distributed systems, machine learning, cloud computing, and functional programming. Paco is an O'Reilly author with a focus on enterprise data workflows and math literacy among execs, plus a keen interest in Ag+Data. He is also an Apache Spark open source evangelist with Databricks and an advisor for Amplify Partners. He received his BS Math Sci and MS Comp Sci degrees from Stanford University, and has 30+ years of technology industry experience ranging from Bell Labs to early-stage start-ups.

Apache Spark is intended as a general-purpose engine that supports combinations of batch, streaming, SQL, ML, graph, etc., for apps written in Scala, Java, Python, Clojure, R, etc. This talk provides an introduction to Spark (how it provides so much better performance, and why) and then explores how Spark fits into the Big Data landscape, e.g., other systems with which Spark pairs nicely, and why Spark is needed for the work ahead.

We'll review some of the new features in the next release, see a demo of notebooks in Databricks Cloud, and also discuss the new Spark Developer Certificate program.

Stockholm Big Data
Ericsson (meeting room Lars Magnus)
Torshamnsgatan 21 · Stockholm