How Apache Spark fits in the Big Data landscape


Details
Paco Nathan from Databricks, the company that recently beat the large-scale sort benchmark record previously held by Hadoop (https://gigaom.com/2014/10/10/databricks-demolishes-big-data-benchmark-to-prove-spark-is-fast-on-disk-too/), is visiting Stockholm in about a month. He is a "player/coach" who has led innovative data teams building large-scale apps for several years, with expertise in distributed systems, machine learning, cloud computing, and functional programming. Paco is an O'Reilly author focused on enterprise data workflows and math literacy among execs, with a keen interest in Ag+Data; an Apache Spark open source evangelist with Databricks; and an advisor for Amplify Partners. He received his BS Math Sci and MS Comp Sci degrees from Stanford University and has 30+ years of technology industry experience ranging from Bell Labs to early-stage start-ups.
Apache Spark is intended as a general-purpose engine that supports combinations of batch, streaming, SQL, ML, graph, and other workloads, for apps written in Scala, Java, Python, Clojure, R, etc. This talk provides an introduction to Spark, explaining how and why it delivers much better performance, and then explores how Spark fits into the Big Data landscape: which other systems pair nicely with Spark, and why Spark is needed for the work ahead. A small code sketch of the "general-purpose engine" idea follows below.
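To make that point concrete, here is a minimal sketch of a batch word count and a SQL query sharing one engine and one dataset. It uses the SparkSession API from later Spark releases (the 1.x API was current at the time of this talk), and the input path is hypothetical:

    import org.apache.spark.sql.SparkSession

    object UnifiedEngineSketch {
      def main(args: Array[String]): Unit = {
        // One engine and one session serve both batch and SQL workloads
        val spark = SparkSession.builder()
          .appName("unified-engine-sketch")
          .master("local[*]")
          .getOrCreate()
        import spark.implicits._

        // Batch: a classic word count over a text file (hypothetical path)
        val counts = spark.sparkContext
          .textFile("hdfs:///data/corpus.txt")
          .flatMap(_.split("\\s+"))
          .map(word => (word, 1))
          .reduceByKey(_ + _)
          .toDF("word", "count")

        // SQL: the same data queried declaratively, no second system needed
        counts.createOrReplaceTempView("word_counts")
        spark.sql("SELECT word, `count` FROM word_counts ORDER BY `count` DESC LIMIT 10")
          .show()

        spark.stop()
      }
    }

The same session could just as well drive streaming, MLlib, or GraphX jobs, which is the consolidation argument the talk develops.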
We'll review some of the new features in the upcoming release, see a demo of notebooks in Databricks Cloud, and also discuss the new Spark Developer Certificate program.