Databricks Spark evangelist Paco Nathan (http://liber118.com/pxn/) will be passing through town for his appearance at Strata EU (http://strataconf.com/strataeu2014). We asked Paco if he would take an evening to share the latest news on Apache Spark with us. He agreed.
Paco will discuss the current state of Apache Spark, including some nifty tricks you might not have seen, and share a bit about where Spark is going. Additional details to follow. This is a talk not to be missed.
About Apache Spark
Apache Spark is an open-source cluster computing framework for data analytics, originally developed in the AMPLab at UC Berkeley. Spark fits into the Hadoop open-source community, building on top of the Hadoop Distributed File System (HDFS). However, Spark is not tied to the two-stage MapReduce paradigm, and promises performance up to 100 times faster than Hadoop MapReduce for certain applications. Spark provides primitives for in-memory cluster computing that allow user programs to load data into a cluster's memory and query it repeatedly, making it well suited to machine learning algorithms.
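To make the "load once, query repeatedly" idea concrete, here is a toy sketch in plain Python. It is not the Spark API and runs on a single machine; the CachedDataset class and the load_points function are hypothetical names invented for this illustration. It only shows why keeping data in memory helps workloads, such as iterative machine learning algorithms, that pass over the same data many times.

```python
# Toy illustration only: plain Python, NOT the Spark API.
# CachedDataset and load_points are hypothetical names for this sketch.


class CachedDataset:
    """Loads records once via `loader`, then answers queries from memory."""

    def __init__(self, loader):
        self._loader = loader   # stand-in for a slow read, e.g. from disk
        self._records = None    # in-memory copy, filled on first use
        self.load_count = 0     # how many times the slow load actually ran

    def records(self):
        # Run the expensive load only if nothing is cached yet.
        if self._records is None:
            self._records = list(self._loader())
            self.load_count += 1
        return self._records

    def query(self, predicate):
        # Every query reuses the same in-memory records.
        return [r for r in self.records() if predicate(r)]


def load_points():
    # Assumption: pretend this is an expensive read from distributed storage.
    return range(10)


dataset = CachedDataset(load_points)
evens = dataset.query(lambda x: x % 2 == 0)
large = dataset.query(lambda x: x >= 7)
```

Both queries are served from the same in-memory copy, so the loader runs only once no matter how many queries follow.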
About Paco Nathan
Paco Nathan (http://www.oreilly.com/pub/au/1927) is a "player/coach" who has led innovative Data teams building large-scale apps for several years. His expertise spans distributed systems, machine learning, cloud computing, and functional programming. Paco is an O'Reilly (http://www.oreilly.com/pub/au/1927) author, with a focus on Enterprise data workflows and math literacy among execs, plus a keen interest in Ag+Data (http://radar.oreilly.com/2014/04/agdata.html). He is also an Apache Spark (http://spark.apache.org/) open source evangelist with Databricks (http://databricks.com/) and an advisor for Amplify Partners (http://www.amplifypartners.com/). He received his BS Math Sci and MS Comp Sci degrees from Stanford University, and has 30+ years of technology industry experience, ranging from Bell Labs to early-stage start-ups.