Dubbed the leading successor to Hadoop MapReduce, Apache Spark is a cluster computing system that makes data analytics fast -- both fast to run and fast to write. Spark programs can run up to 100X faster than their MapReduce equivalents while being as much as 10X shorter and easier to understand. Spark also provides efficient support for streaming, query execution, machine learning, and graph computation through rich high-level libraries. Last but not least, the project has one of the most active open source communities in Big Data: 150+ developers from 30+ organizations have contributed code to the project. In this talk, we will introduce the project, survey the high-level libraries, including streaming, SQL, and machine learning, and explain how Spark can help you make better decisions more easily and quickly.
Reynold Xin is a committer on Apache Spark and a co-founder of Databricks. He has been instrumental in the development of many high-level frameworks on Spark, including SQL and graph computation. Prior to Databricks, he was pursuing a PhD in the UC Berkeley AMPLab.
Patrick Wendell is a committer on Apache Spark and a co-founder of Databricks. Before Databricks, he was pursuing a PhD in the UC Berkeley AMPLab, where he worked on scalable, low-latency scheduling for data processing frameworks. He has also contributed to several Hadoop projects, including Apache Flume and Apache Avro.
Xiangrui Meng leads the development of the machine learning library on Spark at Databricks. Prior to Databricks, he was the primary developer of a Hadoop MapReduce-based machine learning framework at LinkedIn. He holds a doctorate in Computational and Mathematical Engineering from Stanford, where he conducted research on large-scale machine learning.
[masked]pm: Registration and Networking (with food & beverages)
7:[masked]pm: Q&A session
We'll be raffling off two free passes to HBaseCon 2014 (May 5, San Francisco), so bring your business cards. Look for the Cloudera table!