Apache Spark - Making Sense of Big Data Faster and Easier

Name: Apache Spark - Making Sense of Big Data Faster and Easier
Start: 2014-04-09T18:00:00-07:00
End: 2014-04-09T20:30:00-07:00
Location: Perkins Coie

Hosted by Pashu P.

The Hive Think Tank

Details

Dubbed the leading successor to Hadoop MapReduce, Apache Spark is a cluster computing system that makes data analytics fast, both to run and to write. Programs written in Spark can often outperform their MapReduce equivalents by 100X while being 10X shorter and easier to understand. Spark also provides efficient support for streaming, query execution, machine learning, and graph computation through rich high-level libraries. Last but not least, the project has one of the most active open source communities in Big Data: 150+ developers from 30+ organizations have contributed code. In this talk, we will introduce the project, survey the high-level libraries for streaming, SQL, and machine learning, and explore how Spark can help you make better decisions faster and more easily.

Speakers

Reynold Xin is a committer on Apache Spark and a co-founder of Databricks. He has been instrumental in the development of many high-level frameworks on Spark, including SQL and graph computation. Prior to Databricks, he was pursuing a PhD at the UC Berkeley AMPLab.

Patrick Wendell is a committer on Apache Spark and a co-founder of Databricks. Before Databricks, he was pursuing a PhD at the UC Berkeley AMPLab, where he worked on scalable low-latency scheduling for data processing frameworks. He has also contributed to several Hadoop projects, including Apache Flume and Apache Avro.

Xiangrui Meng leads the development of the machine learning library on Spark at Databricks. Prior to Databricks, he was the primary developer of a Hadoop MapReduce-based machine learning framework at LinkedIn. He holds a doctorate in Computational and Mathematical Engineering from Stanford, where he conducted research on large-scale machine learning.

Agenda:

6:00-6:45pm: Registration and Networking (with food & beverages)

6:45-7:00pm: Introduction

7:00-7:45pm: Presentations

7:45-8:15pm: Q&A session

We'll be raffling off two free passes to HBaseCon 2014 (http://hbasecon.com), held May 5 in San Francisco, so bring your business cards. Look for the Cloudera table!
