Mining Big Data with Apache Spark

Details

We are excited to have Reynold Xin, a member of the Berkeley AMPLab, stopping by to give a talk on Apache Spark. Hope to see many of you there!

ABSTRACT
Mining Big Data has been an incredibly frustrating experience, due both to its inherent complexity and to the lack of good tools. In this talk, we will introduce Apache Spark and several efforts in the Spark ecosystem that make life easier for data scientists and engineers.

Dubbed the leading successor to Hadoop MapReduce, Apache Spark is a cluster computing system that makes data analytics fast -- both fast to run and fast to write. Programs written in Spark can outperform their MapReduce equivalents by up to 100X, while being 10X shorter and more understandable. In addition, Spark provides efficient support for streaming, query execution, machine learning, and graph computation through rich high-level libraries. Last but not least, the project features one of the most active open source communities in Big Data: 190+ developers from 50+ organizations have contributed code to the project.

SPEAKER
Reynold Xin is a committer on Apache Spark and a co-founder of Databricks. Prior to Databricks, he was pursuing a PhD in the UC Berkeley AMPLab.

SCHEDULE
6:30 - 7:00pm Social (food + beer served)
7:00 - 8:30pm Talk
8:30 - 9:00pm Social

SF Data Mining
Trulia
116 New Montgomery St, 9th Floor · San Francisco, CA