Apache Spark - Easier and Faster Big Data + Collaborative Filtering

Name: Apache Spark - Easier and Faster Big Data + Collaborative Filtering
Start: 2014-05-07T19:00:00-04:00
End: 2014-05-07T22:00:00-04:00
Location: Spotify

Hosted by François Le L.

Spark-NYC

Details

TALK #1 - Apache Spark - Easier and Faster Big Data, by Reynold Xin (Databricks)

ABSTRACT : Dubbed the leading successor to Hadoop MapReduce, Apache Spark is a cluster computing system that makes data analytics fast -- both fast to run and fast to write. Programs written in Spark can often outperform those in MapReduce by 100X, while being 10X shorter and more understandable. In addition, Spark provides efficient support for streaming, query execution, machine learning, and graph computation through rich high-level libraries. Last but not least, the project features one of the most active open source communities in Big Data: 170+ developers from 30+ organizations have contributed code to the project. In this talk, we will introduce the project, survey the high-level libraries including streaming, SQL, and machine learning, and discuss how Spark can help you make better decisions more easily and quickly.

BIO : Reynold Xin is a committer on Apache Spark and a co-founder of Databricks. He has been instrumental in the development of many high-level frameworks on Spark, including SQL and graph computation. Prior to Databricks, he was pursuing a PhD at the UC Berkeley AMPLab.

TALK #2 - Collaborative Filtering with Spark, by Christopher Johnson (Spotify)

ABSTRACT : Spotify uses a range of Machine Learning models to power its music recommendation features, including the Discover page and Radio. Because training these models is iterative, they suffer from the IO overhead of Hadoop and are a natural fit for the Spark programming paradigm. In this talk I will present both the right way and the wrong way to implement collaborative filtering models with Spark. Additionally, I will take a deep dive into how Matrix Factorization is implemented in the MLlib library.

BIO : Chris Johnson is a Machine Learning dude at Spotify who hacks on music data and works on its music recommendation engine. Prior to Spotify, Chris was pursuing a PhD at UT Austin.

The rest of the agenda is up for grabs. Feel free to submit an idea!
