Real-time Recommendations using Spark


Details
What this is about
Real-time advanced analytics and recommendations using machine learning, graph processing, natural language processing, and approximations with Apache Spark, Stanford CoreNLP, and Twitter Algebird. Slides are available at http://www.slideshare.net/cfregly
Agenda
Intro
• Live, Interactive Recommendations Demo
• Spark ML, GraphX, Streaming, Kafka, Cassandra, Docker
Types of Similarity
• Euclidean vs. Non-Euclidean Similarity
• User-to-User Similarity
• Content-based, Item-to-Item Similarity (Amazon)
• Collaborative-based, User-to-Item Similarity (Netflix)
• Graph-based, Item-to-Item Similarity Pathway (Spotify)
Similarity Approximations at Scale
• Twitter Algebird
• MinHash and Bucketing
• Locality Sensitive Hashing (LSH)
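
For readers new to the approximation topics above, here is a minimal, illustrative sketch of MinHash signatures with LSH-style banding in plain Python. It is not taken from the talk and does not use Twitter Algebird or Spark. The function names (minhash_signature, estimate_jaccard, lsh_band_keys), the seeded BLAKE2b hashing, and the parameter values are all assumptions chosen for illustration.

```python
import hashlib


def minhash_signature(items, num_hashes=64):
    """Return a MinHash signature for a non-empty set of strings.

    Each of the num_hashes positions holds the minimum 64-bit hash
    of the items under a differently seeded hash function.
    """
    signature = []
    for seed in range(num_hashes):
        salt = seed.to_bytes(8, "little")  # the seed selects the hash function
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(item.encode("utf-8"),
                                digest_size=8, salt=salt).digest(),
                "big")
            for item in items
        ))
    return signature


def estimate_jaccard(sig_a, sig_b):
    """Estimate Jaccard similarity as the fraction of matching positions."""
    matches = sum(1 for a, b in zip(sig_a, sig_b) if a == b)
    return matches / len(sig_a)


def lsh_band_keys(signature, bands=16):
    """Split a signature into bands and return one bucket key per band.

    Two items become candidate pairs if they share at least one key.
    Any trailing values that do not fill a whole band are ignored.
    """
    rows = len(signature) // bands
    return {
        (band, tuple(signature[band * rows:(band + 1) * rows]))
        for band in range(bands)
    }


if __name__ == "__main__":
    # Hypothetical item sets, e.g. titles liked by two users.
    user_a = {"movie1", "movie2", "movie3", "movie4"}
    user_b = {"movie2", "movie3", "movie4", "movie5"}
    sig_a, sig_b = minhash_signature(user_a), minhash_signature(user_b)
    print("estimated Jaccard:", estimate_jaccard(sig_a, sig_b))
    print("candidate pair:", bool(lsh_band_keys(sig_a) & lsh_band_keys(sig_b)))
```

More hash functions tighten the similarity estimate, while the choice of bands versus rows per band trades recall for fewer candidate pairs. A production system would likely rely on a library implementation instead.
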
Netflix Recommendations
• From Ratings to Real-Time
• DVD-Ratings-based $1M Netflix Prize (2009)
• Streaming-based "Trending Now" (2016)
Wrap-Up and Q&A
Related Links
https://github.com/fluxcapacitor/pipeline/wiki
http://static.echonest.com/BoilTheFrog/
http://www.netflixprize.com/assets/GrandPrize2009_BPC_BellKor.pdf
http://blog.echen.me/2011/10/24/winning-the-netflix-prize-a-summary/
http://www.cc.gatech.edu/~zha/CSE8801/CF/kdd-fp074-koren.pdf
About the Speaker:
We've had Chris here before! Chris Fregly is a Principal Data Solutions Engineer at the newly formed IBM Spark Technology Center, an Apache Spark contributor, and a Netflix Open Source committer. Chris is also the founder of the global Advanced Apache Spark Meetup and author of the upcoming book Advanced Spark (http://advancedspark.com/). Previously, Chris was a Data Solutions Engineer at Databricks and a Streaming Data Engineer at Netflix.
