[DC][Follow RSVP Link] Spark, Adv Analytics, Recommendations, Approximations, ML
Details
RSVP here: https://www.meetup.com/Washington-DC-Area-Spark-Interactive/events/229298675/
Abstract
Real-time advanced analytics and recommendations using machine learning, graph processing, natural language processing, and approximations with Apache Spark, Stanford CoreNLP, and Twitter Algebird. Slides available here
Agenda
Introductions
• Live, Interactive Recommendations Demo
• Spark ML, GraphX, Streaming, Kafka, Cassandra, Docker
Types of Similarity
• Euclidean vs. Non-Euclidean Similarity
• User-to-User Similarity
• Content-based, Item-to-Item Similarity (Amazon)
• Collaborative-based, User-to-Item Similarity (Netflix)
• Graph-based, Item-to-Item Similarity Pathway (Spotify)
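To make the Euclidean vs. non-Euclidean distinction concrete, here is a minimal Python sketch. It is an illustration only and is not taken from the talk or its slides; the function names are hypothetical, and grouping cosine and Jaccard under "non-Euclidean" follows common textbook usage rather than anything this announcement states.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; ignores their magnitudes."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def jaccard_similarity(a, b):
    """Size of the intersection over size of the union of two sets."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)
```

Euclidean distance suits dense numeric vectors such as ratings, while Jaccard works on sets (for example, the sets of items two users have viewed) where no coordinate system exists.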
Similarity Approximations at Scale
• Twitter Algebird
• MinHash and Bucketing
• Locality Sensitive Hashing (LSH)
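As a rough illustration of the MinHash and LSH items above, here is a minimal Python sketch. It does not use Twitter Algebird and is not the speaker's implementation; the function names, the choice of BLAKE2b as the seeded hash, and the default parameters (64 hashes, 4 rows per band) are all assumptions made for this example.

```python
import hashlib


def _seeded_hash(item, seed):
    """Hash a string to a 64-bit integer, using the seed as the BLAKE2b salt."""
    digest = hashlib.blake2b(
        item.encode("utf-8"), digest_size=8, salt=seed.to_bytes(8, "little")
    ).digest()
    return int.from_bytes(digest, "big")


def minhash_signature(items, num_hashes=64):
    """For each seeded hash function, keep the minimum hash value over the set."""
    return [min(_seeded_hash(item, seed) for item in items)
            for seed in range(num_hashes)]


def estimate_jaccard(sig_a, sig_b):
    """Fraction of signature positions that agree; approximates Jaccard similarity."""
    matches = sum(1 for a, b in zip(sig_a, sig_b) if a == b)
    return matches / len(sig_a)


def lsh_bucket_keys(signature, rows_per_band=4):
    """Split the signature into bands; each (band index, band values) pair is a bucket key."""
    return {
        (start // rows_per_band, tuple(signature[start:start + rows_per_band]))
        for start in range(0, len(signature), rows_per_band)
    }
```

Two sets become candidate pairs when their signatures share at least one bucket key, so only items that land in the same bucket need an exact comparison. Raising rows_per_band makes each bucket stricter; adding more bands makes a match more likely.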
Netflix Recommendations
• From Ratings to Real-Time
• DVD-Ratings-based $1M Netflix Prize (2009)
• Streaming-based "Trending Now" (2016)
Wrap-Up and Q&A
Related Links
https://github.com/fluxcapacitor/pipeline/wiki
http://cdn.oreillystatic.com/en/assets/1/event/105/Algebra%20for%20Scalable%20Analytics%20Presentation.pdf
http://static.echonest.com/BoilTheFrog/
http://www.netflixprize.com/assets/GrandPrize2009_BPC_BellKor.pdf
http://blog.echen.me/2011/10/24/winning-the-netflix-prize-a-summary/
http://www.cc.gatech.edu/~zha/CSE8801/CF/kdd-fp074-koren.pdf
About the Speaker
Chris Fregly is a Principal Data Solutions Engineer at the newly formed IBM Spark Technology Center, an Apache Spark Contributor, and a Netflix Open Source Committer. Chris is also the founder of the global Advanced Apache Spark Meetup and the author of the upcoming book Advanced Spark (advancedspark.com). Previously, Chris was a Data Solutions Engineer at Databricks and a Streaming Data Engineer at Netflix.
