Spark at Zillow & Realtime Analytics Spark, NiFi, Kafka, Cassandra, ES, Docker


Details
Spark at Zillow & Realtime Analytics
Spark and Machine Learning at Zillow
Real-time, Advanced Analytics and Recommendations using Machine Learning, Graph Processing, Natural Language Processing, and Approximations with Apache Spark, Stanford CoreNLP, and Twitter Algebird
BONUS: Netflix Recommendations: Then and Now
Thank you Zillow for hosting this Meetup and sponsoring food and beer!
Agenda
6:00-6:30 Networking/Food/Drinks/Beer!
6:30-6:35 Intro
6:35-7:05 Spark and Machine Learning at Zillow
· Data Lake - Steven Hoelscher
· User segmentation - Alex Chang
· Zestimate - David Fagnan
7:05-8:05 Realtime Analytics: Live, Interactive Recommendations Demo with Spark ML, GraphX, Streaming, Kafka, Cassandra, Docker
· Types of Similarity
    · Euclidean vs. Non-Euclidean Similarity
    · User-to-User Similarity
    · Content-based, Item-to-Item Similarity (Amazon)
    · Collaborative-based, User-to-Item Similarity (Netflix)
    · Graph-based, Item-to-Item Similarity Pathway (Spotify)
· Similarity Approximations at Scale
    · Twitter Algebird
    · MinHash and Bucketing
    · Locality Sensitive Hashing (LSH)
· BONUS: Netflix Recommendations: From Ratings to Real-Time
    · DVD-Ratings-based $1M Netflix Prize (2009)
    · Streaming-based "Trending Now" (2016)
8:05-8:30 Q & A, Networking
Bio
Chris Fregly is a Research Scientist at PipelineIO, a streaming analytics and machine learning startup in San Francisco.
Chris is an Apache Spark Contributor, Netflix Open Source Committer, organizer of the global Advanced Spark and TensorFlow Meetup, and author of the upcoming book, Advanced Spark.
Previously, Chris was an Engineer at Databricks and Netflix, as well as a founding member of the IBM Spark Technology Center.
Related Links
https://github.com/fluxcapacitor/pipeline/wiki
http://cdn.oreillystatic.com/en/assets/1/event/105/Algebra%20for%20Scalable%20Analytics%20Presentation.pdf
http://static.echonest.com/BoilTheFrog/
http://www.netflixprize.com/assets/GrandPrize2009_BPC_BellKor.pdf
http://blog.echen.me/2011/10/24/winning-the-netflix-prize-a-summary/
http://www.cc.gatech.edu/~zha/CSE8801/CF/kdd-fp074-koren.pdf