Spark at Zillow & Realtime Analytics Spark, NiFi, Kafka, Cassandra, ES, Docker

Hosted by Denny Lee and 2 others
Details

Spark at Zillow & Realtime Analytics

Spark and Machine Learning at Zillow

Real-time, Advanced Analytics and Recommendations using Machine Learning, Graph Processing, Natural Language Processing, and Approximations with Apache Spark, Stanford CoreNLP, and Twitter Algebird

BONUS: Netflix Recommendations: Then and Now

Thank you, Zillow, for hosting this Meetup and sponsoring food and beer!

Agenda

6:00-6:30 Networking/Food/Drinks/Beer!

6:30-6:35 Intro

6:35-7:05 Spark and Machine Learning at Zillow

· Data Lake: Steven Hoelscher

· User Segmentation: Alex Chang

· Zestimate: David Fagnan

7:05-8:05 Realtime Analytics: Live, Interactive Recommendations Demo with Spark ML, GraphX, Streaming, Kafka, Cassandra, Docker

Types of Similarity

Euclidean vs. Non-Euclidean Similarity

User-to-User Similarity

Content-based, Item-to-Item Similarity (Amazon)

Collaborative-based, User-to-Item Similarity (Netflix)

Graph-based, Item-to-Item Similarity Pathway (Spotify)

Similarity Approximations at Scale

Twitter Algebird

MinHash and Bucketing

Locality Sensitive Hashing (LSH)

BONUS: Netflix Recommendations: From Ratings to Real-Time

DVD-Ratings-based $1M Netflix Prize (2009)

Streaming-based "Trending Now" (2016)

8:05-8:30 Q & A, Networking

Bio

Chris Fregly is a Research Scientist at PipelineIO, a streaming analytics and machine learning startup in San Francisco.

Chris is an Apache Spark Contributor, Netflix Open Source Committer, organizer of the global Advanced Spark and TensorFlow Meetup, and author of the upcoming book, Advanced Spark.

Previously, Chris was an engineer at Databricks and Netflix, as well as a founding member of the IBM Spark Technology Center.

Related Links

https://github.com/fluxcapacitor/pipeline/wiki
http://cdn.oreillystatic.com/en/assets/1/event/105/Algebra%20for%20Scalable%20Analytics%20Presentation.pdf
http://static.echonest.com/BoilTheFrog/
http://www.netflixprize.com/assets/GrandPrize2009_BPC_BellKor.pdf
http://blog.echen.me/2011/10/24/winning-the-netflix-prize-a-summary/
http://www.cc.gatech.edu/~zha/CSE8801/CF/kdd-fp074-koren.pdf

Seattle Spark+AI Meetup
Zillow HQ
1301 2nd Avenue, 30th Floor · Seattle, WA