Real-time Aggregations, Approximations, Similarities, and Recommendations


Details
Hi. This event follows the IBM Spark PoT on the same day. You must RSVP here separately for the evening Meetup, but you do not need to register for the PoT if you are ONLY going to the Meetup. See you there! -d
Title
Real-time Aggregations, Approximations, Similarities, and Recommendations at Scale using Spark Streaming, ML, GraphX, Kafka, Cassandra, Docker, CoreNLP, Word2Vec, LDA, and Twitter Algebird
Agenda
Intro
Live, Interactive Recommendations Demo
Spark Streaming, ML, GraphX, Kafka, Cassandra, Docker, CoreNLP, Word2Vec, LDA, and Twitter Algebird
(http://advancedspark.com/)
Types of Similarity
Euclidean vs. Non-Euclidean Similarity
Jaccard Similarity
Cosine Similarity
LogLikelihood Similarity
Edit Distance
Text-based Similarities and Analytics
Word2Vec
LDA Topic Extraction
TextRank
Similarity-based Recommendations
User-to-User
Content-based, Item-to-Item (Amazon)
Collaborative-based, User-to-Item (Netflix)
Graph-based, Item-to-Item "Pathways" (Spotify)
Aggregations, Approximations, and Similarities at Scale
Twitter Algebird
MinHash and Bucketing
Locality Sensitive Hashing (LSH)
Bloom Filters
Count-Min Sketch
HyperLogLog
Q & A
Bio
Chris Fregly is a Principal Data Solutions Engineer for the newly formed IBM Spark Technology Center, an Apache Spark Contributor, and a Netflix Open Source Committer.
Chris is also the founder of the global Advanced Apache Spark Meetup and author of the upcoming book Advanced Spark (http://advancedspark.com/).
Previously, Chris was a Data Solutions Engineer at Databricks and a Streaming Data Engineer at Netflix.
When Chris isn’t contributing to Spark and other open source projects, he’s creating book chapters, slides, and demos to share knowledge with his peers at meetups and conferences throughout the world.
Related Links
https://github.com/fluxcapacitor/pipeline/wiki
http://static.echonest.com/BoilTheFrog/
http://www.netflixprize.com/assets/GrandPrize2009_BPC_BellKor.pdf
http://blog.echen.me/2011/10/24/winning-the-netflix-prize-a-summary/
http://www.cc.gatech.edu/~zha/CSE8801/CF/kdd-fp074-koren.pdf
