
Special Presentation Night (May 2016)

Hosted By
Matthew F. and 3 others

Details

Hi Everybody!

We have an in-depth talk and demo by Chris Fregly for this month's special presentation night, with sponsorship for both space and food/drink generously provided by Spotify!

Check out the details below. I imagine this will be a popular event, so we are limiting RSVPs to make sure people can fit in the space.

Cheers!
Nick, Yana, Matt, and Cao

Meetup Agenda:

• 6:00 - 6:45: Food, drink, and mingling

• 6:45 - 7:00: Opening remarks and sponsor message

• 7:00 - 9:00: Feature talk

Feature Talk: Real-time Aggregations, Approximations, Similarities, and Recommendations at Scale using Spark Streaming, ML, GraphX, Kafka, Cassandra, Docker, CoreNLP, Word2Vec, LDA, and Twitter Algebird

Talk Abstract: Starting with a live, interactive demo that generates audience-specific recommendations, we'll dive deep into each of the key components, including NiFi, Kafka, Stanford CoreNLP, Docker, Word2Vec, LDA, Twitter Algebird, and Spark Streaming, SQL, ML, and GraphX. As a bonus, we'll discuss the latest Netflix Recommendations Pipeline and related open source projects.

Talk Agenda:

• Intro

• Live, Interactive Recommendations Demo

• Spark Streaming, ML, GraphX, Kafka, Cassandra, Docker, CoreNLP, Word2Vec, LDA, and Twitter Algebird (advancedspark.com)

• Types of Similarity

    • Euclidean vs. Non-Euclidean Similarity

    • Jaccard Similarity

    • Cosine Similarity

    • LogLikelihood Similarity

    • Edit Distance

• Text-based Similarities and Analytics

    • Word2Vec

    • LDA Topic Extraction

    • TextRank

• Similarity-based Recommendations

    • User-to-User

    • Content-based, Item-to-Item (Amazon)

    • Collaborative-based, User-to-Item (Netflix)

    • Graph-based, Item-to-Item "Pathways" (Spotify)

• Aggregations, Approximations, and Similarities at Scale

    • Twitter Algebird

    • MinHash and Bucketing

    • Locality Sensitive Hashing (LSH)

    • BloomFilters

    • CountMin Sketch

    • HyperLogLog

• Q & A

Speaker Bio: Chris Fregly is a Research Engineer @ Flux Capacitor AI in SF, an Apache Spark Contributor, and a Netflix Open Source Committer.

Chris is also the founder of the global Advanced Apache Spark Meetup and author of the upcoming book, Advanced Spark @ advancedspark.com.

Previously, Chris was a Data Solutions Engineer at Databricks and a Streaming Data Engineer at Netflix.

When Chris isn’t contributing to Spark and other open source projects, he’s creating book chapters, slides, and demos to share knowledge with his peers at Meetups and conferences throughout the world.

Boston Data Technology (Boston Data Group/BDT)
Spotify/Echo Nest
48 Grove Street · Somerville, MA