Streaming for Personalization Datasets at Netflix

Hosted by DB Tsai
Details

Abstract:

Streaming applications have historically been complex to design and implement because of the significant infrastructure investment they require. However, recent active development in various streaming platforms provides an easier transition to stream processing and enables analytics applications and experiments to consume near-real-time data without massive development cycles.

In this session, we will present our experience with stream processing of unbounded datasets in the personalization space. The datasets included, but were not limited to, the stream of playback events (all of Netflix’s plays worldwide) that is used as feedback for all personalization algorithms. These datasets, when ultimately consumed by our machine learning models, directly affect the customer’s personalized experience, which means that the impact is high and the tolerance for failure is low. We’ll talk about the experiments we did to compare Apache Spark and Apache Flink, the impact that we had on our customers, and (most importantly) the challenges we faced.

Speaker bio

Shriya is an engineer on the personalization analytics team at Netflix. She has been working on a framework on top of Spark batch processing that provides a generic way of producing the various datasets required by our machine learning algorithms. She is also now exploring streaming as a mechanism to provide data that is as accurate as batch but is updated more frequently, in order to refresh the personalized experience of Netflix users more than once a day.

Agenda

6 pm -- 6:30 pm light dinner + networking

6:30 pm -- 6:35 pm introduction

6:35 pm -- 7:40 pm main talk + Q&A

7:40 pm -- 8 pm networking

8 pm -- 8:30 pm closing

8:30 pm -- office closed

SF Big Analytics
AMD Auditorium
One AMD Place · Sunnyvale, CA