Netflix's Performance Optimization of Recommendation Pipeline using Spark SQL

Hosted by DB Tsai

Details

We're very excited to have invited Dr. Jiang from Netflix to talk about how Netflix uses cutting-edge features in Spark SQL, such as predicate pushdown and Catalyst optimizations, to compute the features for its offline training pipeline while keeping online and offline feature generation consistent.

Food and drink will be sponsored by Netflix. Join us.

Abstract:

Netflix is the world’s largest streaming service, with over 80 million members worldwide. Machine learning algorithms are used to recommend relevant titles to users based on their tastes.
At Netflix, we use Apache Spark to power our recommendation pipeline. Stages in the pipeline, such as label generation, data retrieval, feature generation, training, and validation, are based on the Spark ML PipelineStage framework. While this gives developers the flexibility to build individual components as encapsulated pipeline stages, we find that coordination across stages can potentially provide significant performance gains.
In this talk, we discuss how our Spark-based machine learning pipeline has been improved over the years. Techniques such as predicate pushdown and minimizing wide transformations have led to significant run-time improvements and resource savings.
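
For readers unfamiliar with predicate pushdown, the idea can be sketched in a few lines. This is a toy illustration in plain Python, not Spark code and not Netflix's pipeline; the tables, column names, and the `hash_join` helper are all hypothetical. It shows that applying a filter before a join gives the same result as filtering afterwards, while feeding fewer rows into the join.

```python
# Toy illustration only: plain Python, not Spark. All names and data are hypothetical.

def hash_join(left, right, key):
    """Join two lists of dicts on `key`; return (joined_rows, rows_fed_into_join)."""
    index = {}
    for r in right:
        index.setdefault(r[key], []).append(r)
    joined = [{**l, **r} for l in left for r in index.get(l[key], [])]
    return joined, len(left) + len(right)

plays = [
    {"member_id": 1, "title_id": 10, "country": "US"},
    {"member_id": 2, "title_id": 11, "country": "BR"},
    {"member_id": 3, "title_id": 10, "country": "US"},
    {"member_id": 4, "title_id": 12, "country": "JP"},
]
titles = [
    {"title_id": 10, "genre": "drama"},
    {"title_id": 11, "genre": "comedy"},
    {"title_id": 12, "genre": "anime"},
]

def is_us(row):
    """The predicate we would like to evaluate as early as possible."""
    return row["country"] == "US"

# Naive plan: join everything, then filter the joined rows.
joined, naive_cost = hash_join(plays, titles, "title_id")
naive_result = [row for row in joined if is_us(row)]

# Pushed-down plan: filter at the source, then join the smaller input.
us_plays = [row for row in plays if is_us(row)]
pushed_result, pushed_cost = hash_join(us_plays, titles, "title_id")

print(naive_result == pushed_result)  # both plans produce the same rows
print(naive_cost, pushed_cost)        # the pushed-down plan joins fewer rows
```

In a real query engine the optimizer performs this rewrite automatically where it can, and may push the filter all the way into the data source so that non-matching rows are never read.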

Hua's Bio:

Hua Jiang received his Ph.D. in electrical engineering from the University of Minnesota, Twin Cities, in 2012. He previously worked in the Design Group at Synopsys Inc. and the Data Infrastructure Group at LinkedIn Corporation. He is a Senior Software Engineer in the Personalization Group at Netflix Inc. His work includes building machine learning infrastructure and exploring novel computational paradigms to accommodate fast-growing machine learning needs.

Agenda:

6:30 pm - 7:00 pm: check-in/networking/food
7:00 pm - 7:45 pm: main talk
7:45 pm - 8:00 pm: Q/A
8:00 pm - 8:30 pm: closing
8:30 pm: event/office closed

Here is a map of the Netflix campus:

https://secure.meetupstatic.com/photos/event/f/5/1/600_461043921.jpeg

SF Big Analytics
Netflix Building D, Kabuki Theater
131 Albright Way · Los Gatos, CA