Past Meetup

Feature Stores: the missing API between Data Engineering and Data Science?


125 people went


Details

This meetup focuses on Feature Stores, with three talks from Jim Dowling (Logical Clocks), Varant Zanoyan (Airbnb), and Nick Handel (Branch).

Thanks to Mesosphere for hosting the event and to ArangoDB for sponsoring the pizza!

*The Feature Store: the missing API between Data Engineering and Data Science?*
Machine Learning (ML) pipelines are the key building block for productionizing ML code. However, pipelines are often developed as "silos": features tend not to be easily reused across pipelines, or even within the same pipeline. Silos lead to duplication, with features needlessly re-implemented, and in the worst case to correctness problems, for example when the features used for training and for serving have inconsistent implementations. The Feature Store solves the problem of siloed, ad-hoc ML pipelines by providing a data layer that separates feature engineering from the use of features to generate training data. That is, the Feature Store should provide a clean API between Data Engineering and Data Science.
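To make the "define once, use everywhere" idea concrete, here is a minimal, hypothetical sketch in plain Python. It is not the Hopsworks API: the `FeatureStore` class, its `register`, `feature_vector`, and `training_data` methods, and the example booking features are all illustrative assumptions. Each feature is registered exactly once, and both training-set generation and serving read the same definition, so the two cannot drift apart. For simplicity it computes features on the fly rather than materializing them into storage as a production feature store would.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


# Hypothetical minimal feature store (illustrative only, not the Hopsworks API).
@dataclass
class FeatureStore:
    # Maps a feature name to the single function that computes it from a raw record.
    _features: Dict[str, Callable[[dict], float]] = field(default_factory=dict)

    def register(self, name: str, fn: Callable[[dict], float]) -> None:
        """Data engineering side: each feature definition is registered exactly once."""
        if name in self._features:
            raise ValueError(f"feature {name!r} already registered")
        self._features[name] = fn

    def feature_vector(self, record: dict, names: List[str]) -> List[float]:
        """Serving side: compute the requested features for one raw record."""
        return [self._features[n](record) for n in names]

    def training_data(self, records: List[dict], names: List[str]) -> List[List[float]]:
        """Data science side: build a training set from the same shared definitions."""
        return [self.feature_vector(r, names) for r in records]


# Usage with hypothetical booking records: one definition serves training and serving.
store = FeatureStore()
store.register("nights", lambda r: float(r["checkout"] - r["checkin"]))
store.register("price_per_night", lambda r: r["price"] / (r["checkout"] - r["checkin"]))

FEATURES = ["nights", "price_per_night"]
train = store.training_data([{"checkin": 1, "checkout": 4, "price": 300.0}], FEATURES)
live = store.feature_vector({"checkin": 10, "checkout": 12, "price": 150.0}, FEATURES)
```

Because `training_data` is built on top of `feature_vector`, any change to a feature's definition reaches training and serving together, which is the consistency property the abstract describes.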

In this talk, we will introduce the world's first open-source Feature Store, built on Hopsworks, Apache Spark, and Apache Hive and targeting both TensorFlow/Keras and PyTorch. Using an end-to-end pipeline, we will show how the Feature Store works, its role as a natural interface between Data Engineers and Data Scientists, and how you can write ML pipelines end-to-end in Python only (if you so choose).

Speaker Bio:
Jim Dowling is the CEO of Logical Clocks AB and an Associate Professor at KTH Royal Institute of Technology in Stockholm. He is the lead architect of Hops, the world's fastest and most scalable Hadoop distribution and the first Hadoop platform to support GPUs as a resource. He is a regular speaker at AI industry conferences and blogs about AI for O'Reilly.

*Zipline at Airbnb*
Zipline is Airbnb’s soon-to-be-open-sourced data management platform, designed specifically for ML use cases. It has cut the time needed to generate training data from months to days and offers data management solutions spanning model training through serving. This talk will cover the framework at a high level, focusing on the specific challenges of data engineering for ML and how Zipline addresses them.

Speaker Bio:
Varant Zanoyan is a software engineer on the Machine Learning Infrastructure team at Airbnb where he focuses on Zipline, a data management framework for Machine Learning. Previously, he solved data infrastructure problems at Palantir Technologies.

*Machine Learning Infrastructure at an Early Stage*
Good machine learning is built on infrastructure, but many startups lack the bandwidth or resources to build this foundation while scaling. It is difficult to decide which pieces of ML infrastructure data scientists and engineers need most to be productive and successful when each project can take a small team of engineers months or years. The long-term payoff is large, but pursuing infrastructure that doesn't work, or that solves the wrong problems, can leave a team months down the road without the progress it needs. This talk focuses on the foundation that any good machine learning system is built on, and on which elements of ML infrastructure to tackle first.

Speaker Bio:
Nick Handel serves as Branch International's Head of Data Science. Prior to joining Branch, he was a Product Manager for Airbnb's machine learning infrastructure teams. Before moving to centralize the company's artificial intelligence efforts, he was an early member of the company's data science team, helping the company expand internationally between 2014 and 2015 and leading a data science team that launched Airbnb's Trips product in 2016. Before joining Airbnb, he was a research economist at BlackRock, focusing on emerging market debt.