BDAM 04/10: CDAP & hybrid data management; End-to-End ML Platforms and Airflow


Details
A big shout-out to SF Big Analytics Group for the collaboration and Google SF for hosting and sponsoring this event!
Agenda:
6:00 pm -- 6:30 pm Networking + food
6:30 pm -- 6:40 pm Introduction
6:40 pm -- 7:15 pm Talk 1 (Google) + QA
7:15 pm -- 7:50 pm Talk 2 (Airbnb) + QA
7:50 pm -- 8:35 pm Talk 3 (Lyft) + QA
8:40 pm -- 9:00 pm Networking + Closing
#Talk 1: Demystifying Hybrid Data Management using CDAP
Speaker: Bhooshan Mogal, Product Manager, Google
Briefing:
Cloud has emerged as a critical enabler of digital transformation, with the aim of reducing IT overheads and costs. However, cloud migration is not instantaneous for a variety of reasons including data sensitivity, compliance and application performance. This results in the creation of diverse hybrid and multi-cloud environments and amplifies data management and integration challenges. This talk demonstrates how CDAP’s flexibility can allow you to utilize your existing on-premises infrastructure, as you evolve to the latest Big Data and Cloud services at your own pace, all while providing you a single, unified view of all your data, wherever it resides.
Speaker's Bio: Bhooshan Mogal is a Product Manager at Google, where he is focused on delivering best-in-class Data and Analytics services to GCP users. Prior to Google, he worked on data systems at Cask Data Inc, Pivotal and Yahoo.
#Talk 2: Bighead: Airbnb's end-to-end machine learning platform
Speaker: Andrew Hoh, Product Manager, Airbnb
Briefing:
Airbnb has a wide variety of ML problems ranging from models on traditional structured data to models built on unstructured data such as user reviews, messages and listing images. Bighead aims to tie together various open source and in-house projects to remove incidental complexity from ML workflows. Bighead is built on Python, Spark, and Kubernetes. The components include a lifecycle management service, an offline training and inference engine, an online inference service, a prototyping environment, and a Docker image customization tool.
In addition, Bighead includes a unified model building API that smoothly integrates popular libraries including TensorFlow, XGBoost, and PyTorch. This talk covers the architecture, the problems that each individual component and the overall system aims to solve, and a vision for the future of machine learning infrastructure. Bighead is widely adopted at Airbnb, and we have a variety of models running in production. We plan to open source Bighead to allow the wider community to benefit from our work.
Speaker's Bio: Andrew Hoh is the Product Manager for the ML Infrastructure and Applied ML teams at Airbnb. Previously, he spent time building and growing Microsoft Azure's NoSQL distributed database. He holds a degree in computer science from Dartmouth College.
#Talk 3: Apache Airflow At Lyft
Speaker: Tao Feng, Software Engineer, Lyft
Briefing:
Lyft was one of the first companies to adopt Airflow in production. Today Airflow powers many Lyft use cases, from executive dashboards to metrics aggregation, derived data generation, and machine learning feature computation. In this talk, we will first cover how we operate Airflow at Lyft in production, then we will talk about the improvements we have made to Airflow to boost internal ETL development productivity. Lastly, we will talk about some of our open source contributions that could benefit the whole community.
Speaker's Bio: Tao Feng is a software engineer on the Lyft data platform team, working on various data products. Tao is also a committer and PMC member on Apache Airflow. Previously, Tao worked at LinkedIn and Oracle on data infrastructure, tooling and performance.
Venue: Google SF, 345 Spear Street, 7th floor, in the Batgirl room.
Parking Information:
Enter the Google office via the West elevator lobby. We recommend Hills Plaza Garage for parking, as it is right underneath the SPE building and costs $10 per vehicle after 5:00 PM. It is open until 11:00 PM.
