This month, we are excited to have Man Zhang, Solutions Architect at Qubole, speaking!
Qubole delivers a self-service platform for big data analytics built on Amazon Web Services, Microsoft Azure, and Google Cloud. Qubole was founded by members of the team that built and ran Facebook's data service, where they created and authored Apache Hive. With Qubole, a data scientist can spin up hundreds of clusters on their public cloud of choice, start running ad hoc and/or batch queries in under five minutes, and have the system autoscale to the optimal compute level as needed.
Companies now need to apply machine learning (ML) techniques to their data in order to remain relevant. Among the new challenges data scientists face is getting access to large data sets, so that trained models can scale to run on production data.
Beyond handling larger data volumes, these pipelines must be flexible enough to accommodate the variety of data and the high processing velocity that new ML applications require. Apache Airflow and Apache Spark address these challenges: Spark provides a highly scalable data processing engine, and Airflow orchestrates the pipelines that run on autoscaling big data clusters.
In this presentation we will cover:
- Typical challenges data scientists face when building pipelines for machine learning.
- Typical uses of various big data engines to address these challenges.
- A real-world example using Apache Spark and Airflow to operationalize a recommendation engine.
As always, we'll have a fun group, with pizza, beer, and refreshments!
5:30 - 6:00 Socializing
6:00 - 7:00 Building Data Pipelines for Machine Learning
7:00 - 7:30 Questions and Closing Remarks