Becoming a scalable data scientist with GraphLab


Details
Data science is perhaps the hottest profession on the market today. Folks with backgrounds ranging from statistics and physics to engineering and computer science are eager to transition into this field. However, designing and deploying data analysis and machine learning applications at scale is a significant challenge. For some, machine learning algorithms and methods can seem obscure, overly mathematical, and disconnected from practice. For others, writing and deploying scalable software requires significant effort, which distracts them from the analysis itself.
This tutorial focuses on two interpretations of scaling up data science: enabling more of us to become data scientists, and providing simple tools that significantly decrease the effort involved in deploying data science methods at scale. Using GraphLab through a simple Python interface running on your laptop, you will learn how to use state-of-the-art machine learning algorithms in practice, and you will see how the same code can be deployed at scale on a Hadoop cluster.
More specifically, we will provide an introduction to modern machine learning methods and show how practitioners are using machine learning to detect fraud, analyze social networks, and build personalized recommender services. Through these case studies, we will walk you through the steps common to all applied machine learning problems: data cleaning, model building, prediction, and finally insight. These techniques will be demonstrated in practice, using GraphLab and Python.
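To give a rough feel for that sequence of steps, here is a minimal sketch in plain Python. It does not use GraphLab's API and is not taken from the tutorial material; the toy records, the function names (clean, fit_line, predict), and the choice of a one-variable least-squares line are all assumptions made purely for illustration.

```python
# Toy records (hypothetical data): (hours_of_use, monthly_spend).
# None marks a missing value that the cleaning step should drop.
raw = [(1.0, 3.0), (2.0, 5.0), (None, 4.0), (3.0, 7.0), (4.0, None), (4.0, 9.0)]


def clean(records):
    """Data cleaning: keep only records where both fields are present."""
    return [(x, y) for x, y in records if x is not None and y is not None]


def fit_line(points):
    """Model building: ordinary least-squares fit of y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Prediction: apply the fitted line to a new input."""
    slope, intercept = model
    return slope * x + intercept


data = clean(raw)                 # 1. data cleaning
model = fit_line(data)            # 2. model building
prediction = predict(model, 5.0)  # 3. prediction

# 4. Insight: the slope says how much spend changes per extra hour of use.
print("records kept:", len(data))
print("slope, intercept:", model)
print("predicted spend at 5 hours:", prediction)
```

A real project would replace each function with something far more capable, but the shape of the workflow (clean, fit, predict, interpret) stays the same.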
We will then turn to scaling up. We will show how the same code can be deployed at scale on a Hadoop cluster, how to build pipelines of data analysis jobs, and how to monitor the performance and accuracy of these analyses directly from your laptop using our latest visualization techniques. Finally, we will show how to close the loop and improve the performance of your system through interactive feature engineering, optimization, and model ensembling.
This tutorial is part of Strata NY. You should register here (http://strataconf.com/stratany2014/public/schedule/detail/36487) using discount code GRAPHLAB20.
