Feature engineering refers to the process of visualizing and exploring data to find, and sometimes create, useful features from existing data. A large share of a modeling project's time should be spent on feature engineering: once good features are in place, most off-the-shelf predictive modeling algorithms can deliver decent performance.
In this talk, we will walk through the steps involved in building a predictive model for a classification problem. Here is an overview:
We will be using the Titanic data set for this tutorial. Details here: https://www.kaggle.com/c/titanic-gettingStarted
We will be using R and Azure Machine Learning Studio for this tutorial.
Exploration and Visualization:
• Getting familiar: Sampling and eyeballing data
• Understanding class distribution: Pie charts in R
• Understanding feature values and distributions: Histograms, density plots, box-and-whisker plots, violin plots, and scatter plots in R
• Feature processing: Missing values, creating more features, reducing dimensionality
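The exploration and feature-processing steps above can be sketched in R as follows. This is a minimal illustration, not the talk's actual code; it assumes the competition's train.csv has been downloaded to the working directory, and uses that file's column names (Survived, Age, Fare, Pclass, SibSp, Parch).

```r
# Load the Kaggle Titanic training data (assumes train.csv is in the
# working directory, downloaded from the competition page)
titanic <- read.csv("train.csv", stringsAsFactors = FALSE)

# Getting familiar: eyeball a few rows and the overall structure
head(titanic)
str(titanic)
summary(titanic)

# Class distribution: how many passengers survived?
pie(table(titanic$Survived), labels = c("Died", "Survived"),
    main = "Class distribution")

# Feature distributions
hist(titanic$Age, main = "Age distribution", xlab = "Age")
boxplot(Fare ~ Pclass, data = titanic,
        main = "Fare by passenger class")

# Missing values: impute missing Age with the median (one simple option)
titanic$Age[is.na(titanic$Age)] <- median(titanic$Age, na.rm = TRUE)

# Creating more features: family size aboard
titanic$FamilySize <- titanic$SibSp + titanic$Parch + 1
```

Median imputation and the FamilySize feature are just examples of the kinds of processing the talk will cover; other imputation strategies and derived features are possible.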
Building A Predictive Model:
We will build a predictive model using the randomForest R package. We will look at training error, variable importance, and various metrics for classifier evaluation.
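A minimal sketch of that model-building step, assuming a cleaned titanic data frame like the one produced in the exploration step (the chosen formula and ntree value are illustrative, not the talk's actual settings):

```r
library(randomForest)

# Load and lightly prepare the data so the block is self-contained
titanic <- read.csv("train.csv", stringsAsFactors = FALSE)
titanic$Age[is.na(titanic$Age)] <- median(titanic$Age, na.rm = TRUE)
titanic$FamilySize <- titanic$SibSp + titanic$Parch + 1
titanic$Survived <- factor(titanic$Survived)   # classification, not regression
titanic$Sex <- factor(titanic$Sex)

# Fit a random forest on a handful of features
fit <- randomForest(Survived ~ Pclass + Sex + Age + Fare + FamilySize,
                    data = titanic, ntree = 500, importance = TRUE)

# Out-of-bag error estimate and confusion matrix
print(fit)

# Which features mattered most?
importance(fit)
varImpPlot(fit)
```

The out-of-bag estimate printed by `print(fit)` is random forest's built-in stand-in for a held-out test error, which is one reason the package is convenient for a tutorial setting.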
Azure Machine Learning Studio Demo:
Finally, we will show how the whole workflow can be built in Azure Machine Learning Studio.
I am trying to get someone from the Microsoft product group that worked on Azure ML Studio to come and give the demo. I will update everyone either way.