Random Forest Classification Workshop: Practical Tips


Details
Event and refreshments sponsored by OnDeck. Space sponsored by ThoughtWorks.
Agenda:
10:00am - 12:00pm: Talk by Anita Schmid and Christine Hurtubise
12:00pm - 1:00pm: Lunch
1:00pm - 3:00pm: Breakouts
Please bring a laptop as this is a hands-on workshop. Knowledge of either R or Python (with Pandas and Scikit-learn) is required.
During the morning session, Anita will describe the random forest algorithm for classification and go over some use cases.
During the afternoon breakouts, participants will work in teams on a classification problem using a public data set: either the airline on-time performance data (http://stat-computing.org/dataexpo/2009/) or Kaggle's Titanic data set (https://www.kaggle.com/c/titanic). Participants will be asked to download these data sets before the workshop. Tutors from OnDeck's data science team (Anaelle Bohbot, Siying Chen, Justin Law and Abhra Mitra) will guide the teams during the breakout session, and at the end of the workshop each team will present its solution. Participants can use either R or Python, whichever language they are more comfortable with.
Anita Schmid, Senior Data Scientist at OnDeck (http://www.ondeck.com/), @OnDeckCapital (https://twitter.com/OnDeckCapital)
In 2014, Anita joined OnDeck, a small business lending company, where she works closely with the Sales and Marketing departments. She works on Marketing Models, Sales Optimization, Funnel Reporting, Attribution and A/B testing. She has a Diploma (equivalent to MSc) in Physics from the ETH (Swiss Federal Institute of Technology) in Zurich, Switzerland, and earned a Ph.D. in the field of Systems Neuroscience at the same institution before moving to New York in 2006. Before transitioning to Data Science with the help of the Insight Data Science Fellows Program (http://insightdatascience.com/), she worked as research faculty at Weill Cornell Medical College in NYC.
Christine Hurtubise (@cfhurtub) currently works as a Manager in the Risk department at OnDeck, where she leads model validation initiatives. She previously worked at SunGard (now FIS) as a Senior Consultant, focusing on developing credit risk models for regional and commercial banks. For the past six years, Christine has worked with technology platforms and frameworks that predict customer and portfolio behavior. Prior to joining the industry, she worked on data visualization techniques in a biology research lab at the University of Pennsylvania. Christine graduated magna cum laude in 2008 from the Mathematics department at the University of Pennsylvania.
