Chicago Machine Learning Meetup: Random Forests


Details
Topic: Random Forests, Theory and Practice
Speaker: Michael Zimmer
ABSTRACT:
"Random Forests" is a powerful, off-the-shelf machine learning algorithm for classification and regression. In this "journal club"-style talk, we'll examine the origins of the method, touching on ensemble methods, bagging and bootstrapping. Following that, we'll look at an implementation of Random Forests in R and its application to a recent Kaggle competition (Bond Pricing), as well as other examples.
KEY WORDS:
random forests, decision trees, bootstrapping, bagging, aggregating, ensemble methods
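To make "bootstrapping" concrete: a bootstrap sample draws n rows uniformly with replacement from an n-row training set, so some rows repeat and others are never drawn. Each row is missed with probability (1 - 1/n)^n, which approaches 1/e ≈ 0.368, so roughly a third of the rows end up "out-of-bag". The following is a minimal Python sketch for illustration only, not code from the talk or the R package; the function names are hypothetical.

```python
import random

def bootstrap_indices(n, rng):
    """Draw n row indices uniformly with replacement (one bootstrap sample)."""
    return [rng.randrange(n) for _ in range(n)]

def out_of_bag(n, sample):
    """Return the rows never drawn into the bootstrap sample (out-of-bag rows)."""
    drawn = set(sample)
    return [i for i in range(n) if i not in drawn]

if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is reproducible
    n = 10_000
    sample = bootstrap_indices(n, rng)
    oob = out_of_bag(n, sample)
    # Expect a value close to 1/e, since each row is missed with prob (1 - 1/n)^n.
    print(f"out-of-bag fraction: {len(oob) / n:.3f}")
```

The out-of-bag rows act as a free held-out set for each bootstrap sample, which is why random forest implementations can report an error estimate without a separate validation split.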
REFERENCES:
A clear, brief description of how to use random forests in R, written by the authors of the R package. A good place to start:
http://www.webchem.science.ru.nl/PRiNS/rF.pdf
A nice talk by Hemant Ishwaran on Random Forests (start at 2:10 to skip the introduction).
Part 1:
http://www.youtube.com/watch?v=IO7F1-PlKNM&feature=related
Part 2 (I suggest watching up to 12:15):
http://www.youtube.com/watch?v=cQrvTYVN0ko
This 1996 paper by Leo Breiman, mentioned in the video above, describes "bagging", the starting point for the Random Forest algorithm:
http://www.machine-learning.martinsewell.com/ensembles/bagging/Breiman1996.pdf
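Bagging (bootstrap aggregating) fits one base model per bootstrap sample and aggregates their predictions: averaging for regression, voting for classification. The sketch below is a minimal, stdlib-only Python illustration under simplifying assumptions: 1-D inputs and a depth-1 regression tree (a "stump") as the base learner, in place of the full decision trees usually bagged in practice. The names fit_stump, bag and predict_bagged are hypothetical.

```python
import random
from statistics import mean

def fit_stump(xs, ys):
    """Fit a depth-1 regression tree on 1-D inputs.

    Tries the midpoint between each pair of adjacent distinct x values as a
    threshold and keeps the one with the smallest sum of squared errors.
    Returns (threshold, left_mean, right_mean).
    """
    pairs = sorted(zip(xs, ys))
    best = None
    for k in range(1, len(pairs)):
        if pairs[k - 1][0] == pairs[k][0]:
            continue  # no threshold can separate equal x values
        left = [y for _, y in pairs[:k]]
        right = [y for _, y in pairs[k:]]
        lm, rm = mean(left), mean(right)
        sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, (pairs[k - 1][0] + pairs[k][0]) / 2, lm, rm)
    if best is None:  # all x identical: predict the overall mean everywhere
        m = mean(ys)
        return (float("inf"), m, m)
    return best[1:]

def predict_stump(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean

def bag(xs, ys, n_models, rng):
    """Bagging: fit one stump per bootstrap sample of the training rows."""
    n = len(xs)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
        models.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return models

def predict_bagged(models, x):
    """Aggregate by averaging (regression); classification would use a vote."""
    return mean(predict_stump(m, x) for m in models)

if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical noisy step function: y is about 0 below x = 2 and about 1 above.
    xs = [i / 10 for i in range(40)]
    ys = [(0.0 if x < 2 else 1.0) + rng.gauss(0, 0.2) for x in xs]
    models = bag(xs, ys, n_models=50, rng=rng)
    for x in (0.5, 1.95, 3.5):
        print(x, round(predict_bagged(models, x), 3))
```

Because each stump sees a slightly different sample, their thresholds vary; averaging them smooths the prediction near the step, which is the variance reduction that motivates bagging.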
In the talk, we'll look at the Kaggle competition on bond pricing. In the Data section, be sure to look at the R code for the random forest benchmark (called "random_forest_benchmark.r").
http://www.kaggle.com/c/benchmark-bond-trade-price-challenge
Optional:
A description of the randomForest package in R, which comes in handy:
http://cran.r-project.org/web/packages/randomForest/randomForest.pdf
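The randomForest package documents the step that turns bagged trees into a random forest: at each split, only mtry randomly chosen predictors are considered as candidates. Its documented defaults are ntree = 500 trees and mtry = floor(sqrt(p)) for classification or max(floor(p/3), 1) for regression, where p is the number of predictors. The Python sketch below, an illustration rather than the package's code, shows that mtry default and a single split search restricted to a random feature subset; a full forest would repeat this search at every node of each tree grown on a bootstrap sample. The function names, the squared-error (regression) criterion and the demo data are assumptions.

```python
import math
import random
from statistics import mean

def default_mtry(p, classification):
    """mtry defaults as documented for R's randomForest package."""
    if classification:
        return math.isqrt(p)  # floor(sqrt(p))
    return max(p // 3, 1)     # max(floor(p/3), 1)

def best_split(rows, ys, mtry, rng):
    """Best single split, searching only mtry randomly chosen features.

    Returns (sse, feature_index, threshold), or None if no candidate feature
    has at least two distinct values.
    """
    p = len(rows[0])
    candidates = rng.sample(range(p), mtry)  # random feature subset for this split
    best = None
    for j in candidates:
        values = sorted(set(r[j] for r in rows))
        for a, b in zip(values, values[1:]):
            t = (a + b) / 2
            left = [y for r, y in zip(rows, ys) if r[j] <= t]
            right = [y for r, y in zip(rows, ys) if r[j] > t]
            lm, rm = mean(left), mean(right)
            sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
            if best is None or sse < best[0]:
                best = (sse, j, t)
    return best

if __name__ == "__main__":
    rng = random.Random(7)
    p = 6
    # Hypothetical data: only feature 2 carries signal.
    rows = [[rng.random() for _ in range(p)] for _ in range(30)]
    ys = [1.0 if r[2] > 0.5 else 0.0 for r in rows]
    mtry = default_mtry(p, classification=False)  # max(6 // 3, 1) = 2
    for _ in range(3):
        split = best_split(rows, ys, mtry, rng)
        print("split on feature", split[1], "at", round(split[2], 3))
```

Restricting each split to a random subset keeps a single strong predictor from dominating every tree, which decorrelates the trees and makes their average more stable than plain bagging.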
For more extensive information, see the website of Leo Breiman, the author of the Random Forest algorithm:
http://stat-www.berkeley.edu/users/breiman/RandomForests/cc_home.htm
Michael Zimmer, PhD is a programmer/consultant with a background in science and an interest in machine learning.
