Stefan Wager, Stanford Statistics
In machine learning, we often need to use regularization to ensure that our predictors are stable and do not overfit to random quirks of the training data. For many modern applications, good regularization can determine whether or not a method works at all; examples include ridge regression for high-dimensional estimation, naive Bayes language modeling, and Google's PageRank algorithm.
In this talk, I'll discuss a generic recipe for designing problem-specific regularizers. Specifically, I'll show how to turn insights about the statistical noise affecting our training sample into practical and performant regularizers using the bootstrap. The resulting predictors appear to work well in practice, and in particular allowed us to beat the previous state-of-the-art on a standard document classification task.
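To make the recipe concrete, here is a minimal illustrative sketch (not the talk's actual method) of one way to inject training-set noise: dropout-style feature masking, where each feature is zeroed at random and the survivors are rescaled so the noised copies remain unbiased for the original data. The function name and rates below are hypothetical choices for illustration.

```python
import numpy as np

def dropout_noise_copies(X, rate=0.5, n_copies=10, seed=0):
    """Generate noised copies of a feature matrix X via dropout masking.

    Each copy zeroes every feature independently with probability `rate`
    and rescales the survivors by 1/(1 - rate), so each noised copy is an
    unbiased version of the original features. Training on many such
    copies acts as a data-driven regularizer.
    """
    rng = np.random.default_rng(seed)
    copies = []
    for _ in range(n_copies):
        mask = rng.random(X.shape) >= rate        # keep each entry w.p. 1 - rate
        copies.append(X * mask / (1.0 - rate))    # inverted-dropout rescaling
    return copies

# Averaging over many noised copies recovers the original data,
# confirming the noise is unbiased.
X = np.array([[1.0, 2.0], [3.0, 4.0]])
mean_copy = np.mean(dropout_noise_copies(X, rate=0.5, n_copies=2000), axis=0)
```

A learner fit on the noised copies is then implicitly penalized toward predictors that are robust to this artificial noise, which is the sense in which noising schemes act as adaptive regularizers.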
Stefan Wager, Sida Wang, and Percy Liang. Dropout Training as Adaptive Regularization. NIPS, 2013.
Stefan Wager, Will Fithian, Sida Wang, and Percy Liang. Altitude Training: Strong Bounds for Single-Layer Dropout. NIPS, 2014.