Improving Prediction using Nested Models and Simulated Out-of-Sample Data


Details
Abstract
In this talk Nina Zumel will discuss nested predictive models. These are models that predict an outcome or dependent variable (called y) using additional submodels that have also been built with knowledge of y. Practical applications of nested models include "the wisdom of crowds", prediction markets, variable re-encoding, ensemble learning, stacked learning, and superlearners.
Nested models can improve prediction performance relative to single models, but they introduce a number of undesirable biases and operational issues and, when used improperly, are statistically unsound. However, modern practitioners have made effective, correct use of these techniques. In this talk, Nina will give concrete examples of nested models, how they can fail, and how to fix failures.
The solutions we will discuss include advanced data partitioning, simulated out-of-sample data, and ideas from differential privacy. The theme of the talk is that with proper techniques, these powerful methods can be safely used.
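To give a flavor of the "simulated out-of-sample data" idea, here is a minimal Python sketch. It is not taken from the talk. It assumes a simple variable re-encoding submodel that replaces each level of a categorical variable with the mean of y for that level. The function names (fit_means, naive_encode, cross_encode) and the toy data are hypothetical, chosen only for illustration.

```python
import random
from collections import defaultdict

def fit_means(levels, y):
    """Submodel built with knowledge of y: mean of y for each category level."""
    sums, counts = defaultdict(float), defaultdict(int)
    for level, target in zip(levels, y):
        sums[level] += target
        counts[level] += 1
    return {level: sums[level] / counts[level] for level in sums}

def naive_encode(levels, y):
    """Encode every row with a submodel fit on the same rows (leaks y)."""
    means = fit_means(levels, y)
    return [means[level] for level in levels]

def cross_encode(levels, y, n_folds=5, seed=0):
    """Encode each row with a submodel fit only on the other folds,
    so each encoded value behaves like an out-of-sample prediction."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    encoded = [None] * len(y)
    for k in range(n_folds):
        held_out = set(idx[k::n_folds])
        train = [i for i in idx if i not in held_out]
        means = fit_means([levels[i] for i in train], [y[i] for i in train])
        # Levels never seen in the training folds fall back to the training mean.
        fallback = sum(y[i] for i in train) / len(train)
        for i in held_out:
            encoded[i] = means.get(levels[i], fallback)
    return encoded

# Toy data (an assumption for illustration): a pure-noise variable with one
# unique level per row, and an outcome that has no real relation to it.
levels = [f"id_{i}" for i in range(100)]
y = [i % 2 for i in range(100)]

naive = naive_encode(levels, y)      # reproduces y exactly: an apparent perfect predictor
crossed = cross_encode(levels, y)    # carries no row-specific information about y

print("naive encoding equals y:", naive == y)
print("cross encoding values:", sorted(set(crossed)))
```

In this sketch the naive encoding of a useless variable looks like a perfect predictor of y, which is the kind of failure the abstract warns about. The cross-encoded version never sees a row's own outcome, so a downstream model trained on it gets a more honest view of how little the variable is worth.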
Bio
Nina Zumel is a Principal Consultant with Win-Vector LLC, a data science consulting firm based in San Francisco. She is the co-author, with John Mount, of Practical Data Science with R, which presents the process and principles of data science from a practitioner's perspective.
Her technical interests include data science, statistics, statistical learning, and data visualization. She is also interested (at a layperson's level) in cognitive science, psychology, and linguistics. When she isn't working, she writes and dances.
Agenda
6.30 - 7.00 Food/networking
7.00 - 7.10 Announcements from WWC-SV Data Science Group
7.10 - 7.55 Talk
7.55 - 8.30 Q&A
Note: Parking is available in the garage across the street from the building.
Thank you, Insight, for sponsoring this event!
Join our Facebook group (https://www.facebook.com/groups/womenwhocodesiliconvalley/)!
Code of Conduct (https://github.com/WomenWhoCode/guidelines-resources/blob/master/code_of_conduct.md)
