Past Meetup

Estimating Effect Sizes in Machine Learning Predictive Models


315 people went



For our, ah, extremely late February Meetup, we are very pleased to welcome back Dr. Abhijit Dasgupta, who will be speaking about some of his cutting-edge work that straddles the "two cultures" of statistics and machine learning. When using classical regression models, it is relatively easy to estimate effect size -- the conditional effect on the outcome as you change one of the predictors. But when your predictive model is a black box, such as a random forest or neural network, this valuable information is typically hard to extract. Abhijit and his colleagues have found practical new methods for estimating effect size when using predictive models. Anyone who has been at a loss for words when trying to interpret or communicate a complex predictive model will want to learn about these new approaches.


6:30pm -- Networking and Refreshments
7:00pm -- Introduction
7:15pm -- Presentation and discussion
8:30pm -- Post presentation conversations
8:45pm -- Adjourn for Data Drinks (reserved space at Tonic!)

Predictive models are widely used, but a constant criticism has been the difficulty of interpreting the conditional effects of different predictors on the outcome. This criticism has been especially loud with respect to "black box" methods like random forests, ensemble learners, and neural networks. We go back to the basic definition of effect size in statistics, which is based on the idea of counterfactual outcomes, and find that we can explicitly leverage counterfactuals within the predictive modeling framework to estimate traditional and non-traditional effect sizes at the individual, subgroup, and global levels in a flexible manner. We will show how main effects, interactions, and more general nonlinear effects can be estimated in this fashion, without explicitly specifying a model structure per se. We will illustrate this with some recent work on binary regression, obtaining odds ratios, risk differences, risk ratios, and both additive and multiplicative interaction effects.
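As a rough illustration of the counterfactual idea described above (and not the speakers' actual implementation), the sketch below takes any fitted model that returns a predicted probability, predicts every observation twice, once with a binary predictor forced to 0 and once forced to 1, and then compares the two sets of predictions. Everything named here is an assumption made for the example: `toy_model` is a hypothetical stand-in for a fitted black box such as a random forest, `counterfactual_effects` is an illustrative helper, and the data rows are invented. The odds ratio shown is computed from the averaged risks, which is one of several possible conventions.

```python
import math
from typing import Callable, Dict, List, Sequence

Row = Dict[str, float]


def toy_model(row: Row) -> float:
    """Hypothetical stand-in for a fitted black-box probability model.

    In practice this would be the predicted probability from a random
    forest or similar learner; the formula here is invented for the demo.
    """
    centered_age = row["age"] - 50.0
    z = -1.0 + 0.8 * row["treated"] + 0.03 * centered_age
    z += 0.02 * row["treated"] * centered_age  # treatment-by-age interaction
    return 1.0 / (1.0 + math.exp(-z))


def counterfactual_effects(
    predict: Callable[[Row], float],
    rows: Sequence[Row],
    feature: str,
    low: float = 0.0,
    high: float = 1.0,
) -> Dict[str, object]:
    """Compare predictions with `feature` forced to `low` versus `high`."""
    # Each row is copied with only the chosen predictor overwritten,
    # so all other covariates keep their observed values.
    p_low: List[float] = [predict({**r, feature: low}) for r in rows]
    p_high: List[float] = [predict({**r, feature: high}) for r in rows]

    risk_low = sum(p_low) / len(p_low)
    risk_high = sum(p_high) / len(p_high)

    return {
        # Individual-level effects: one risk difference per observation.
        "individual_differences": [h - l for h, l in zip(p_high, p_low)],
        # Effects averaged over the rows supplied (global or subgroup).
        "risk_difference": risk_high - risk_low,
        "risk_ratio": risk_high / risk_low,
        "odds_ratio": (risk_high / (1.0 - risk_high))
        / (risk_low / (1.0 - risk_low)),
    }


# Invented example data; "treated" is the predictor of interest.
rows = [
    {"treated": 0.0, "age": 40.0},
    {"treated": 1.0, "age": 50.0},
    {"treated": 0.0, "age": 60.0},
    {"treated": 1.0, "age": 70.0},
]

overall = counterfactual_effects(toy_model, rows, "treated")
older = counterfactual_effects(
    toy_model, [r for r in rows if r["age"] >= 60.0], "treated"
)
print(overall["risk_difference"], overall["risk_ratio"], overall["odds_ratio"])
print(older["risk_difference"])
```

Passing a subset of rows, as in the `older` call, gives a subgroup-level estimate, while the per-row differences give individual-level effects. Because the toy model includes a treatment-by-age interaction, the subgroup estimate differs from the overall one.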

The presentation will be based, in part, on the following paper:

Dasgupta, Szymczak, Moore, Bailey-Wilson, and Malley (2013). Risk Estimation using Probability Machines. Under review.


Abhijit Dasgupta is a biostatistician, data scientist, and consultant for the NIH, local startups, and other clients. He has a PhD in biostatistics from the University of Washington, and has published work in JASA, Genetic Epidemiology, and a number of other journals.