An Introduction to Differential Privacy as Applied to Machine Learning


We have a very exciting talk this month by Dr. Nina Zumel, a principal consultant and co-founder at Win-Vector LLC, a data science consulting firm based in San Francisco. She is also the co-author, with Dr. John Mount, of Practical Data Science with R and a contributor to the popular Win-Vector blog. When she isn't working, she likes to dance and read folklore and ghost stories.

Differential privacy was originally developed to facilitate secure analysis over sensitive data. It's back in the news again now, with exciting results from Cynthia Dwork et al. that apply ideas from differential privacy to improve machine learning performance. Nina will give a brief introduction to the ideas behind differential privacy, and review how differential privacy can be used to enable safer re-use of holdout data in machine learning. She will also show how differential privacy can be used to improve effects coding: that is, how you can build efficient encodings of categorical variables with many levels, without introducing additional bias into modeling procedures.
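To give a flavor of the core idea before the talk: the classic building block of differential privacy is the Laplace mechanism, which answers a query on sensitive data after adding noise calibrated to the query's sensitivity. The sketch below is a minimal illustration of that mechanism for a simple counting query; the function names (`laplace_noise`, `private_count`) are hypothetical, not from any particular library or from Nina's talk.

```python
import math
import random

def laplace_noise(scale):
    # Draw a sample from Laplace(0, scale) via the inverse CDF.
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def private_count(records, predicate, epsilon):
    # A counting query has sensitivity 1: adding or removing a single
    # record changes the count by at most 1. The Laplace mechanism
    # therefore adds noise with scale = sensitivity / epsilon.
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon)
```

Smaller values of `epsilon` give stronger privacy (more noise); larger values give more accurate answers. The same noise-addition trick underlies the reusable-holdout results: by answering holdout queries through a differentially private mechanism, you limit how much any single analysis can overfit to the holdout set.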

Over the past month or so, the Win-Vector blog has featured a series of posts on differential privacy, if you want to learn more. The Win-Vector blog is one of my favorite applied machine learning / data science blogs. It offers clear, yet very in-depth technical articles on a number of interesting and practical data science topics. It's not just "yet another data science blog"... :)

The meetup will be hosted at the new SF office of Metis, which offers project-based data science bootcamps and professional development training.

We will also have pizza, thanks to support from Metis!