Details

Speakers: Dr. Nina Zumel and Dr. John Mount
Title: Advanced Data Preparation for Supervised Machine Learning

Brief abstract: Dr. Nina Zumel and Dr. John Mount will present methods for advanced data preparation for supervised machine learning. In particular, we will show how to safely pre-process high-cardinality categorical variables for later use. We will spend time on the important points of cross or out-of-sample methods to reduce over-fit. We will work through theory and examples, and show how the vtreat package can be used in projects. We will also preview Chapter 8 of Practical Data Science with R, 2nd Edition: Advanced Data Preparation.

Bios:

Nina Zumel is a Principal Consultant with Win-Vector, LLC, a data science consultancy in San Francisco. She has a Ph.D. in robotics from Carnegie Mellon and is one of the authors of Practical Data Science with R, a popular text on data science.

John Mount is a Principal Consultant with Win-Vector, LLC, and co-author of "Practical Data Science with R, 2nd Edition", Manning 2019. He has a Ph.D. in computer science from Carnegie Mellon.

Both John and Nina maintain a number of open source R and Python packages for data science.
