
Details

Registration is required: https://usfca.zoom.us/webinar/register/WN_L3-3ePUxSuC1anxHSwg0_A

Presenters:

Dr. Nina Zumel, Co-founder, Principal Consultant at Win-Vector, LLC
Dr. John Mount, Co-founder, Principal Consultant at Win-Vector, LLC

For classification problems, many practitioners make the mistake of using decision rules or hard classification rules to solve the problem. This seemingly natural step introduces a number of weaknesses into the modeling process, which many attempt to work around by down-sampling or up-sampling when the concept prevalence is unbalanced. We argue that the more effective procedure is to insist on working with probability models instead of classification rules. We will explain the methodology as a Clemenceau-style maxim: "decision thresholds are too serious a matter to entrust to the data scientists." We will show how to work with probabilities, and demonstrate effective methods to evaluate and present probability models. We end with a nice system for later picking decision thresholds that maximize business utility.

We will share example code and data in Python.
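
As a rough illustration of the final point (not the presenters' code), the sketch below assumes a model has already produced probability scores, and picks the decision threshold afterwards by maximizing a business utility. The function name `best_threshold`, the toy scores and labels, and the dollar values are all hypothetical.

```python
def best_threshold(probs, labels, tp_value, fp_cost, fn_cost=0.0, tn_value=0.0):
    """Return (threshold, utility) for the candidate threshold with the
    highest total utility. An item is flagged when its score >= threshold."""
    # Candidate thresholds: every observed score, plus "flag nothing".
    candidates = sorted(set(probs)) + [float("inf")]
    best = None
    for t in candidates:
        utility = 0.0
        for p, y in zip(probs, labels):
            flagged = p >= t
            if flagged and y:
                utility += tp_value      # true positive
            elif flagged and not y:
                utility -= fp_cost       # false positive
            elif y:
                utility -= fn_cost       # false negative
            else:
                utility += tn_value      # true negative
        # Strict comparison keeps the lowest threshold among ties.
        if best is None or utility > best[1]:
            best = (t, utility)
    return best


# Hypothetical model scores and true outcomes (1 = positive class).
probs = [0.05, 0.20, 0.35, 0.60, 0.80, 0.95]
labels = [0, 0, 1, 0, 1, 1]

# Cheap false positives favor a low threshold ...
cheap_fp = best_threshold(probs, labels, tp_value=10.0, fp_cost=4.0)
# ... expensive false positives favor a higher one, with no retraining.
costly_fp = best_threshold(probs, labels, tp_value=10.0, fp_cost=20.0)
print(cheap_fp, costly_fp)
```

The same scores yield different thresholds under different utilities, which is why the threshold can be left to whoever owns the business trade-off rather than fixed inside the model.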

Nina is a co-founder and principal consultant at Win-Vector LLC, a San Francisco data science consultancy and training company. She is a contributor to R packages including vtreat and WVPlots. Nina is co-author of the popular text Practical Data Science with R and occasionally blogs at the Win-Vector Blog on data science and R. Her technical interests include data science, statistics, statistical learning, and data visualization. She is a principal designer of EMC's data scientist certification program, and a principal designer of Win-Vector's "data intensive for engineers" (a private data science in Python training program in its 3rd year). More can be found here: https://www.linkedin.com/in/ninazumel .

John is a co-founder and principal consultant at Win-Vector LLC, a San Francisco data science consultancy and training company. He is the author of several R and Python packages, including the data treatment package vtreat. John is co-author of Practical Data Science with R and blogs at the Win-Vector Blog about data science and R programming. His interests include data science, statistics, R programming, and theoretical computer science. He is the principal presenter for Win-Vector's "data intensive for engineers" (a private data science in Python training program in its 3rd year). More can be found here: https://www.linkedin.com/in/johnamount .
