Space and food for this month's event have graciously been provided by Tonic Design.
6:00: Meet and greet
6:30: Sebastian Raschka
7:00: Lightning talks (Interested in speaking?)
Machine Learning and Performance Evaluation — Overcoming the Selection Bias
Every day in scientific research and business applications, we rely on statistics and machine learning as support tools for predictive modeling. To satisfy our desire to model uncertainty, to predict trends, and to predict patterns that may occur in the future, we have developed a vast library of tools for decision making. In other words, we have learned to take advantage of computers to replicate the real world, making intuitive decisions more quantitative, labeling unlabeled data, and ultimately trying to predict the future. Whether we are applying predictive modeling techniques to research or business problems, we want to make "good" predictions!
With modern machine learning libraries, choosing a machine learning algorithm to fit a model to our training data has never been simpler. However, making sure that our model generalizes well to unseen data is still up to us, the machine learning practitioners and researchers. In this talk, we will discuss the two most important components of estimating generalization performance: bias and variance. We will discuss how we can make the best use of the data at hand via proper (re)sampling, and how to pick appropriate performance metrics. Then, we will compare various techniques for algorithm selection and model selection to find the right tool and approach for the task at hand. In the context of the "bias-variance trade-off," we will go over potential weaknesses in common modeling techniques, and we will learn how to take uncertainty into account to build predictive models that perform well on unseen data.
Sebastian Raschka is the author of the bestselling book Python Machine Learning. As a Ph.D. candidate at Michigan State University, he is developing new computational methods in the field of computational biology. Sebastian has many years of experience with coding in Python and has given several seminars on the practical applications of data science and machine learning. Sebastian loves to write and talk about data science, machine learning, and Python, and he is really motivated to help people develop data-driven solutions without necessarily requiring a machine learning background.
Sebastian also actively contributes to open source projects, and methods that he implemented are now successfully used in machine learning competitions such as those hosted on Kaggle. In his free time, Sebastian works on models for sports predictions, and when he is not sitting in front of a computer, he enjoys playing sports.