
Evaluation of Traditional and Novel Feature Selection Approaches

Hosted by Aaron Richter

Details

Virtual Meetup! We will create breakout rooms for informal chats and then join the main talk together. Please join our Slack for more information and Zoom links.

https://join.slack.com/t/miamimachinelearning/shared_invite/zt-3r4eznao-sxtX2sW5rMmg_GjmYN2XGw

Schedule:
6:30pm-7:00pm: Virtual networking (via Zoom breakout rooms)
7:00pm-8:00pm: Talk by Ben
8:00pm-8:30pm: more networking!

Title:
Evaluation of Traditional and Novel Feature Selection Approaches

Description:
Selecting the optimal set of features is a key step in the ML modeling process. This talk presents research that tested five approaches to feature selection while building a classification model on the Lending Club dataset. The approaches included widely used methods alongside novel ones built on open-source libraries.

Feature selection is a central component of the machine learning process. Selecting the optimal set of features is important for producing a well-fitting model that generalizes to unseen data. A widely used approach involves calculating Gini Importance (Gain) to identify the best set of features.
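As a rough illustration of the gain idea, the sketch below scores each feature by the largest drop in Gini impurity that a single threshold split on it can achieve, then ranks the features. This is a simplified stand-in for the gain importance reported by tree ensembles, not the method used in the talk; the function names and the toy data (`income`, `noise`) are made up for the example and have nothing to do with the Lending Club dataset.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum(p_k ** 2)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split_gain(values, labels):
    """Largest impurity decrease from one threshold split on a single feature."""
    n = len(labels)
    parent = gini(labels)
    best = 0.0
    # Candidate thresholds: every distinct value except the largest (split is v <= t).
    for t in sorted(set(values))[:-1]:
        left = [y for v, y in zip(values, labels) if v <= t]
        right = [y for v, y in zip(values, labels) if v > t]
        child = (len(left) * gini(left) + len(right) * gini(right)) / n
        best = max(best, parent - child)
    return best


def rank_features_by_gain(rows, labels, names):
    """Return (feature, gain) pairs sorted from most to least useful."""
    gains = {
        name: best_split_gain([row[i] for row in rows], labels)
        for i, name in enumerate(names)
    }
    return sorted(gains.items(), key=lambda kv: kv[1], reverse=True)


# Toy data, invented for illustration only.
names = ["income", "noise"]
rows = [(20, 5), (25, 1), (30, 4), (60, 2), (70, 5), (80, 1)]
labels = [1, 1, 1, 0, 0, 0]

ranking = rank_features_by_gain(rows, labels, names)
top_k = [name for name, _ in ranking[:1]]  # single selection step: keep the top feature
print(ranking, top_k)
```

Here `income` separates the classes perfectly and gets the full parent impurity (0.5) as its gain, while `noise` scores close to zero. A single ranking like this is the "one-step" selection that the stepwise approaches in the talk are compared against.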

However, recent work from Scott Lundberg has identified problems with the consistency of the Gain attribution method. Ben Fowler shares model metrics on the Lending Club dataset from testing five different feature selection approaches, which combined widely used techniques with novel ones.

You’ll discover the impact of four choices: the data splitting method; including relevant two-way and three-way interactions (xgbfir library); backwards stepwise feature selection as opposed to a single feature selection step; and backwards stepwise feature selection using Shapley values (shap library).
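To make "backwards stepwise" concrete, here is a minimal sketch of the general loop: start with every feature, repeatedly drop the one whose removal hurts the validation score least, and remember the best subset seen. Everything in it is an assumption for illustration: the nearest-centroid scorer, the ablation-based importance (a stand-in for the Shapley-value ranking mentioned above, which would come from the shap library), and the synthetic data. It is not the speaker's implementation.

```python
def centroid_accuracy(train, valid, features):
    """Validation accuracy of a nearest-centroid classifier using only `features`."""
    sums, counts = {}, {}
    for x, y in train:
        acc = sums.setdefault(y, [0.0] * len(features))
        for j, f in enumerate(features):
            acc[j] += x[f]
        counts[y] = counts.get(y, 0) + 1
    centroids = {y: [s / counts[y] for s in acc] for y, acc in sums.items()}

    def predict(x):
        # Pick the class whose centroid is closest in squared Euclidean distance.
        return min(
            centroids,
            key=lambda y: sum((x[f] - c) ** 2 for f, c in zip(features, centroids[y])),
        )

    return sum(predict(x) == y for x, y in valid) / len(valid)


def backward_stepwise(train, valid, features, score=centroid_accuracy):
    """Drop one feature per round; return the best-scoring subset and its score."""
    current = list(features)
    best_score, best_set = score(train, valid, current), list(current)
    while len(current) > 1:
        # Hypothetical importance: the score obtained after removing each feature.
        ablated = {f: score(train, valid, [g for g in current if g != f]) for f in current}
        weakest = max(current, key=lambda f: ablated[f])  # removing it hurts least
        current.remove(weakest)
        if ablated[weakest] >= best_score:  # on ties, prefer the smaller subset
            best_score, best_set = ablated[weakest], list(current)
    return best_set, best_score


# Synthetic data: "signal" determines the label; the noise columns are unrelated to it.
features = ["signal", "noise_a", "noise_b"]
data = [
    ({"signal": 10.0 * (i % 2), "noise_a": (i // 2) % 5, "noise_b": (i // 2) % 3}, i % 2)
    for i in range(40)
]
train, valid = data[:20], data[20:]

best_set, best_score = backward_stepwise(train, valid, features)
print(best_set, best_score)  # -> ['signal'] 1.0
```

The key difference from a single selection step is that importance is recomputed on the shrinking feature set each round, so a feature's rank can change as correlated features are removed.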

About the speaker:
Ben Fowler is a Machine Learning Technical Leader at Southeast Toyota Finance, where he leads the end-to-end model development process. He’s been in the field of data science for over five years. Ben has spoken at the PyData Miami 2019 and PyData Los Angeles 2019 conferences and will be speaking at the O'Reilly Strata Data and AI Conference in March 2020. Additionally, he has spoken multiple times at the West Palm Beach Data Science Meetup, given multiple talks at Southern Methodist University, and was recently featured on the Datacast podcast. Ben holds a Master of Science in Data Science from Southern Methodist University.

PyData Miami / Machine Learning Meetup