[ONLINE] Looking at Stop Words: Why You Shouldn't Blindly Trust Model Defaults
Our September Meetup will be a remote gathering. The Zoom link will be posted the week of the event.
Removing stop words is a fairly common step in natural language processing, and NLP packages often supply a default list. However, most documentation and tutorials don't explore the nuances of selecting an appropriate list. Defaults for machine learning and modeling can be helpful but may be misleading or wrong. This talk will focus on the importance of checking assumptions and defaults in the software you use.
Emil Hvitfeldt is a research programmer at the University of Southern California, a co-organizer of the East Los Angeles R Users Group, and the coauthor, with Julia Silge, of Supervised Machine Learning for Text Analysis in R. His interests include developing natural language processing tools for machine learning models and the use of color in data visualizations.