Details

By: Dennis Chandler

Text mining now yields a treasure trove of data, thanks to current Natural Language Processing (NLP) techniques and the hardware to run them. These techniques provide semantics, topics, and other features for complex analysis in diverse areas including image tagging, market segmentation, and dynamic word embeddings. Several R packages, such as tm, topicmodels, quanteda, SnowballC, and many others, perform NLP tasks, but each has its own quirks and formats. The tidytext package, by Julia Silge and David Robinson, ‘provides functions and supporting data sets to allow conversion of text to and from tidy formats, and to switch seamlessly between tidy tools and existing text mining packages.’

This presentation will be a high-level review of the package and how it integrates with various other packages to simplify NLP workflows. I will work through several simple examples of the key functions, followed by an extended example that combines them with some other packages to create word embeddings similar to Word2Vec and GloVe, using only word counts and some linear algebra rather than neural networks.
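
To give a rough feel for the "just counting words" idea, here is a minimal sketch of one common count-based recipe: count how often words appear near each other, then weight the counts with positive pointwise mutual information (PPMI). It is written in Python with only the standard library, not in R with tidytext, so it is not the code from the presentation. The toy corpus, the window size, and the names `ppmi_vectors` and `cosine` are assumptions made for illustration. The linear algebra step (for example, a truncated SVD of the weighted matrix to get dense, low-dimensional vectors) is left out.

```python
import math
from collections import Counter, defaultdict


def ppmi_vectors(sentences, window=2):
    """Build sparse PPMI vectors from co-occurrence counts.

    Returns {word: {context_word: ppmi}} and keeps only positive values.
    """
    pair_counts = Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    pair_counts[(word, tokens[j])] += 1

    # Marginal counts: how often each word appears in any pair.
    row_counts = Counter()
    for (word, _context), n in pair_counts.items():
        row_counts[word] += n
    total = sum(pair_counts.values())

    vectors = defaultdict(dict)
    for (word, context), n in pair_counts.items():
        # PMI = log( p(word, context) / (p(word) * p(context)) )
        pmi = math.log(n * total / (row_counts[word] * row_counts[context]))
        if pmi > 0:
            vectors[word][context] = pmi
    return dict(vectors)


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(value * v.get(key, 0.0) for key, value in u.items())
    norm_u = math.sqrt(sum(value * value for value in u.values()))
    norm_v = math.sqrt(sum(value * value for value in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical toy corpus, far too small to give meaningful embeddings.
corpus = [
    "the cat sat on the mat",
    "the dog sat on the rug",
    "a cat chased a dog",
]
vectors = ppmi_vectors(corpus, window=2)
print(cosine(vectors["cat"], vectors["dog"]))
```

On a real corpus, the rows of the PPMI matrix already behave like sparse word vectors. Factorizing that matrix is what turns them into compact embeddings comparable to Word2Vec or GloVe output.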

The meetup will be held in the CIC building at 20 South Sarah Street, St. Louis, MO 63108. We will be in the Showroom. The building entrance is at the corner of Forest Park Avenue and Sarah Street. You can find directions at http://stl.cic.us/directions/ (CIC@CET).

We will meet for snacks, set-up, and conversation at 6:00PM. The presentation will start at 6:30PM and will be about 60 minutes long.
