Text analysis with R: Topic modeling
Details
Topic modeling classifies a collection of documents (articles, books, song lyrics, social media posts, and so on) into natural groups, which we can then analyse to better understand the text. It is an unsupervised classification method, similar to clustering for numeric data. It rests on the principle that every document is a mixture of topics, and every topic is a mixture of words.
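To make the "mixture" idea concrete, here is a minimal sketch in R. It assumes latent Dirichlet allocation (LDA), one common topic-modeling method, fitted with the `topicmodels` package and tidied with `tidytext`. The `AssociatedPress` dataset, the choice of two topics, and the seed are illustrative assumptions, not necessarily what the workshop itself uses.

```r
# Sketch only: assumes the topicmodels and tidytext packages are installed.
library(topicmodels)
library(tidytext)

# AssociatedPress is a document-term matrix of news articles bundled with
# topicmodels (an assumed stand-in for the workshop's newspaper data).
data("AssociatedPress", package = "topicmodels")

# Fit a two-topic LDA model; k = 2 and the seed are arbitrary choices.
ap_lda <- LDA(AssociatedPress, k = 2, control = list(seed = 1234))

# "Every topic is a mixture of words":
# beta is the probability of each term within each topic.
ap_topics <- tidy(ap_lda, matrix = "beta")

# "Every document is a mixture of topics":
# gamma is the proportion of each document assigned to each topic.
ap_documents <- tidy(ap_lda, matrix = "gamma")

head(ap_topics)
head(ap_documents)
```

In this sketch, the `beta` values within a topic sum to 1 across terms, and the `gamma` values within a document sum to 1 across topics, which is what "mixture" means here.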
In this workshop, we'll start by demonstrating the basics of topic modeling with newspaper articles, then put the method to the test by mixing up chapters from three different books and checking if topic modeling can correctly sort the chapters back into the books they appear in.
We'll finish up with an exercise on clustering song lyrics by the Spice Girls, Beyoncé, and Taylor Swift.