Details

Register: To expedite check-in at Galvanize, register in advance here (https://www.eventbrite.com/e/denver-r-user-group-dave-robison-on-tidytext-tickets-31525020184).

Note: The room at Galvanize is LL1+LL2.

Time: Socializing from 6:30 to 7:00, talk from 7:00 to about 7:45, then we head to a bar shortly after.

Title: Tidy Text Mining with R

Abstract: Text data is increasingly important in many domains, but it can be challenging to manipulate and visualize within typical R analysis workflows. In this talk, I will introduce the tidytext package and show how tidy data principles and tools can make text mining easier and more effective, by structuring text as one-token-per-row. You'll learn how to manipulate, summarize, and visualize text's characteristics using R packages from the tidy ecosystem such as dplyr, ggplot2, and tidyr. You'll see case studies of sentiment analysis, tf-idf, and topic modeling applied to examples from literature, Twitter, and Stack Overflow questions, and gain the tools to draw conclusions from your own text datasets.

Bio: David Robinson is a Data Scientist at Stack Overflow, where he analyzes data on the world's software developers to help them find answers to their programming questions. He is the co-author with Julia Silge of the tidytext package and of the upcoming book Text Mining with R, to be published by O'Reilly in 2017. He is also the author of the broom, gganimate, and fuzzyjoin packages and of the DataCamp course "Exploratory Data Analysis in R: Case Study." He writes about R, statistics and education on his blog Variance Explained, as well as on Twitter as @drob.
