Analysing texts with R (and writing a package to do so)


For our last meetup of the year we have Adam Obeng talking text analysis and writing R packages.

About the Talk:

quanteda is a package for the quantitative analysis of textual data, with a particular focus on methods used in political science. When I started contributing to quanteda six months ago, I had never written an R package before, and I didn't know anything about text analysis in political science (some might say I still don't).

Join me for an exploration of the features of quanteda 1.0, which creates text corpora, and extracts and analyses their features. It implements natural language processing functionality, scales document positions, creates topic models and does correspondence analysis. It slices! It dices! It can quantify how polarised your politicians happen to be!

Along the way, I'll also share some of the unexpected challenges of developing an R package, from git workflow, to best practices in naming things (the Second Hard Problem of Computer Science), to handling really, really obscure text encodings.

About Adam:

Adam Obeng is a sociologist of science and a computational social scientist, which is kind of like a data scientist except he hasn't quite graduated yet (PhD Sociology, Columbia (expected 2017)). He learnt R from our own Jared Lander [Adam wrote this bio] and since then has used his powers to help Microsoft to conduct polls and Twitter to fight trolls. In his spare time, Adam likes lifting things, climbing up things, and programming really silly things.

Pizza begins at 6:30, the talks start at 7, and afterwards we head to the local bar.

We will do our best to livestream the meetup again, so check out YouTube Live.

Thank you to eBay NYC for hosting us again.

Thank you to our new sponsor Slice, helping us bring pizza to statistical programming.