Tommy Jones, Research Associate Statistician at the Institute for Defense Analyses - Science and Technology Policy Institute, will provide an introduction to topic modeling using latent Dirichlet allocation (LDA).

Title: An overview of Latent Dirichlet Allocation and probabilistic topic modeling


Topic models are a family of models that estimate the distribution of abstract concepts (topics) that make up a collection of documents. Over the last several years, the popularity of topic modeling has swelled. One model, Latent Dirichlet Allocation (LDA), is especially popular.

Tommy Jones will describe a range of topic modeling algorithms and how they fit into the topic modeling taxonomy. He will then focus on LDA, explaining how to tune its parameters and giving tips for building better LDA models.

Finally, Tommy will present several open statistical questions in topic modeling, particularly LDA. Examples include LDA's inconsistency, how sample selection affects estimates, and how to best present results. Researchers have begun to tackle some of these issues, but others remain. Still, LDA and other topic models are becoming invaluable resources for researchers in many disciplines.


We'll meet at 6:30pm in the upstairs bar at Stetsons near the intersection of 16th and U Streets NW in Adams Morgan. Introductions & announcements will start around 7:00pm, and presentations will begin at 7:30pm. Afterwards, there will be plenty of time for follow-up questions, networking, and drinks.

Please note: Stetsons is a 21-and-over venue.

DC NLP meets on the second Wednesday of each month to network, socialize, and learn about the interesting work folks are doing in natural language processing, computational linguistics, text analytics, and more.

Do you have something you'd like to share with the group? Let us know! We're always looking for speakers to give talks at future meetups, and don't forget to follow @DCNLP on Twitter!