This month we have Max Sklar from Foursquare presenting "Digging into the Dirichlet Distribution".
When it comes to recommendation systems and natural language processing, data that can be modeled as a multinomial or as a vector of counts is ubiquitous.
For example, if there are two possible user-generated ratings (like and dislike), then each item is represented as a vector of two counts. In a higher-dimensional case, each document may be expressed as a vector of word counts, with the vector large enough to cover all the important words in that corpus of documents.
The Dirichlet distribution is one of the basic probability distributions for describing this type of data. The Dirichlet distribution is surprisingly expressive on its own, but it can also be used as a building block for even more powerful and deep models such as mixtures and topic models.
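As a small illustration of the like/dislike example, here is a minimal sketch of how a Dirichlet prior combines with a vector of counts. It relies on the standard conjugacy result that the posterior parameters are the prior parameters plus the observed counts. The function names, the uniform prior (1, 1), and the 8 likes / 2 dislikes are made up for illustration and are not taken from the talk.

```python
def dirichlet_posterior(prior_alpha, counts):
    """Conjugate update: add the observed counts to the prior parameters."""
    if len(prior_alpha) != len(counts):
        raise ValueError("prior and counts must have the same length")
    return [a + n for a, n in zip(prior_alpha, counts)]


def dirichlet_mean(alpha):
    """Mean of a Dirichlet: each parameter divided by the sum of all of them."""
    total = sum(alpha)
    return [a / total for a in alpha]


# Hypothetical item with 8 likes and 2 dislikes, under a uniform prior (1, 1).
posterior = dirichlet_posterior([1.0, 1.0], [8, 2])
like_rate, dislike_rate = dirichlet_mean(posterior)
```

With these made-up numbers the posterior parameters are (9, 3), so the estimated like rate is 0.75 rather than the raw 0.8: the prior pulls estimates for items with few ratings toward the middle.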
In this talk, we're going to take a closer look at the Dirichlet distribution and its properties, as well as some of the ways it can be computed efficiently.
The following topics are open to discussion:
- How can we think about the Dirichlet distribution so that it matches our intuition rather than just a formula?
- How can we describe the Dirichlet distribution to people outside the field of statistics and machine learning?
- What is Polya's Urn and how does it relate to the Dirichlet distribution?
- How is the Dirichlet useful as a conjugate prior?
- If the Dirichlet is the conjugate prior for the multinomial distribution, is there a conjugate prior for the Dirichlet distribution?
- How can we quickly compute the maximum likelihood (MLE) Dirichlet distribution from a set of data? (We'll look at Thomas Minka's fastfit library, as well as my own open source implementation in Python.) http://research.microsoft.com/en-us/um/people/minka/software/fastfit/
- What are some real-world data sets that can be modeled as a Dirichlet?
- How do topic models use the Dirichlet as a building block?
- What about the infinite-dimensional case?
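For anyone who wants to play with the Polya's Urn question before the talk, here is a rough sketch using only the Python standard library. It assumes the usual formulation: start with an urn whose color weights are the Dirichlet parameters, draw a color in proportion to its weight, and put it back along with one extra ball of that color. The function name `polya_urn` and the parameters (2, 1, 1) are illustrative choices, not something from the talk or from fastfit.

```python
import random


def polya_urn(alpha, n_draws, rng):
    """Simulate Polya's urn and return the final color proportions.

    alpha: starting weight of each color (the Dirichlet parameters).
    Each draw picks a color with probability proportional to its current
    weight, then adds one unit of weight to that color.
    """
    weights = list(alpha)
    for _ in range(n_draws):
        color = rng.choices(range(len(weights)), weights=weights)[0]
        weights[color] += 1.0
    total = sum(weights)
    return [w / total for w in weights]


rng = random.Random(0)  # fixed seed so the run is repeatable
# Each long run of the urn behaves approximately like one Dirichlet(alpha) sample.
runs = [polya_urn([2.0, 1.0, 1.0], 500, rng) for _ in range(200)]
# Averaging many runs should land near the Dirichlet mean alpha / sum(alpha).
avg = [sum(r[i] for r in runs) / len(runs) for i in range(3)]
```

Individual runs end up with very different proportions, which is the point: the spread across runs is the Dirichlet distribution, while their average settles near (0.5, 0.25, 0.25).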
Max Sklar is an engineer and a machine learning specialist. At Foursquare, his continuing objective is to make the app smarter and more interesting. Over the last two years, Max has spearheaded the effort to apply Natural Language Processing technology to Foursquare’s user-generated text corpus. He has spoken at a variety of conferences and meetups in New York’s tech scene, and has been an adjunct instructor for NYU’s data structures course for four semesters. He holds an M.S. in Information Systems from NYU, and a B.S. in Computer Science from Yale, and can be found on Twitter @maxsklar.