Rephil: Extracting Concepts from Text

Hosted by Rob Zinkov

Details

This talk will describe Rephil, a system used throughout Google to identify the concepts or topics that underlie a given piece of text. Rephil determines, for example, that "apple pie" falls under some of the same topics as "chocolate cake", but has little in common with "apple ipod". The concepts used by Rephil are not pre-specified; instead, they are derived by an unsupervised learning algorithm running on massive amounts of text. The result of this learning process is a Rephil model -- a giant Bayesian network with concepts as nodes. I will discuss the structure of Rephil models, the distributed machine learning algorithm that we use to build these models from terabytes of data, and the Bayesian network inference algorithm that we use to identify concepts in new texts under tight time constraints. I will also discuss how Rephil relates to ongoing academic research on probabilistic topic models.

LA Machine Learning
Shopzilla Inc
12200 West Olympic Blvd # 300 · Los Angeles, CA