Front Range NLP (Natural Language Processing) Message Board

CS Colloquium 2013/4/4: "Improving the Accuracy, Efficiency and Data Use for Natural Language Parsing"

Lee B.
Group Organizer
Boulder, CO
Several people have asked me to forward this information:

The CS department at the University of Colorado holds colloquia on Thursdays during the Fall and Spring semesters. The colloquia are free and open to the public. The next one will be on Natural Language Processing and is detailed below.

For more information on the colloquium program, see the CS department's website.

ECCR 265 (Engineering Center)

Thursday, April 4
3:30-4:30 PM

"Improving the Accuracy, Efficiency and Data Use for Natural Language Parsing"
Shay Cohen
Columbia University

We are facing enormous growth in the amount of information available from various sources. This growth is especially notable for text data: the number of pages on the internet, for example, is expected to double every five years, and billions of multilingual webpages are already available.

To make use of this textual data in natural language understanding systems, we need text analysis that structures this information. Natural language parsing, a fundamental problem in NLP, is one such analysis: it gives text its basic structure by representing its syntax computationally. This structure is used in most NLP applications that analyze language to understand its meaning.

I will discuss three important facets of modeling syntax: (a) accuracy of learning; (b) efficiency of parsing unseen sentences; and (c) selection of data to learn from. The common theme across these three facets is learning from incomplete data. I will first describe latent-variable probabilistic context-free grammars (L-PCFGs), a model that, because learning from incomplete data is hard, was until recently learned with the help of many heuristics and approximations. I will then present a more principled, statistically consistent approach to learning L-PCFGs using spectral algorithms, and show how tensor decomposition makes parsing unseen sentences with L-PCFGs much more efficient.

In addition, I will touch on work on unsupervised language learning, one of the holy grails of NLP, in the Bayesian setting. In this setting, priors guide the learner and compensate for the lack of labeled data. I will survey novel priors developed for this setting and describe how they can be used for both monolingual and multilingual learning.

Shay Cohen is a postdoctoral research scientist in the Department of Computer Science at Columbia University. He holds a CRA Computing Innovation Fellowship. He received his B.Sc. and M.Sc. from Tel Aviv University in 2000 and 2004, and his Ph.D. from Carnegie Mellon University in 2011. His research interests span a range of topics in natural language processing and machine learning, with a focus on structured prediction. He is especially interested in developing efficient and scalable parsing algorithms as well as learning algorithms for probabilistic grammars.

Hosted by Mike Eisenberg and Christine Lv.

Free and open to all; light refreshments will be served.

Lee B.
Post #: 2
It looks like this talk has been cancelled.