
Talk by David Vilares: Constituent Parsing as Sequence Labeling

Hosted by Yova

Details

Abstract:
In this talk, I will introduce a method that reduces constituent parsing to sequence labeling. For each word w_t, the method generates a label that encodes (1) the number of ancestors in the tree that the words w_t and w_{t+1} have in common, and (2) the nonterminal symbol at their lowest common ancestor. The proposed encoding function is injective for any tree without unary branches. In practice, the approach is extended to all constituency trees by collapsing unary branches. I will also present a set of fast baselines and results on the PTB and CTB treebanks. These models outperform the Vinyals et al. (2015) sequence-to-sequence parser. In addition, at the cost of some accuracy, the approach achieves the fastest constituent parsing speeds reported to date on PTB by a wide margin.
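
As a rough illustration of the labeling scheme described in the abstract, the sketch below computes, for each word, the number of ancestors it shares with the next word and the nonterminal at their lowest common ancestor. It is not the speaker's implementation. The nested-tuple tree format, the function names `ancestor_paths` and `encode`, the omission of part-of-speech nodes, and the `None` placeholder for the last word (which has no following word) are all assumptions made for this example.

```python
# Assumed tree format: a leaf is a word (str); an internal node is a tuple
# (label, child_1, child_2, ...). The example tree has no unary branches.

def ancestor_paths(node, position=(), ancestors=()):
    """Return, for each word from left to right, its ancestors from the root down.

    Each ancestor is stored as (position, label), where position is the tuple of
    child indices leading to that node, so two nodes with the same label stay distinct.
    """
    if isinstance(node, str):
        return [ancestors]
    label, *children = node
    here = ancestors + ((position, label),)
    paths = []
    for index, child in enumerate(children):
        paths.extend(ancestor_paths(child, position + (index,), here))
    return paths


def encode(tree):
    """Map a tree to one label per word: (common ancestor count, lowest common nonterminal)."""
    paths = ancestor_paths(tree)
    labels = []
    for current, following in zip(paths, paths[1:]):
        common = 0
        for a, b in zip(current, following):
            if a != b:
                break
            common += 1
        # The root is always shared, so common >= 1 here.
        labels.append((common, current[common - 1][1]))
    labels.append(None)  # assumption: placeholder label for the final word
    return labels


tree = ("S",
        ("NP", "the", "cat"),
        ("VP", "sat", ("PP", "on", ("NP", "the", "mat"))))

print(encode(tree))
# [(2, 'NP'), (1, 'S'), (2, 'VP'), (3, 'PP'), (4, 'NP'), None]
```

In this toy sentence, "cat" and "sat" share only the root, so "cat" receives the label (1, S), while "the" and "mat" share four ancestors and the lowest one is the inner NP.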

Bio:
David is a Research Associate at the University of A Coruña (Spain) working in the FASTPARSE project (ERC Starting Grant). He currently develops techniques and algorithms to improve the speed of natural language parsers. In 2018, he received one of the six National Computer Science Awards to Young Researchers from the BBVA Foundation and the Spanish Scientific Society for Computer Science. His thesis was on compositional language processing for multilingual sentiment analysis, where he explored whether contextual and syntactic information is helpful for sentiment analysis. During his Ph.D. he was also a visiting scholar at the University of Wolverhampton, Aston University, and Nanyang Technological University. His other research interests include computational social science and political analysis.

Natural Language Processing Copenhagen Meetup
Room 1-0-04
DIKU, Universitetsparken 1-3 · Copenhagen