
Seminar: Matthias Gallé (NAVER Labs), The best atomic unit to represent text

Hosted By
Isabelle A.

Details

Title: What is the best atomic unit to represent text?

Abstract: What is the best atomic unit to represent text? This important decision sits at the intersection of the continuous representations of modern NLP and the discrete world of text. To understand the effectiveness of byte-pair encoding (BPE), we test the hypothesis that it stems from the algorithm's compression capacity. We do so by linking BPE to the broader family of dictionary-based compression algorithms.
We then study character-based NMT with Transformer models, showing the consequences of using characters as atomic symbols for overall translation quality and robustness, as well as the need for deeper models.
This is joint work with Rohit Gupta, Laurent Besacier and Marc Dymetman.
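For readers unfamiliar with BPE, the sketch below illustrates why it can be viewed as dictionary-based compression: it starts from characters as atomic symbols, and each merge adds one entry to a dictionary while replacing every occurrence of the most frequent adjacent pair with a single symbol, so the sequence gets shorter. This is a minimal illustration, not the speakers' code; the function names `bpe` and `merge_pair` are hypothetical, and it runs on the raw string (spaces included), which is closer to the original byte-pair compression scheme than to the word-internal variant commonly used in NMT.

```python
from collections import Counter


def merge_pair(seq, pair, new_sym):
    """Replace non-overlapping occurrences of `pair` in `seq` (left to right) with `new_sym`."""
    out = []
    i = 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            out.append(new_sym)
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out


def bpe(text, num_merges):
    """Hypothetical minimal BPE: returns the compressed symbol sequence and the learned merges."""
    seq = list(text)  # characters are the initial atomic symbols
    merges = []       # the learned "dictionary": one entry per merge
    for _ in range(num_merges):
        pairs = Counter(zip(seq, seq[1:]))  # counts of adjacent symbol pairs
        if not pairs:
            break
        pair, count = pairs.most_common(1)[0]
        if count < 2:
            # A pair seen only once would save one symbol but cost one
            # dictionary entry, so further merges no longer compress.
            break
        merges.append(pair)
        # The new symbol is the concatenation, so joining the sequence
        # always recovers the original text (lossless).
        seq = merge_pair(seq, pair, pair[0] + pair[1])
    return seq, merges


if __name__ == "__main__":
    text = "low lower lowest"
    seq, merges = bpe(text, 3)
    print(len(text), "->", len(seq))  # → 16 -> 8
    print(merges)
```

With three merges, the 16-character string shrinks to 8 symbols at the cost of a 3-entry dictionary; the trade-off between sequence length and vocabulary size is exactly what a compression-based view of BPE makes explicit.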

Bio: Matthias Gallé (https://europe.naverlabs.com/people_user/Matthias-Galle/) is Group Lead of the Natural Language Processing group at Naver Labs Europe (https://europe.naverlabs.com/people_user/Matthias-Galle/), which has the mission of teaching computers to understand and generate natural language. His background is in theoretical computer science and algorithmics, with applications to genetic and natural language sequences. In addition to his research interest in statistical and combinatorial methods for analysing text, he likes applying them to explore datasets and see what they tell us about the world.
He joined the centre in 2011, when it was still Xerox Research. He received his PhD from the INRIA centre in Rennes, France; before that, he was at FaMAF (National University of Córdoba, Argentina). He grew up in Germany and spent some years in Brazil.

Organisational information:
This meetup is part of the University of Copenhagen AI Centre Seminar Series (https://ai.ku.dk/events/). The talk will run from 13:00 to 13:45, followed by 15 minutes of questions and informal networking.

Natural Language Processing Copenhagen Meetup
August Krogh Building
Universitetsparken 13 · København