What we do
As many of you will know, this year's EMNLP (one of the major NLP conferences) takes place in Brussels. We have seized this opportunity to invite some international speakers to the meetup, and we're more than happy to announce a stellar lineup: Ines Montani & Matthew Honnibal (Explosion.AI), Sebastian Ruder (Aylien & Insight Research Centre), Manaal Faruqui (Google) and Arpit Mittal (Amazon). If that list of speakers doesn't make you want to join us, we don't know what will...
We'll meet on the evening of October 31st in the Small Auditorium of the KU Leuven campus in Brussels. We'll start at 6.30pm sharp.
- Understanding Structure in Language through Wikipedia Edits
Manaal Faruqui (Google)
Wikipedia editors are constantly making changes to the online content to add, update, or remove information. In this talk we explore the question of whether the way in which humans edit information on Wikipedia can give us supervision to learn about the structure of language, and whether we can use that information to solve downstream NLP problems. We will place particular emphasis on how such edits can provide a signal for splitting and rephrasing sentences, and briefly discuss the importance of atomic edits to text.
- The importance of scaling down: One weird trick to make your NLP projects more successful
Matthew Honnibal (Explosion.AI)
Commercial machine learning projects are like start-ups: many fail, but some are very successful. While some people will tell you to "embrace failure", I say failure sucks — so what can we do to fight it? I will discuss how to address some of the most likely causes of failure for new NLP projects. My main recommendation is to take an iterative approach: don't assume you know what your pipeline, annotation schemes or model architectures should look like. I will also discuss a few tips for figuring out what's likely to work, and a few common mistakes. I will refer specifically to our open-source library spaCy, and our commercial annotation tool Prodigy.
- Rapid NLP Annotation Through Binary Decisions, Pattern Bootstrapping and Active Learning
Ines Montani (Explosion.AI)
In this talk, I'll present a fast, flexible and fun approach to named entity annotation. In this approach, a model can be trained for a new entity type in only a few hours, starting from unannotated text and a few seed terms. Given the seeds, we first perform an interactive lexical learning phase. Then the annotator is presented with candidate phrases, and the annotation is conducted as a binary choice. The responses are used to train a statistical model, and its predictions are mixed into the annotation queue. The pattern matcher and entity recognition model are available in our library spaCy, while the interface, task queue and workflow management are implemented in our annotation tool Prodigy.
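To make the workflow concrete, here is a minimal, library-free sketch of the bootstrapping loop the abstract describes (in practice the pattern matcher lives in spaCy and the queue and UI in Prodigy): seed terms propose candidate phrases, each candidate is answered with a binary accept/reject, and accepted examples become training data. The entity label and seed terms below are made up for illustration.

```python
import re

# Hypothetical seed terms for a new entity type ("LANGUAGE" is made up here).
seed_terms = ["python", "java", "rust"]
pattern = re.compile(r"\b(" + "|".join(seed_terms) + r")\b", re.IGNORECASE)

texts = [
    "She rewrote the Java service in Rust last year.",
    "The meetup is in Brussels.",
]

# Candidate phrases found by the seed patterns, queued for a binary decision.
queue = [(text, m.group()) for text in texts for m in pattern.finditer(text)]

# Simulated annotator: accept every candidate (in the real tool, a human
# clicks yes/no, and the growing model's predictions join this queue too).
accepted = [(text, span, "LANGUAGE") for text, span in queue]
print(accepted)
```

The accepted `(text, span, label)` triples are exactly the kind of supervision a statistical entity recognizer can then be trained on.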
- Large-scale Fact Extraction and Verification
Arpit Mittal (Amazon)
With billions of individual web pages covering almost every topic, we should be able to collect facts to answer almost every question. However, only a small fraction of this information is contained in structured sources, so we are limited by our ability to transform free text into structured knowledge. There is, however, another problem that has become the focus of a lot of attention: false information from unreliable sources. In this talk I will discuss our work on fact extraction and verification. I will also present the new FEVER dataset we built for this task and discuss top entries from the FEVER challenge.
- Transfer learning with language models
Sebastian Ruder (Aylien & Insight Research Centre)
In recent months, pretrained language models have been successfully used for transfer learning across many tasks in NLP. In this talk, I will give an overview of recent advances in this direction such as ELMo, ULMFiT, and the OpenAI Transformer. I will try to distill our current understanding of language models and highlight exciting future directions.