Paris NLP Season 3 Meetup #6

This event has passed

342 attended

Every 4th Wednesday of the month



Seating is on a first come, first served basis whether you have RSVPed or not, so we suggest arriving early. Scaleway can host at most 100 people; registration is required but does not guarantee entry.

Please note that the event will be filmed by Scaleway and then published on YouTube. The video recording will be deleted from Scaleway's systems once it has been published on YouTube.

Slides will be shared after the meetup.


- [Talk in English] Olga Petrova, Machine Learning DevOps Engineer at Scaleway

Subject: Understanding text with BERT


Reading comprehension is a fundamental human skill that nonetheless presents a highly non-trivial problem for a machine learning system. One way to begin tackling it is to cast it as question answering over a given text. In this talk we shall look at how to approach this task using a recent advance in deep learning for NLP: the Transformer architecture, which has come to replace RNN-based models for many NLP tasks. In particular, we will go through an example of training a model based on BERT, a pre-trained Transformer encoder network, on SQuAD (the Stanford Question Answering Dataset).
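To give a flavor of what fine-tuning BERT on SQuAD produces: the model emits a start score and an end score per token, and the predicted answer is the highest-scoring valid span. The sketch below shows only that span-selection step; the tokens and scores are illustrative placeholders, not real model outputs.

```python
# Minimal sketch of extractive QA span selection, as used with
# BERT-style models on SQuAD. The model assigns each token a start
# score and an end score; the answer is the span (i, j) with i <= j
# maximizing start_scores[i] + end_scores[j].

def best_answer_span(tokens, start_scores, end_scores, max_len=15):
    """Return the (start, end) token indices of the best-scoring span."""
    best = (0, 0)
    best_score = float("-inf")
    for i in range(len(tokens)):
        for j in range(i, min(i + max_len, len(tokens))):
            score = start_scores[i] + end_scores[j]
            if score > best_score:
                best_score = score
                best = (i, j)
    return best

# Placeholder context and scores (in practice these come from the model):
tokens = ["The", "Transformer", "was", "introduced", "in", "2017"]
start_scores = [0.1, 0.2, 0.0, 0.1, 0.3, 2.5]
end_scores   = [0.0, 0.1, 0.2, 0.0, 0.1, 2.8]
i, j = best_answer_span(tokens, start_scores, end_scores)
answer = " ".join(tokens[i:j + 1])  # -> "2017"
```

In a real SQuAD setup the search is additionally constrained to context tokens (excluding the question), but the core argmax-over-spans logic is the same.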


- [Talk in French] Axel de Romblay, Machine Learning Engineer at Dailymotion

Subject: How to build a multilingual text classifier?

In this talk, we will introduce one of the biggest challenges we face at dailymotion: how do we accurately categorize our video catalog at scale using video descriptions?
The aim is to present the whole pipeline running at dailymotion, which relies on a complex mix of different methods: machine learning for language detection, named entity linking (NEL) to the Wikidata knowledge graph, deep learning with sparse representations, and NLP with multilingual embeddings and robust transfer learning.
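As a toy illustration of the first pipeline stage, language detection can be approximated by comparing character n-gram profiles. This is a deliberately naive sketch, not dailymotion's production system (which the talk describes); the profile corpora here are tiny made-up snippets.

```python
# Toy character-trigram language detector: build a trigram frequency
# profile per language, then pick the language whose profile overlaps
# the input text's trigrams the most.
from collections import Counter

def trigram_profile(text):
    """Count all character trigrams of a lowercased text."""
    text = text.lower()
    return Counter(text[i:i + 3] for i in range(len(text) - 2))

def detect_language(text, profiles):
    """Return the language whose profile shares the most trigram mass."""
    grams = trigram_profile(text)
    def overlap(profile):
        return sum(min(count, profile[g]) for g, count in grams.items())
    return max(profiles, key=lambda lang: overlap(profiles[lang]))

# Tiny illustrative "training" corpora (real systems use far more data):
profiles = {
    "en": trigram_profile("the quick brown fox jumps over the lazy dog the and this"),
    "fr": trigram_profile("le renard brun rapide saute par dessus le chien paresseux les et"),
}
print(detect_language("the description of this video", profiles))  # -> en
print(detect_language("la description de cette vidéo", profiles))  # -> fr
```

Real multilingual pipelines use much larger profiles or dedicated models, but the profile-overlap idea is the same in spirit.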


- [Talk in English] Arthur Darcet, Mehdi Hamoumi & Marc Benzahra, Glose

Text complexity is mainly described by three factors:
* Readability: text content features such as vocabulary, syntax, and discourse.
* Legibility: text form, such as character size and font, and formatting such as emphasis.
* Reader-dependent features, such as reading ability and reading context, e.g. the environment (noisy, calm, classroom, subway) or the intent (educational, recreational).

At Glose, we built a product where readers can discover, read, and annotate thousands of e-books, and share them with their friends. It is currently used by thousands of readers worldwide, especially in academia, where collaborative reading is a great feature for professors/teachers and their students.
To improve the reading experience, we are currently working on automatic text readability evaluation to enhance book recommendation, which should ease a reader's learning curve.
We tackle this NLP task with both supervised and unsupervised machine learning approaches.

During this talk, we will present our supervised pipeline [1], which encodes a book's content into a set of features and fits model parameters on them to predict a readability score.
Then, we will introduce an unsupervised approach to this task [2], based on the following hypothesis: the simpler a text is, the better it should be understood by a machine. It consists of correlating the ability of multiple language models (LMs) to infill Cloze tests with readability level labels.
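The hypothesis behind the unsupervised approach can be sketched with a toy stand-in: mask each word of a text and check how often a language model's top guess recovers it. Below, a simple bigram model plays the role of the LM (the actual work uses much stronger LMs); the corpus and texts are made up for illustration.

```python
# Toy Cloze-infilling readability sketch: a bigram model guesses each
# masked word from its predecessor; simpler (more predictable) text
# should yield higher infilling accuracy.
from collections import Counter

def train_bigram(corpus):
    """Count next-word frequencies for each word in the corpus."""
    model = {}
    words = corpus.lower().split()
    for prev, nxt in zip(words, words[1:]):
        model.setdefault(prev, Counter())[nxt] += 1
    return model

def cloze_accuracy(text, model):
    """Mask each word (after the first) and check whether the model's
    top continuation for the preceding word recovers it."""
    words = text.lower().split()
    hits = total = 0
    for prev, target in zip(words, words[1:]):
        total += 1
        if prev in model and model[prev].most_common(1)[0][0] == target:
            hits += 1
    return hits / total if total else 0.0

corpus = "the cat sat on the mat . the cat sat on the mat . the dog ran in the park"
model = train_bigram(corpus)
easy_score = cloze_accuracy("the cat sat on the mat", model)   # frequent pattern
hard_score = cloze_accuracy("the dog ran in the park", model)  # rarer pattern
```

In the talk's setting, the per-text infilling scores of several LMs are then correlated with readability level labels; here the toy model merely shows that more predictable text scores higher.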