Paris NLP Season 3 Meetup #6


Every 4th Wednesday of the month


Details

Seating is on a first-come, first-served basis whether you have RSVPed or not, so we suggest arriving early. Scaleway can host a maximum of 100 people.

The room can accommodate a maximum of 100 people. Registration is required but does not guarantee entry, so we recommend arriving a little early.

Please note that the event will be filmed by Scaleway and then published on YouTube. The video recording will be deleted from our systems once it has been published on YouTube.

----------

Speakers:

- [Talk in English] Olga Petrova, Machine Learning DevOps Engineer at Scaleway

Subject: Understanding text with BERT

Outline:
1) Self-attention and transformer architecture
2) What is BERT
3) Fine-tuning a pre-trained BERT for the question answering task using an open source implementation: https://github.com/huggingface/pytorch-pretrained-BERT
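As a taste of outline item 1, single-head scaled dot-product self-attention can be sketched in a few lines of numpy (the dimensions and random weights below are illustrative, not taken from the talk):

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention.

    X: (seq_len, d_model) token representations.
    Wq, Wk, Wv: projection matrices for queries, keys, values.
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # pairwise token affinities
    # Numerically stable softmax over each row.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V                              # each output mixes all tokens

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))                         # 4 tokens, model dim 8
Wq = rng.normal(size=(8, 8))
Wk = rng.normal(size=(8, 8))
Wv = rng.normal(size=(8, 8))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)  # (4, 8): one contextualized vector per token
```

Each output row is a weighted average of all value vectors, which is what lets BERT build context-dependent token representations.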

----------

- [Talk in French] Axel de Romblay, Machine Learning Engineer at Dailymotion

How to build a multi-lingual text classifier?

In this talk, we will introduce one of the biggest challenges we face at Dailymotion: how do we accurately categorize our video catalog at scale using the descriptions?
The purpose is to present the whole pipeline running at Dailymotion, which relies on a complex mix of different methods: machine learning for language detection, named-entity linking (NEL) to the Wikidata knowledge graph, deep learning using sparse representations, and NLP with multi-lingual embeddings & robust transfer learning.
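To give a flavor of the first stage, here is a toy character n-gram language detector in pure Python. This only sketches the general technique; Dailymotion's production system and the tiny sample profiles below are not from the talk, and real detectors are trained on large corpora:

```python
from collections import Counter

def char_ngrams(text, n=3):
    """Count character trigrams, with padding so word boundaries count too."""
    text = f"  {text.lower()}  "
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

# Tiny per-language profiles built from sample sentences (illustrative only).
SAMPLES = {
    "en": "the video shows how to cook pasta with fresh tomatoes",
    "fr": "cette vidéo montre comment cuisiner des pâtes avec des tomates",
}
PROFILES = {lang: char_ngrams(txt) for lang, txt in SAMPLES.items()}

def detect(text):
    """Pick the language whose trigram profile overlaps the input the most."""
    grams = char_ngrams(text)
    def overlap(profile):
        return sum(min(c, profile[g]) for g, c in grams.items())
    return max(PROFILES, key=lambda lang: overlap(PROFILES[lang]))

print(detect("comment regarder la vidéo"))  # fr
print(detect("how to watch the video"))     # en
```

Knowing the language up front lets the rest of the pipeline route a description to the right embeddings and models.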

Reference: https://medium.com/dailymotion/topic-annotation-automatic-algorithms-data-377079d27936

----------

- [Talk in English] Arthur Darcet, Mehdi Hamoumi & Marc Benzahra, Glose, https://glose.com/what-is-glose

Text complexity is mainly described by three factors:
* Readability: text content, such as vocabulary, syntax, and discourse.
* Legibility: text form, such as character size, font, and formatting (e.g. emphasis).
* Reader-dependent features, such as reading ability and reading context: environment (noisy, calm, classroom, subway) or intent (educational, recreational).

At Glose, we built a product where readers can discover, read, and annotate thousands of e-books, and share them with their friends. It is currently used by thousands of readers worldwide, especially in academia, where collaborative reading is a great feature for teachers and their students.
To improve the reading experience, we are currently working on automatic text readability evaluation to enhance book recommendation, which should ease a reader's learning curve.
We tackle this NLP task with both supervised and unsupervised machine learning approaches.

During this talk, we will present our supervised pipeline [1], which encodes a book's content into a set of features and uses them to fit model parameters that predict a readability score.
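As an illustration of the feature-encoding step, here is a minimal sketch of hand-crafted surface features of the kind a supervised readability model might consume. These particular features are illustrative assumptions, not Glose's actual feature set (see [1] for that):

```python
import re

def readability_features(text):
    """Compute simple surface features over a text (illustrative only)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    return {
        "avg_sentence_len": len(words) / len(sentences),    # longer sentences: harder
        "avg_word_len": sum(len(w) for w in words) / len(words),
        "type_token_ratio": len({w.lower() for w in words}) / len(words),
    }

easy = readability_features("The cat sat. The dog ran. It was fun.")
hard = readability_features(
    "Notwithstanding considerable methodological heterogeneity, "
    "the meta-analysis substantiated the hypothesized correlation."
)
```

A regression or classification model fit on such feature vectors, with leveled texts as training data, yields the readability score.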
Then, we will introduce an unsupervised approach to this task [2] based on the following hypothesis: the simpler a text is, the better it should be understood by a machine. It consists in correlating the ability of multiple language models (LMs) to fill in Cloze tests with readability level labels.
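The hypothesis can be sketched with a toy bigram model standing in for the LMs (the actual approach in [2] uses real language models; the tiny corpus and scoring below are illustrative assumptions):

```python
from collections import Counter, defaultdict

# Toy bigram "language model" trained on a tiny corpus (illustrative only).
CORPUS = "the cat sat on the mat . the dog sat on the rug .".split()
BIGRAMS = defaultdict(Counter)
for prev, nxt in zip(CORPUS, CORPUS[1:]):
    BIGRAMS[prev][nxt] += 1

def cloze_accuracy(text):
    """Blank out each word and let the model fill it from the previous word.

    Per the hypothesis, higher fill-in accuracy suggests a simpler text.
    """
    words = text.lower().split()
    hits = total = 0
    for prev, target in zip(words, words[1:]):
        if prev in BIGRAMS:
            guess = BIGRAMS[prev].most_common(1)[0][0]
            hits += guess == target
            total += 1
    return hits / total if total else 0.0

# Text close to what the model knows scores higher than unfamiliar text:
print(cloze_accuracy("the cat sat on the mat"))
print(cloze_accuracy("the quantum flux destabilized rapidly"))
```

Correlating such per-LM accuracies with readability labels is what turns the hypothesis into an unsupervised readability signal.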

References:
[1] https://medium.com/glose-team/how-to-evaluate-text-readability-with-nlp-9c04bd3f46a2
[2] https://storage.cloud.google.com/s5-bucket/research/marc/acl_bea_paper.pdf