We are developers and researchers interested in machine learning. We meet to discuss our hands-on projects in machine learning, artificial neural networks, probabilistic graphical models, and natural language processing. This meetup brings together practitioners and researchers to present and discuss each other's work.
Title: Unified and unsupervised bilingual phrase alignment in specialized domain
Abstract: Significant advances have been achieved in bilingual word-level alignment, yet phrase-level alignment remains a challenge. Moreover, the need for parallel data is a critical drawback for the alignment task. We propose a system that alleviates these two problems: a unified phrase representation model that takes cross-lingual word embeddings as input, and an unsupervised training algorithm inspired by recent work on neural machine translation. The system consists of a sequence-to-sequence architecture in which a short-sequence encoder constructs cross-lingual representations of phrases of any length, and an LSTM network then decodes them with respect to their contexts. We apply this framework to specialized-domain corpora of modest size and obtain better results than state-of-the-art sequence encoders and alignment systems.
Bio: Jingshu Liu is a data scientist at Easiware-Dictanova and is concurrently pursuing his Ph.D. in natural language processing with Emmanuel Morin (LS2N, Université de Nantes). His research interests include NLP, with a focus on cross-lingual applications, and sequence modeling with transfer learning from pre-trained language models. He is broadly interested in unsupervised cross-lingual and distributed system learning.
* Attention is All you Need (https://papers.nips.cc/paper/7181-attention-is-all-you-need.pdf)
* BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (https://arxiv.org/pdf/1810.04805.pdf)
* Towards a unified framework for bilingual terminology extraction of single-word and multi-word terms (https://www.aclweb.org/anthology/C18-1242.pdf)
* Learning Task-Dependent Distributed Representations by Backpropagation Through Structure (http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.49.1968&rep=rep1&type=pdf)
* Generalizing and Improving Bilingual Word Embedding Mappings with a Multi-Step Framework of Linear Transformations (https://pdfs.semanticscholar.org/d901/93d2be26a9bf4b187763ee620dd4100d406a.pdf?_ga=2.268941686.627229774.1573205498-1770497098.1570614532)
* Unsupervised Neural Machine Translation (https://arxiv.org/pdf/1710.11041.pdf)
* Extract and Edit: An Alternative to Back-Translation for Unsupervised Neural Machine Translation (https://arxiv.org/pdf/1904.02331.pdf)