Unified and unsupervised bilingual phrase alignment in specialized domain

Hosted by Jeff Abrahamson

Details

Title: Unified and unsupervised bilingual phrase alignment in specialized domain

Abstract: Significant advances have been achieved in bilingual word-level alignment, yet phrase-level alignment remains a challenge. Moreover, the need for parallel data is a critical drawback for the alignment task. We propose a system that alleviates these two problems: a unified phrase representation model using cross-lingual word embeddings as input, and an unsupervised training algorithm inspired by recent work on neural machine translation. The system consists of a sequence-to-sequence architecture in which a short sequence encoder constructs cross-lingual representations of phrases of any length, and an LSTM network then decodes them with respect to their contexts. We apply this framework to specialized domain corpora of modest size and obtain better results than state-of-the-art sequence encoders and alignment systems.

Bio: Jingshu Liu is a data scientist at Easiware-Dictanova and is preparing his Ph.D. in natural language processing with Emmanuel Morin (LS2N, Université de Nantes). His research interests include NLP with a focus on cross-lingual applications, and sequence modeling with transfer learning using pre-trained language models. He is broadly interested in unsupervised cross-lingual and distributed system learning.

Reading:

Nantes Machine Learning Meetup
Epitech
18 rue Flandres Dunkerque · Nantes