
Lunch & Talk "Crosslingual Document Embedding as Reduced-Rank Ridge Regression"

Hosted By
Swisscom Digital Lab

Details

The Swisscom Digital Lab is happy to welcome Robert West for a new talk.

NEW Agenda - June 19th:
12 PM to 1 PM: Talk
1 PM to 2 PM: Pizza Lunch

Topic:
In this talk, Robert will introduce Cr5 (Crosslingual reduced-rank ridge regression), a method for embedding documents written in any language into a single, language-independent vector space, so that documents can be compared directly across languages. For training, the approach leverages a multilingual corpus in which the same concept is covered in multiple languages, such as Wikipedia. Most prior methods start from pre-trained monolingual word vectors, post-process them to make them crosslingual, and finally average word vectors to obtain document vectors. Cr5, by contrast, is trained end-to-end and is thus natively crosslingual as well as document-level. Moreover, because the algorithm uses the singular value decomposition as its core operation, it is highly scalable. Experiments show that Cr5 achieves state-of-the-art performance on a crosslingual document retrieval task.
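
For readers who want a feel for the core idea before the talk, here is a minimal NumPy sketch of generic reduced-rank ridge regression solved with a singular value decomposition. It is not the Cr5 implementation: the function name `reduced_rank_ridge`, the toy bag-of-words data, the concept-indicator targets, and the parameters `rank` and `lam` are all assumptions made for illustration.

```python
import numpy as np

def reduced_rank_ridge(X, Y, rank, lam):
    """Sketch: minimize ||Y - X W||^2 + lam * ||W||^2 subject to rank(W) <= rank.

    Returns factors (Phi, Psi) with W = Phi @ Psi. Hypothetical helper, not Cr5's code.
    """
    n_features = X.shape[1]
    # Ordinary (full-rank) ridge solution from the regularized normal equations.
    W_ridge = np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ Y)
    # Fitted values of the equivalent augmented least-squares problem.
    X_aug = np.vstack([X, np.sqrt(lam) * np.eye(n_features)])
    fitted = X_aug @ W_ridge
    # The top right singular vectors of the fitted values span the low-rank output subspace.
    _, _, Vt = np.linalg.svd(fitted, full_matrices=False)
    V_r = Vt[:rank].T
    Phi = W_ridge @ V_r   # maps a bag-of-words vector to a rank-dimensional embedding
    Psi = V_r.T           # maps an embedding back to the output (concept) space
    return Phi, Psi

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Made-up toy corpus: 5 concepts, each described once in 2 languages.
# Each language has its own block of 20 vocabulary columns, so documents
# in different languages share no words, only concept labels.
rng = np.random.default_rng(0)
n_concepts, vocab_per_lang = 5, 20
docs, labels = [], []
for lang in range(2):
    for concept in range(n_concepts):
        row = np.zeros(2 * vocab_per_lang)
        start = lang * vocab_per_lang
        rates = rng.uniform(0.0, 3.0, vocab_per_lang)
        row[start:start + vocab_per_lang] = rng.poisson(rates)
        docs.append(row)
        labels.append(concept)

X = np.array(docs)              # (10 documents, 40 vocabulary columns)
Y = np.eye(n_concepts)[labels]  # one-hot concept indicator per document

Phi, Psi = reduced_rank_ridge(X, Y, rank=3, lam=1.0)
embeddings = X @ Phi            # language-independent document vectors

# Compare a language-0 document with language-1 documents.
print("same concept:     ", cosine(embeddings[0], embeddings[5]))
print("different concept:", cosine(embeddings[0], embeddings[6]))
```

In this sketch, documents in different languages can be compared only because they were regressed onto the same concept labels, which loosely mirrors the role the abstract gives to a concept-aligned corpus such as Wikipedia.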

Bio:
Robert ("Bob") West is an assistant professor of computer science at EPFL, where he heads the Data Science Lab. His research aims to understand, predict, and enhance human behavior in social and information networks by developing techniques in data science, data mining, network analysis, machine learning, and natural language processing. He holds a PhD in computer science from Stanford University.

https://dlab.epfl.ch

Swisscom Digital Lab
EPFL Innovation Park
Building F, 3rd Floor · Lausanne