Lunch & Talk "Crosslingual Document Embedding as Reduced-Rank Ridge Regression"


Details
The Swisscom Digital Lab is happy to welcome Robert West for this new talk.
NEW Agenda - June 19th:
12 PM to 1 PM: Talk
1 PM to 2 PM: Pizza lunch
Topic:
In this talk, Robert will introduce Cr5 (Crosslingual reduced-rank ridge regression), a method for embedding documents written in any language into a single, language-independent vector space, so that documents can be compared directly across languages. For training, the approach leverages a multilingual corpus in which the same concept is covered in multiple languages, such as Wikipedia. Most prior methods start from pre-trained monolingual word vectors, post-process them to make them crosslingual, and finally average word vectors to obtain document vectors. Cr5, by contrast, is trained end-to-end and is thus natively crosslingual as well as document-level. Moreover, since the algorithm uses the singular value decomposition as its core operation, it is highly scalable. Experiments show that Cr5 achieves state-of-the-art performance on a crosslingual document retrieval task.
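For readers who want a feel for the core computation before the talk, here is a minimal sketch of generic reduced-rank ridge regression solved with a singular value decomposition. It is a textbook formulation on made-up toy data, not the Cr5 implementation: the function name reduced_rank_ridge, the regularization weight lam, and the tiny two-language corpus are all illustrative assumptions.

```python
import numpy as np

def reduced_rank_ridge(X, Y, rank, lam):
    """Generic reduced-rank ridge regression (illustrative, not Cr5 itself).

    X: (n_docs, n_features) bag-of-words matrix over a joint vocabulary.
    Y: (n_docs, n_targets) indicator matrix, e.g. one column per concept.
    Returns A (n_features, rank) and V_r (n_targets, rank) such that
    A @ V_r.T is the rank-constrained coefficient matrix.
    """
    n_features = X.shape[1]
    # Full-rank ridge solution: (X^T X + lam * I)^{-1} X^T Y
    B_ridge = np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ Y)
    # Fitted values on the ridge-augmented design [X; sqrt(lam) * I]
    X_aug = np.vstack([X, np.sqrt(lam) * np.eye(n_features)])
    Y_hat_aug = X_aug @ B_ridge
    # The SVD is the core operation: keep the top right singular vectors
    _, _, Vt = np.linalg.svd(Y_hat_aug, full_matrices=False)
    V_r = Vt[:rank].T
    # A maps a bag-of-words row to a rank-dimensional embedding
    A = B_ridge @ V_r
    return A, V_r

# Hypothetical toy corpus: 3 concepts, each described by one English
# document (features 0-2) and one German document (features 3-5).
X = np.eye(6)
Y = np.vstack([np.eye(3), np.eye(3)])

A, V_r = reduced_rank_ridge(X, Y, rank=2, lam=0.1)
embeddings = X @ A  # one low-dimensional vector per document
```

In this toy setup, the English and German documents for the same concept receive the same embedding even though they share no words, because both are regressed onto the same concept indicator. That is the intuition behind comparing documents across languages in a single vector space.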
Bio:
Robert "Bob" West is an assistant professor of Computer Science at EPFL, where he heads the Data Science Lab. His research aims to understand, predict, and enhance human behavior in social and information networks by developing techniques in data science, data mining, network analysis, machine learning, and natural language processing. He holds a PhD in computer science from Stanford University.
