Bristol Machine Learning #14 - Online Talk

Details
We're back!
With the ongoing situation it seems unlikely that we'll be able to hold any in-person meet-ups any time soon. In the meantime, we've decided to move to shorter online meet-ups via Zoom.
These will take place every couple of weeks on Friday lunchtimes (12.30pm to 1.30pm).
We're a non-profit organisation run by volunteers. None of this would be possible without the support of our sponsors, Bristol-based companies working on cutting-edge ML solutions: Cookpad, Candide and Graphcore.
As always, we need speakers. Please don't hesitate to contact us if you would like to present or know someone who would. Talks can be between 10 and 45 minutes long, and you don't have to be a seasoned speaker to get involved.
Our first speaker will be Misha Fain. Misha has been a machine learning researcher at Cookpad since 2018.
His talk is titled: "Backretrieval: An Image-Pivoted Evaluation Metric for Cross-Lingual Text Representations Without Parallel Corpora"
Here's a summary of the talk:
"Many companies work on machine learning problems that fall under the broad category of 'cross-lingual information retrieval'. To break this term down, cross-lingual means that an ML model is expected to work on several languages simultaneously (i.e. one model for all languages), and information retrieval is the task of returning relevant data in response to a user-specified query (web search being a canonical example).
The standard approach for cross-lingual models relies on access to 'parallel corpora', in which the same documents are translated into multiple languages. This is a limiting requirement, since most datasets are not parallel, and aligning them requires significant manual effort and linguistic expertise. It is an expensive and slow process that is susceptible to noise and human error.
This talk will introduce an intuitive metric for the unsupervised evaluation of unaligned datasets. The metric not only opens up the opportunity to incorporate much more data, but also makes evaluating and selecting the best models fast, cheap and consistent. We call it Backretrieval, and we demonstrate its performance on parallel corpora as well as on Cookpad's recipe dataset.
We have published this work, and it will be presented at the ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '21) in July."
Here are the details of the Zoom call:
https://cookpad.zoom.us/j/99382299008?pwd=aEg1ejRRaUZ6RkRWMnh6dkZidjdBQT09
Passcode: 761685
We look forward to seeing you there!