Hugo Laurençon | What Matters When Building Vision-Language Models?

Network event
158 attendees from 4 groups hosting
Hosted By
Martin G.

Details

Virtual London Machine Learning Meetup.
Title: What matters when building vision-language models?
Speaker: Hugo Laurençon (Meta AI)
Paper: https://arxiv.org/abs/2405.02246
Abstract: The growing interest in vision-language models (VLMs) has been driven by improvements in large language models and vision transformers. Despite the abundance of literature on this subject, we observe that critical decisions regarding the design of VLMs are often not justified. We argue that these unsupported decisions impede progress in the field by making it difficult to identify which choices improve model performance. To address this issue, we conduct extensive experiments around pre-trained models, architecture choice, data, and training methods. Our consolidation of findings includes the development of the models Idefics2 and Idefics3.
Bio: I spent three years at Hugging Face during my PhD, where my research focused on developing vision-language models and creating datasets for training them. I have just started a new role as an AI Research Scientist at Meta.

Agenda:

  • 18:25: Virtual doors open
  • 18:30: Talk
  • 19:10: Q&A session
  • 19:30: Close

Sponsor: Evolution AI - Generative AI-powered data extraction from financial documents.

London Machine Learning Meetup