Vision and Language models: How much do they use the image and text modality?


Details
Full Title: Vision and Language models: What grounded linguistic phenomena do they understand? How much do they use the image and text modality?
Multimodal models are making headlines, with models like ChatGPT now able to interpret images.
We are excited to have Letitia Parcalabescu, a PhD student at Heidelberg University who has worked on projects with Aleph Alpha and is also a machine learning YouTuber, speaking at the DKFZ. In her talk, she will present methodologies for evaluating vision and language models on fine-grained linguistic tasks, and for explaining their outputs to make them safe for human interaction.
We hope to see you there and look forward to learning about future multimodal models together.
For more information, visit:
https://heidelberg.ai/2023/11/28/parcalabescu.html
After the event, we will also upload a video recording to our YouTube channel:
https://www.youtube.com/channel/UCfHWBneOsb7SfOxJepnMQKA