
Jing Yu Koh | Grounding Language Models to Images for Multimodal Generation

Hosted by Martin Goodson

Details

Virtual London Machine Learning Meetup. 22.03.2023 @ 18:30 (BST)

Title: Grounding Language Models to Images for Multimodal Generation

Speaker: Jing Yu Koh, PhD student at Carnegie Mellon University
Paper: https://arxiv.org/abs/2301.13823
Abstract: Can we leverage the abilities of text-only language models for processing and generating interleaved image-and-text data? In this talk, I present an efficient approach for adapting pretrained language models to multimodal tasks. By keeping the language model frozen and finetuning only input and output linear layers for cross-modality interaction, we are able to leverage the abilities that language models learn from large-scale pretraining, such as in-context learning and free-form text generation. Experimental results and qualitative examples show the capabilities of our model for generating compelling multimodal discourse, as well as several zero-shot and few-shot abilities. Our approach works with any off-the-shelf language model and paves the way towards an effective, general solution for leveraging pretrained language models in visually grounded settings.
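
The core idea in the abstract, a frozen language model with only small input and output linear layers being trained, can be illustrated with a toy sketch. This is not the speaker's implementation: the "language model" here is a fixed random matrix applied to mean-pooled embeddings, the squared-error loss and all names and sizes (FROZEN_LM, W_in, W_out, IMG_DIM and so on) are assumptions made for illustration. It only shows that gradients can pass through frozen weights while the two linear layers alone are updated.

```python
import copy
import random

random.seed(0)  # reproducible toy weights

IMG_DIM, LM_DIM, OUT_DIM = 6, 4, 3  # toy sizes chosen for illustration


def rand_matrix(rows, cols):
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vec):
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


# Hypothetical stand-in for a pretrained language model: fixed, never updated.
FROZEN_LM = rand_matrix(LM_DIM, LM_DIM)

# The only trainable parameters: one input and one output linear layer.
W_in = rand_matrix(LM_DIM, IMG_DIM)   # image features -> LM embedding space
W_out = rand_matrix(OUT_DIM, LM_DIM)  # LM hidden state -> output embedding space


def forward(image_features, text_embeddings):
    # Map the image into the LM's embedding space and append it to the text.
    visual_token = matvec(W_in, image_features)
    sequence = text_embeddings + [visual_token]
    mean = [sum(col) / len(sequence) for col in zip(*sequence)]
    hidden = matvec(FROZEN_LM, mean)  # frozen computation
    return matvec(W_out, hidden), hidden, len(sequence)


def loss_fn(output, target):
    return 0.5 * sum((o - t) ** 2 for o, t in zip(output, target))


def train_step(image_features, text_embeddings, target, lr=0.1):
    output, hidden, n = forward(image_features, text_embeddings)
    err = [o - t for o, t in zip(output, target)]
    # Backpropagate through the frozen model without changing its weights.
    d_hidden = [sum(W_out[i][j] * err[i] for i in range(OUT_DIM))
                for j in range(LM_DIM)]
    d_token = [sum(FROZEN_LM[j][k] * d_hidden[j] for j in range(LM_DIM)) / n
               for k in range(LM_DIM)]
    # Gradient steps on the two linear layers only.
    for i in range(OUT_DIM):
        for j in range(LM_DIM):
            W_out[i][j] -= lr * err[i] * hidden[j]
    for k in range(LM_DIM):
        for m in range(IMG_DIM):
            W_in[k][m] -= lr * d_token[k] * image_features[m]
    return loss_fn(output, target)


# Made-up inputs: one image feature vector, two text embeddings, one target.
image = [0.2, -0.1, 0.4, 0.0, 0.3, -0.2]
text = [[0.1, 0.2, 0.3, 0.4], [0.4, 0.3, 0.2, 0.1]]
target = [1.0, 0.0, 0.0]

frozen_snapshot = copy.deepcopy(FROZEN_LM)
loss_before = loss_fn(forward(image, text)[0], target)
for _ in range(50):
    train_step(image, text, target)
loss_after = loss_fn(forward(image, text)[0], target)
print(loss_before, loss_after)
```

In this sketch the loss falls while FROZEN_LM stays identical to its snapshot, which is the property the abstract relies on: the pretrained model's abilities are left untouched and only the lightweight mapping layers adapt.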

Bio: Jing Yu Koh is a first-year machine learning PhD student at Carnegie Mellon University, where he is advised by Daniel Fried and Ruslan Salakhutdinov. He works on grounded language understanding, and his research aims to build machine learning models that can fuse different modalities (text, images, videos, and more) to achieve strong performance on complex reasoning and generation tasks. Prior to joining CMU, he was a research engineer at Google Research, where he worked on text-to-image generation and multimodal learning.

Agenda:

  • 18:25: Virtual doors open
  • 18:30: Talk
  • 19:10: Q&A session
  • 19:30: Close

Sponsor: Evolution AI - Intelligent data extraction from corporate and financial documents.

London Machine Learning Meetup