
[Reading] VideoPoet: LLM for Video Generation

Hosted by: Junling H.

Details

This is one of the most exciting papers from late 2023. It demonstrates a new way to generate videos without using a diffusion model.

Paper abstract:
VideoPoet employs a decoder-only transformer architecture that processes multimodal inputs -- including images, videos, text, and audio. The training protocol follows that of Large Language Models (LLMs), consisting of two stages: pretraining and task-specific adaptation. During pretraining, VideoPoet incorporates a mixture of multimodal generative objectives within an autoregressive Transformer framework. The pretrained LLM serves as a foundation that can be adapted for a range of video generation tasks. We present empirical results demonstrating the model's state-of-the-art capabilities in zero-shot video generation, specifically highlighting VideoPoet's ability to generate high-fidelity motions.

Presenter: Junling Hu

Paper link
Kondratyuk, Dan, Lijun Yu, Xiuye Gu, José Lezama, Jonathan Huang, Rachel Hornung, Hartwig Adam et al. "VideoPoet: A Large Language Model for Zero-Shot Video Generation." arXiv:2312.14125 (Dec 21, 2023). https://arxiv.org/abs/2312.14125
Blog, Dec 19, 2023: https://blog.research.google/2023/12/videopoet-large-language-model-for-zero.html
Website: https://sites.research.google/videopoet/

This is part of the bi-weekly paper reading series.

Join this event here: https://us02web.zoom.us/meeting/register/tZwof-yhrz4qEtDZAUUe38BYY1pUhNmYcZVU

Agenda:
7:00-7:05 pm Meet and greet
7:05-7:50 pm Presentation
7:50-8:00 pm Q&A and discussion

AI Frontiers Forum
Online event