[Reading] VideoPoet: LLM for Video Generation

![[Reading] VideoPoet: LLM for Video Generation](https://secure.meetupstatic.com/photos/event/a/9/6/f/highres_518323375.webp?w=750)
Details
This is one of the most exciting papers from late 2023. It demonstrates a new way of generating videos without using diffusion models.
Paper abstract:
VideoPoet employs a decoder-only transformer architecture that processes multimodal inputs -- including images, videos, text, and audio. The training protocol follows that of Large Language Models (LLMs), consisting of two stages: pretraining and task-specific adaptation. During pretraining, VideoPoet incorporates a mixture of multimodal generative objectives within an autoregressive Transformer framework. The pretrained LLM serves as a foundation that can be adapted for a range of video generation tasks. We present empirical results demonstrating the model's state-of-the-art capabilities in zero-shot video generation, specifically highlighting VideoPoet's ability to generate high-fidelity motions.
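The abstract's core idea is that video generation becomes ordinary next-token prediction once every modality is mapped into one discrete token vocabulary (the paper uses MAGVIT-v2 for visual tokens and SoundStream for audio). The sketch below illustrates that decoding loop only; the vocabulary sizes and the `toy_next_token` stand-in are made up for illustration and are not the paper's actual model.

```python
import random

# Hypothetical token-vocabulary layout: all modalities share one discrete
# token space, with disjoint ID ranges per modality (sizes are invented).
TEXT_TOKENS = range(0, 1000)       # e.g. from a text tokenizer
VIDEO_TOKENS = range(1000, 9000)   # e.g. from a MAGVIT-v2-style tokenizer
AUDIO_TOKENS = range(9000, 10000)  # e.g. from a SoundStream-style codec

def toy_next_token(prefix, rng):
    """Stand-in for the decoder-only transformer: given the token prefix,
    return the next visual token. A real model would compute logits over
    the full vocabulary and sample from them."""
    return rng.choice(VIDEO_TOKENS)

def generate(prompt_tokens, n_new, seed=0):
    """Autoregressive decoding: condition on the prompt (e.g. text tokens
    for text-to-video) and append one token at a time."""
    rng = random.Random(seed)
    seq = list(prompt_tokens)
    for _ in range(n_new):
        seq.append(toy_next_token(seq, rng))
    return seq

# Text prompt of 3 tokens, then 5 generated visual tokens.
out = generate([1, 2, 3], n_new=5)
print(out)
```

The point of the shared vocabulary is that the same transformer weights serve every task (text-to-video, image-to-video, video-to-audio) simply by changing which modality's tokens appear in the prefix.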
Presenter: Junling Hu
Paper link
Kondratyuk, Dan, Lijun Yu, Xiuye Gu, José Lezama, Jonathan Huang, Rachel Hornung, Hartwig Adam et al. "VideoPoet: A Large Language Model for Zero-Shot Video Generation." arXiv:2312.14125 (Dec 21, 2023). https://arxiv.org/abs/2312.14125
Blog, Dec 19, 2023: [https://blog.research.google/2023/12/videopoet-large-language-model-for-zero.html](https://blog.research.google/2023/12/videopoet-large-language-model-for-zero.html)
Website: https://sites.research.google/videopoet/
This is part of the bi-weekly paper reading series.
Join this event here: https://us02web.zoom.us/meeting/register/tZwof-yhrz4qEtDZAUUe38BYY1pUhNmYcZVU
Agenda:
7:00-7:05 pm Meet and Greet
7:05-7:50 pm Presentation
7:50-8:00 pm Q&A and Discussions
