Exploring World Models: Video and 3D Generation and Understanding


Details
World Models, which understand the digital and physical worlds through the paradigm of predicting the future, have long been considered one of the key paths to achieving AGI. At this event, you will have the opportunity to engage with the ByteDance Visual Fundamental Research Team, as well as scientists from NTU and NUS. Explore the latest technological advancements together!
AGENDA
- 3:30-4:00 PM
Registration & Networking
- 4:00-4:10 PM
Welcome & Introduction
Dr. Jiashi Feng, Head of Vision Research, ByteDance
- 4:10-4:30 PM
Multi-Modal Generative AI with Foundation Models
Prof. Liu Ziwei, Assistant Professor, College of Computing and Data Science, Nanyang Technological University
- 4:30-5:00 PM
Depth Anything: Foundation Models for Monocular Depth Estimation
Dr. Bingyi Kang, Research Scientist, ByteDance
- 5:00-5:30 PM
Magic-Boost: Boosting 3D Generation with Multi-View Conditioned Diffusion
Dr. Jianfeng Zhang, Research Scientist, ByteDance
- 5:30-6:40 PM
Dinner & Networking
- 6:40-7:10 PM
Multimodal Video Understanding and Generation
Dr. Mike Zheng Shou, Assistant Professor, NRF Fellow, National University of Singapore
- 7:10-7:40 PM
Continuous High-Dynamic Long Video Generation Solution
Dr. Daquan Zhou, Research Scientist, ByteDance
- 7:40-8:10 PM
InstaDrag: Lightning Fast and Accurate Drag-based Image Editing Emerging from Videos
Dr. Jun Hao Liew, Research Scientist, ByteDance
- 8:10-8:30 PM
Photo Taking & Wrapping Up