Deep Learning: Stable Diffusion 3 and ICLR review

Name: Deep Learning: Stable Diffusion 3 and ICLR review
Start: 2024-05-15T18:30:00+02:00
End: 2024-05-15T21:30:00+02:00
Location: WU

Hosted By

Tom L.

Details

Hello Deep Learners,

We kindly invite you to our next Deep Learning meetup on May 15th at WU Wien. Our main talk, on Stable Diffusion 3, will be given by Rahim Entezari from Stability AI. We will also review the best papers of the ICLR conference.

***

Agenda

18:30

Introduction & Welcome

18:45

Talk 1: Stable Diffusion 3: Scaling Rectified Flow Transformers for High-Resolution Image Synthesis
Rahim Entezari, Stability.ai

19:40

Announcements: Events & Job Openings
Networking Break & Discussions

20:15

Talk 2: Best of ICLR conference
Networking & Discussions

21:30

Wrap-up & End

***

Talk Details:

Talk 1: Stable Diffusion 3: Scaling Rectified Flow Transformers for High-Resolution Image Synthesis

by Rahim Entezari, Stability.ai

Diffusion models create data from noise by inverting the forward paths of data towards noise and have emerged as a powerful generative modeling technique for high-dimensional, perceptual data such as images and videos. Rectified flow is a recent generative model formulation that connects data and noise in a straight line. Despite its better theoretical properties and conceptual simplicity, it is not yet decisively established as standard practice. In this work, we improve existing noise sampling techniques for training rectified flow models by biasing them towards perceptually relevant scales. Through a large-scale study, we demonstrate the superior performance of this approach compared to established diffusion formulations for high-resolution text-to-image synthesis. Additionally, we present a novel transformer-based architecture for text-to-image generation that uses separate weights for the two modalities and enables a bidirectional flow of information between image and text tokens, improving text comprehension, typography, and human preference ratings. We demonstrate that this architecture follows predictable scaling trends and correlates lower validation loss to improved text-to-image synthesis as measured by various metrics and human evaluations. Our largest models outperform state-of-the-art models.
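
To make the straight-line idea in the abstract concrete, here is a minimal sketch of how a rectified flow training example could be built. It is an illustration only, not the speaker's implementation: the function names are hypothetical, the interpolation convention (t = 0 is data, t = 1 is noise) is one common choice, and the logit-normal timestep sampling is an assumed way of putting more weight on intermediate noise levels, which the abstract describes only as biasing towards perceptually relevant scales.

```python
import math
import random


def sample_timestep(rng, mean=0.0, std=1.0):
    """Draw t in (0, 1) from a logit-normal distribution.

    Assumption: this is one possible non-uniform timestep sampler;
    mean and std are illustrative defaults, not values from the talk.
    """
    u = rng.gauss(mean, std)
    return 1.0 / (1.0 + math.exp(-u))


def interpolate(x0, noise, t):
    """Point on the straight line between data and noise:
    x_t = (1 - t) * x0 + t * noise.
    """
    return [(1.0 - t) * a + t * b for a, b in zip(x0, noise)]


def velocity_target(x0, noise):
    """Velocity along the straight line, d x_t / dt = noise - x0.

    It does not depend on t, which is what makes the path straight.
    """
    return [b - a for a, b in zip(x0, noise)]


def training_pair(x0, rng):
    """Build one (t, x_t, target) triple for a toy data vector x0."""
    noise = [rng.gauss(0.0, 1.0) for _ in x0]
    t = sample_timestep(rng)
    return t, interpolate(x0, noise, t), velocity_target(x0, noise)


if __name__ == "__main__":
    rng = random.Random(0)
    t, x_t, target = training_pair([0.5, -1.0, 2.0], rng)
    print("t:", t)
    print("x_t:", x_t)
    print("velocity target:", target)
```

In a real training loop a network would take x_t and t (plus the text conditioning) and be trained to predict the velocity target; the toy lists here stand in for image latents.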

Author bio: Rahim Entezari is currently a research scientist at Stability.ai, working on improving the capabilities of text-to-image and text-to-video models. Before that, Rahim did his PhD at TU Graz under the supervision of Prof. Olga Saukh. He is mostly interested in improving generative models through the lens of the loss landscape.

Talk 2: Best of ICLR conference

by Akshey Kumar, Charlie Fieseler, Michael Pieler and Rahim Entezari

The International Conference on Learning Representations (ICLR) is one of the top conferences in the field of AI, and it is taking place in Vienna from May 7 to 11. Four attendees will present a recap of the conference and a "best of" selection of the latest papers presented there.

We are looking forward to seeing you at our meetup!

Please note that no food or drinks will be offered at this meetup.

Artificial Intelligence Applications Deep Learning

Artificial Intelligence Machine Learning Neural Networks