Details

Welcome to day two of the Best of NeurIPS series, your virtual pass to some of the groundbreaking research, insights, and innovations that defined this year’s conference, streamed live from the authors to you.

Time and Location

Jan 15, 2026
9:00-11:00 AM Pacific
Online. Register for the Zoom!

Why Diffusion Models Don't Memorize: The Role of Implicit Dynamical Regularization in Training

Diffusion models have achieved impressive results across many generative tasks, yet the mechanisms that prevent memorization and enable generalization remain unclear. In this talk, I will focus on how training dynamics shape the transition from generalization to memorization. Our experiments and theory reveal two key timescales: an early time when high-quality generation emerges and a later one when memorization begins. Notably, the memorization timescale grows linearly with the size of the training set, while the generalization timescale stays constant, creating an increasingly wide window where models generalize well. These results highlight an implicit dynamical regularization that helps diffusion models avoid memorization even in highly overparameterized regimes.
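The widening window described above can be illustrated with a toy calculation. This is only a sketch of the scaling claim: the constants and function names below are hypothetical placeholders, not values from the paper; the one assumption taken from the abstract is that the memorization timescale grows linearly with training-set size while the generalization timescale stays constant.

```python
# Toy illustration of the two training timescales from the abstract.
# TAU_GEN and C_MEM are made-up constants for illustration only.

TAU_GEN = 1_000.0   # hypothetical training time where good generation emerges
C_MEM = 50.0        # hypothetical per-sample slope of the memorization onset

def generalization_window(n: int) -> float:
    """Width of the training window where the model already generalizes
    but has not yet begun to memorize, for a training set of size n."""
    tau_mem = C_MEM * n        # memorization onset grows linearly with n
    return tau_mem - TAU_GEN   # gap between the two timescales

# The window widens as the training set grows, so larger datasets leave
# ever more room to stop training before memorization sets in.
for n in (100, 1_000, 10_000):
    print(n, generalization_window(n))
```

Under this reading, early stopping anywhere inside the window yields a model that generalizes without memorizing, and the linear scaling makes that window easier to hit at scale.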

About the Author

Raphaël Urfin is a PhD student at École Normale Supérieure – PSL in Paris, supervised by Giulio Biroli (ENS) and Marc Mézard (Bocconi University). His work focuses on applying ideas and tools of statistical physics to better understand diffusion models and their generalization properties.

Open-Insect: Benchmarking Open-Set Recognition of Novel Species in Biodiversity Monitoring

Global biodiversity is declining at an unprecedented rate, yet little is known about most species and how their populations are changing. Indeed, an estimated 90% of Earth’s species are completely unknown. Machine learning has recently emerged as a promising tool for long-term, large-scale biodiversity monitoring, including algorithms for fine-grained classification of species from images. However, such algorithms are typically not designed to detect examples from categories unseen during training – the problem of open-set recognition (OSR) – limiting their applicability to highly diverse, poorly studied taxa such as insects. To address this gap, we introduce Open-Insect, a large-scale, fine-grained dataset for evaluating unknown-species detection across geographic regions of varying difficulty. We benchmark 38 OSR algorithms across three categories – post-hoc, training-time regularization, and training with auxiliary data – finding that simple post-hoc approaches remain a strong baseline. We also demonstrate how to leverage auxiliary data to improve species discovery in regions with limited data. Our results provide timely insights to guide the development of computer vision methods for biodiversity monitoring and species discovery.
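To make the "post-hoc" category concrete, here is a minimal sketch of one classic post-hoc OSR baseline, maximum softmax probability (MSP): score each image by the classifier's top softmax probability and flag low-confidence predictions as candidate novel species. The function names, logits, and threshold are illustrative assumptions, not taken from Open-Insect.

```python
import numpy as np

# Minimal sketch of a post-hoc OSR baseline (maximum softmax probability).
# All names and the 0.5 threshold are illustrative, not from the benchmark.

def softmax(logits: np.ndarray) -> np.ndarray:
    z = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def msp_is_known(logits: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Treat a sample as a known species if the classifier's top softmax
    score clears the threshold; otherwise flag it as a candidate novel
    species. Requires no retraining: it runs on a frozen classifier."""
    return softmax(logits).max(axis=-1) >= threshold

logits = np.array([[8.0, 0.1, 0.2],    # confident prediction -> known
                   [1.0, 0.9, 1.1]])   # diffuse prediction  -> possibly novel
print(msp_is_known(logits))  # -> [ True False]
```

The appeal of post-hoc methods is exactly this: they bolt onto any trained classifier, which is one reason they remain such a strong baseline in the benchmark.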

About the Speaker

Yuyan Chen is a PhD student in Computer Science at McGill University and Mila – Quebec AI Institute, supervised by Prof. David Rolnick. Her research focuses on machine learning for biodiversity monitoring.

GuideFlow3D: Optimization-Guided Rectified Flow For Appearance Transfer

Transferring appearance to 3D assets using different representations of the appearance object, such as images or text, has garnered interest due to its wide range of applications in industries like gaming, augmented reality, and digital content creation. However, state-of-the-art methods still fail when the geometry of the input and appearance objects differs significantly. A straightforward approach is to directly apply a 3D generative model, but we show that this ultimately fails to produce appealing results.

Instead, we propose a principled approach inspired by universal guidance. Given a pretrained rectified flow model conditioned on image or text, our training-free method interacts with the sampling process by periodically adding guidance. This guidance can be modeled as a differentiable loss function, and we experiment with two types of guidance: part-aware losses for appearance and self-similarity. Our experiments show that our approach successfully transfers texture and geometric details to the input 3D asset, outperforming baselines both qualitatively and quantitatively.
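The sampling loop described above can be sketched schematically: run ordinary Euler integration of the rectified flow, and every few steps nudge the sample along the gradient of a differentiable guidance loss. Everything below is a toy stand-in; the velocity field, the quadratic guidance loss, and all constants are assumptions for illustration, not the authors' model or losses (which are part-aware appearance and self-similarity objectives evaluated on 3D assets).

```python
import numpy as np

# Schematic sketch of guidance-augmented rectified-flow sampling, in the
# spirit of universal guidance. Toy placeholders throughout.

TARGET = np.array([1.0, -2.0])  # stand-in for an appearance target

def velocity(x: np.ndarray, t: float) -> np.ndarray:
    """Placeholder for a pretrained rectified-flow velocity network."""
    return -x  # toy field driving samples toward the origin

def guidance_grad(x: np.ndarray) -> np.ndarray:
    """Gradient of a toy differentiable guidance loss L(x) = ||x - TARGET||^2 / 2.
    In practice this gradient would come from autodiff through a loss such
    as a part-aware appearance or self-similarity objective."""
    return x - TARGET

def sample(x0, steps=100, guide_every=10, guide_scale=0.5):
    x, dt = x0.astype(float), 1.0 / steps
    for k in range(steps):
        x = x + dt * velocity(x, k * dt)   # plain Euler flow step
        if k % guide_every == 0:           # periodically inject guidance
            x = x - guide_scale * dt * guidance_grad(x)
    return x

out = sample(np.array([5.0, 5.0]))
```

The key property the sketch preserves is that the pretrained model is never fine-tuned: guidance only perturbs the sampling trajectory, which is what makes the method training-free.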

We also show that traditional metrics are not suitable for evaluating this task: in the absence of ground-truth data, they cannot focus on local details or compare dissimilar inputs. We therefore evaluate appearance transfer quality with a GPT-based system that objectively ranks outputs, ensuring robust, human-like assessment, as further confirmed by our user study. Beyond the showcased scenarios, our method is general and could be extended to different types of diffusion models and guidance functions.

About the Speaker

Sayan Deb Sarkar is a second-year PhD student at Stanford University in the Gradient Spaces Group, advised by Prof. Iro Armeni and part of the Stanford Vision Lab (SVL). His research interests are multimodal 3D scene understanding and interactive editing. This past summer, he interned with the Microsoft Spatial AI Lab, hosted by Prof. Marc Pollefeys, working on efficient video understanding in spatial context. Before starting his PhD, he was a CS master’s student at ETH Zürich in the Computer Vision and Geometry Group (CVG), working on aligning real-world 3D environments from multi-modal data. Previously, he was a Research Intern at Qualcomm XR Labs, a Computer Vision Engineer at Mercedes-Benz R&D, and a Research Engineer at ICG, TU Graz. Website: https://sayands.github.io/

HouseLayout3D: A Benchmark and Baseline Method for 3D Layout Estimation in the Wild

Current 3D layout estimation models are primarily trained on synthetic datasets containing simple single-room or single-floor environments. As a consequence, they cannot natively handle large multi-floor buildings and require scenes to be split into individual floors before processing, which removes the global spatial context essential for reasoning about structures such as staircases that connect multiple levels. In this work, we introduce HouseLayout3D, a real-world benchmark designed to support progress toward full building-scale layout estimation, including multiple floors and architecturally intricate spaces. We also present MultiFloor3D, a simple training-free baseline that leverages recent scene understanding methods and already outperforms existing 3D layout estimation models on both our benchmark and prior datasets, highlighting the need for further research in this direction.

About the Speaker

Valentin Bieri is a Machine Learning Engineer and Researcher specializing in the intersection of 3D Computer Vision and Natural Language Processing. Building on his applied research in SLAM and Vision-Language Models at ETH Zurich, he now develops AI agents for manufacturing at EthonAI.

Artificial Intelligence
Computer Vision
Machine Learning
Open Source

Sponsors

Voxel51
Administration, promotion, giveaways and charitable contributions.
