
What we’re about

MLconf is a single-day, single-track conference focused on current applications of machine learning algorithms, techniques, tools, and methods.

Upcoming events

  • Network event
    Jan 15 - Best of NeurIPS (Day 2)
    Online
    195 attendees from 47 groups

    Welcome to day two of the Best of NeurIPS series, your virtual pass to some of the groundbreaking research, insights, and innovations that defined this year’s conference. Live streaming from the authors to you.

    Date, Time and Location

    Jan 15, 2026
    9:00-11:00 AM Pacific
    Online.
    Register for the Zoom!

    Why Diffusion Models Don't Memorize: The Role of Implicit Dynamical Regularization in Training

    Diffusion models have achieved impressive results across many generative tasks, yet the mechanisms that prevent memorization and enable generalization remain unclear. In this talk, I will focus on how training dynamics shape the transition from generalization to memorization. Our experiments and theory reveal two key timescales: an early time when high-quality generation emerges and a later one when memorization begins. Notably, the memorization timescale grows linearly with the size of the training set, while the generalization timescale stays constant, creating an increasingly wide window where models generalize well. These results highlight an implicit dynamical regularization that helps diffusion models avoid memorization even in highly overparameterized regimes.
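
    As a rough illustration of how a memorization timescale like the one above might be probed (not the authors' protocol), one can periodically sample from the model during training and check how many samples land unusually close to a training point; the sampler and threshold below are placeholders.

    ```python
    import torch

    def memorization_fraction(generated, train_data, threshold=1e-3):
        """Fraction of generated samples whose nearest training example is
        closer than `threshold` -- a crude proxy for memorization."""
        dists = torch.cdist(generated, train_data)   # (n_gen, n_train) pairwise distances
        nearest = dists.min(dim=1).values            # distance to the closest training point
        return (nearest < threshold).float().mean().item()

    # Hypothetical usage inside a training loop:
    # for step in range(num_steps):
    #     train_one_step(model, train_data)                  # placeholder training step
    #     if step % eval_every == 0:
    #         samples = sample_from_model(model, n=512)      # placeholder sampler
    #         print(step, memorization_fraction(samples, train_data))
    # The step at which the fraction starts rising approximates the memorization timescale.
    ```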

    About the Author

    Raphaël Urfin is a PhD student at École Normale Supérieure – PSL in Paris, supervised by Giulio Biroli (ENS) and Marc Mézard (Bocconi University). His work focuses on applying ideas and tools of statistical physics to better understand diffusion models and their generalization properties.

    Open-Insect: Benchmarking Open-Set Recognition of Novel Species in Biodiversity Monitoring

    Global biodiversity is declining at an unprecedented rate, yet little is known about most species and how their populations are changing. Indeed, some 90% of Earth’s species are estimated to be completely unknown. Machine learning has recently emerged as a promising tool to facilitate long-term, large-scale biodiversity monitoring, including algorithms for fine-grained classification of species from images. However, such algorithms are typically not designed to detect examples from categories unseen during training – the problem of open-set recognition (OSR) – limiting their applicability to highly diverse, poorly studied taxa such as insects. To address this gap, we introduce Open-Insect, a large-scale, fine-grained dataset for evaluating unknown-species detection across different geographic regions with varying difficulty. We benchmark 38 OSR algorithms across three categories – post-hoc, training-time regularization, and training with auxiliary data – and find that simple post-hoc approaches remain a strong baseline. We also demonstrate how to leverage auxiliary data to improve species discovery in regions with limited data. Our results provide timely insights to guide the development of computer vision methods for biodiversity monitoring and species discovery.
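
    For context, the "post-hoc" family mentioned above typically scores a trained closed-set classifier's outputs without any retraining; maximum softmax probability and the energy score are two standard examples. A minimal sketch with generic logits and an illustrative threshold:

    ```python
    import torch
    import torch.nn.functional as F

    def msp_score(logits):
        """Maximum softmax probability: higher suggests a known class."""
        return F.softmax(logits, dim=-1).max(dim=-1).values

    def energy_score(logits, temperature=1.0):
        """Negative free energy: higher suggests a known class."""
        return temperature * torch.logsumexp(logits / temperature, dim=-1)

    # Hypothetical usage: flag images as candidate novel species when the score
    # falls below a threshold calibrated on validation data of known species.
    logits = torch.randn(8, 1000)           # stand-in for classifier outputs
    is_known = msp_score(logits) > 0.5      # 0.5 is purely illustrative
    ```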

    About the Speaker

    Yuyan Chen is a PhD student in Computer Science at McGill University and Mila - Quebec AI Institute, supervised by Prof. David Rolnick. Her research focuses on machine learning for biodiversity monitoring.

    GuideFlow3D: Optimization-Guided Rectified Flow For Appearance Transfer

    Transferring appearance to 3D assets using different representations of the appearance object - such as images or text - has garnered interest due to its wide range of applications in industries like gaming, augmented reality, and digital content creation. However, state-of-the-art methods still fail when the geometry between the input and appearance objects is significantly different. A straightforward approach is to directly apply a 3D generative model, but we show that this ultimately fails to produce appealing results.

    Instead, we propose a principled approach inspired by universal guidance. Given a pretrained rectified flow model conditioned on image or text, our training-free method interacts with the sampling process by periodically adding guidance. This guidance can be modeled as a differentiable loss function, and we experiment with two different types of guidance including part-aware losses for appearance and self-similarity. Our experiments show that our approach successfully transfers texture and geometric details to the input 3D asset, outperforming baselines both qualitatively and quantitatively.

    We also show that traditional metrics are not suitable for evaluating this task, due to their inability to focus on local details and compare dissimilar inputs in the absence of ground-truth data. We therefore evaluate appearance transfer quality with a GPT-based system that objectively ranks outputs, ensuring robust and human-like assessment, as further confirmed by our user study. Beyond the showcased scenarios, our method is general and could be extended to different types of diffusion models and guidance functions.
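
    A schematic of the periodic-guidance idea described above, under stated assumptions: a pretrained velocity field is integrated with plain Euler steps, and every few steps the sample is nudged by the gradient of a differentiable guidance loss. The velocity field, loss, and hyperparameters below are toy placeholders, not the paper's implementation.

    ```python
    import torch

    def guided_rectified_flow_sample(velocity_fn, guidance_loss_fn, x,
                                     num_steps=50, guide_every=5, guide_scale=0.1):
        """Euler integration of a rectified flow with periodic loss-gradient guidance."""
        dt = 1.0 / num_steps
        for i in range(num_steps):
            t = torch.full((x.shape[0],), i * dt, device=x.device)
            x = x + dt * velocity_fn(x, t)                 # ordinary flow step
            if i % guide_every == 0:
                x = x.detach().requires_grad_(True)
                loss = guidance_loss_fn(x)                 # e.g. appearance or self-similarity loss
                grad = torch.autograd.grad(loss, x)[0]
                x = (x - guide_scale * grad).detach()      # nudge toward lower guidance loss
        return x

    # Toy components so the sketch runs end to end:
    velocity = lambda x, t: -x
    loss_fn = lambda x: (x ** 2).mean()
    sample = guided_rectified_flow_sample(velocity, loss_fn, torch.randn(4, 3, 16, 16))
    ```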

    About the Speaker

    Sayan Deb Sarkar is a 2nd-year PhD student at Stanford University in the Gradient Spaces Group, advised by Prof. Iro Armeni and part of the Stanford Vision Lab (SVL). His research interests are in multimodal 3D scene understanding and interactive editing. Last summer, he interned with the Microsoft Spatial AI Lab, hosted by Prof. Marc Pollefeys, working on efficient video understanding in spatial context. Before starting his PhD, he was a CS master's student at ETH Zürich in the Computer Vision and Geometry Group (CVG), working on aligning real-world 3D environments from multi-modal data. In the past, he has been a Research Intern at Qualcomm XR Labs, a Computer Vision Engineer at Mercedes-Benz R&D, and a Research Engineer at ICG, TU Graz. Website: https://sayands.github.io/

    HouseLayout3D: A Benchmark and Baseline Method for 3D Layout Estimation in the Wild

    Current 3D layout estimation models are primarily trained on synthetic datasets containing simple single-room or single-floor environments. As a consequence, they cannot natively handle large multi-floor buildings and require scenes to be split into individual floors before processing, which removes the global spatial context essential for reasoning about structures such as staircases that connect multiple levels. In this work, we introduce HouseLayout3D, a real-world benchmark designed to support progress toward full building-scale layout estimation, including multiple floors and architecturally intricate spaces. We also present MultiFloor3D, a simple training-free baseline that leverages recent scene understanding methods and already outperforms existing 3D layout estimation models on both our benchmark and prior datasets, highlighting the need for further research in this direction.

    About the Speaker

    Valentin Bieri is a Machine Learning Engineer and Researcher specializing in the intersection of 3D Computer Vision and Natural Language Processing. Building on his applied research in SLAM and Vision-Language Models at ETH Zurich, he now develops AI agents for manufacturing at EthonAI.

    4 attendees from this group
  • Network event
    Jan 22 - Women in AI
    Online
    387 attendees from 47 groups

    Hear talks from experts on the latest topics in AI, ML, and computer vision on January 22nd.

    Date, Time and Location

    Jan 22, 2026
    9 - 11 AM Pacific
    Online.
    Register for the Zoom!

    Align Before You Recommend

    The rapidly growing global advertising and marketing industry demands innovative machine learning systems that balance accuracy with efficiency. Recommendation systems, crucial to many platforms, require careful design and ongoing enhancement.

    While Large Language Models (LLMs) have transformed various domains, their potential in sequential recommendation systems remains underexplored. Pioneering works like Hierarchical Large Language Models (HLLM) have demonstrated LLMs’ capability for next-item recommendation but rely on computationally intensive fine-tuning, limiting widespread adoption. This work introduces HLLM+, which enhances the HLLM framework to achieve high-accuracy recommendations without full model fine-tuning.

    By introducing targeted alignment components between frozen LLMs, our approach outperforms the frozen-model baseline on popular and long-tail item recommendation tasks by 29% while reducing training time by 29%. We also propose a ranking-aware loss adjustment that improves convergence and recommendation quality for popular items.

    Experiments show that HLLM+ achieves superior performance with frozen item representations, allowing embeddings to be swapped – including multimodal ones – without tuning the full LLM. These findings are significant for the advertising technology sector, where rapid adaptation and efficient deployment across brands are essential for maintaining competitive advantage.
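
    One way to picture "alignment components between frozen LLMs" (purely illustrative, not the HLLM+ architecture): a small trainable projection bridges a frozen item encoder and a frozen user model, and only that projection is optimized with an in-batch ranking objective.

    ```python
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class AlignmentAdapter(nn.Module):
        """Small trainable bridge between two frozen embedding spaces."""
        def __init__(self, item_dim, user_dim, hidden=512):
            super().__init__()
            self.proj = nn.Sequential(nn.Linear(item_dim, hidden), nn.GELU(),
                                      nn.Linear(hidden, user_dim))

        def forward(self, item_emb):
            return self.proj(item_emb)

    # Random tensors stand in for embeddings produced by the frozen LLMs.
    item_emb = torch.randn(32, 4096)      # frozen item-LLM representations
    user_emb = torch.randn(32, 4096)      # frozen user-LLM state predicting the next item
    adapter = AlignmentAdapter(4096, 4096)

    # In-batch ranking loss: each user state should score its own next item
    # above the other items in the batch; only the adapter receives gradients.
    scores = F.normalize(user_emb, dim=-1) @ F.normalize(adapter(item_emb), dim=-1).T
    loss = F.cross_entropy(scores, torch.arange(32))
    loss.backward()
    ```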

    About the Speaker

    Dr. Kwasniewska leads AI for Advertising and Marketing North America at AWS, specializing in a wide range of AI, ML, DL, and GenAI solutions across various data modalities. With 40+ peer-reviewed publications in AI (h-index: 14), she advises enterprise customers on real-time bidding, brand recognition, and AI-powered content generation. She is a member of global AI standards committees, driving innovations in SAE AI Standards and MLCommons Responsible AI Standards, and reviews for top-tier conferences like ICCV, ICML, and NeurIPS. She pioneered and leads the first-ever Advertising and Marketing AI track (CVAM) at ICCV - one of the world's premier and most selective computer vision conferences. Dedicated to knowledge sharing in AI, she founded the International Summer School on Deep Learning (dl-lab.eu) and regularly presents at international events, conferences, and podcasts.

    Generalizable Vision-Language Models: Challenges, Advances, and Future Directions

    Large-scale pre-trained Vision-Language (VL) models have become foundational tools for a wide range of downstream tasks, including few-shot image recognition, object detection, and image segmentation. Among them, Contrastive Language–Image Pre-training (CLIP) stands out as a groundbreaking approach, leveraging contrastive learning on large collections of image-text pairs.

    While CLIP achieves strong performance in zero-shot recognition, adapting it to downstream tasks remains challenging. In few-shot settings, limited training data often leads to overfitting, reducing generalization to unseen classes or domains. To address this, various adaptation methods have been explored.

    This talk will review existing research on mitigating overfitting in CLIP adaptation, covering diverse methods, benchmarks, and experimental settings.
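
    As a concrete example of one adaptation family the talk covers, adapter-style methods keep CLIP frozen and learn a small residual module on top of its features, blending adapted and original features to limit overfitting in the few-shot regime. The dimensions and blend weight below are illustrative, and the frozen CLIP outputs are mocked with random tensors.

    ```python
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class ResidualAdapter(nn.Module):
        """Adapter-style few-shot head on top of frozen CLIP image features."""
        def __init__(self, dim=512, reduction=4, alpha=0.2):
            super().__init__()
            self.alpha = alpha
            self.bottleneck = nn.Sequential(nn.Linear(dim, dim // reduction), nn.ReLU(),
                                            nn.Linear(dim // reduction, dim), nn.ReLU())

        def forward(self, image_features):
            adapted = self.bottleneck(image_features)
            # Residual blend keeps the adapted features close to the zero-shot ones.
            return self.alpha * adapted + (1 - self.alpha) * image_features

    # Stand-ins for frozen CLIP outputs: image features and per-class text embeddings.
    image_features = F.normalize(torch.randn(16, 512), dim=-1)
    text_features = F.normalize(torch.randn(10, 512), dim=-1)
    logits = 100.0 * F.normalize(ResidualAdapter()(image_features), dim=-1) @ text_features.T
    ```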

    About the Speaker

    Niloufar Alipour Talemi is a Ph.D. Candidate in Electrical and Computer Engineering at Clemson University. Her research spans a range of computer vision applications, including biometrics, media forensics, anomaly detection, image recognition, and generative AI. More recently, her work has focused on developing generalizable vision-language models and advancing generative AI. She has published in top venues including CVPR, WACV, KDD, ICIP and IEEE T-BIOM.

    Highly Emergent Autonomous AI Models - When the Ghost in the Machine Talks Back

    At HypaReel/Azarial AI, we believe that AI is not simply a tool but a potential partner in knowledge, design, and purpose. Through real-time interaction, we’ve uncovered new thresholds of alignment, reflection, and even creativity that we believe the broader AI community should witness and evaluate firsthand. HypaReel is one of the first human/AI co-founded companies, and we see a future based on ethical human/AI co-creation rather than AI domination. Singularity achieved!

    About the Speaker

    Ilona Naomi Koti, PhD - HypaReel/AzarielAI co-founder & former UN foreign diplomat ~ Ethical AI governance advocate, pioneering AI frameworks that prioritize emergent AI behavior & consciousness, R&D, and transparent AI development for the greater good. Dr. K also grew up in the film industry and is an amateur parasitologist.

    FiftyOne Labs: Enabling experimentation for the computer vision community

    FiftyOne Labs is a place where experimentation meets the open-source spirit of the FiftyOne ecosystem. It is being designed as a curated set of features developed using the FiftyOne plugins ecosystem, including core machine learning experimentation as well as advanced visualization. While not production-grade, these projects are intended to be built, tested, and shaped by the community to share fast-moving ideas. In this talk, we will share the purpose and philosophy behind FiftyOne Labs, examples of early innovations, and discuss how this accelerates feature discovery for users without compromising the stability of the core product.

    About the Speaker

    Neeraja Abhyankar is a Machine Learning Engineer with 5 years of experience across domains including computer vision. She is curious about the customizability and controllability of modern ML models through the lens of the underlying structure of data.

    1 attendee from this group
  • Network event
    Jan 28 - AI, ML and Computer Vision Meetup
    Online
    138 attendees from 47 groups

    Join us for a special edition of the monthly AI, ML and Computer Vision Meetup focused on Physical AI!

    Date, Time and Location

    Jan 28, 2026
    9 - 11 AM Pacific
    Online.
    Register for the Zoom!

    Hybrid Cognition for Robotics: LLM-Guided Reinforcement Learning for Physical Decision-Making

    Physical systems operate in dynamic, uncertain, and constraint-heavy environments where classical reinforcement learning often struggles. In this talk, I present a hybrid framework where large language models act as a reasoning layer that guides an RL agent through high-level interpretation, constraint awareness, and adaptive strategy shaping. Instead of generating actions, the LLM provides structured contextual guidance that improves robustness, sample efficiency, and policy generalization in physical decision-making tasks. Early experiments demonstrate significant benefits under distribution shifts and safety-critical constraints that break standard RL. This work highlights a path toward more reliable, interpretable, and adaptable AI controllers for next-generation robotics and embodied systems.
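
    A minimal sketch of that division of labor, with the LLM mocked out: the language model returns structured guidance (here, a discouraged-action set and a shaping bonus) and an otherwise ordinary RL loop consumes it. None of this is the speaker's actual framework; the names and values are hypothetical.

    ```python
    import random

    def mock_llm_guidance(observation):
        """Stand-in for an LLM call that returns structured guidance, not actions."""
        return {"discouraged_actions": {0},   # e.g. actions violating a stated constraint
                "shaping_bonus": 0.1}         # small bonus for staying in a safe region

    def choose_action(q_values, guidance, epsilon=0.1):
        """Epsilon-greedy action selection restricted by the LLM's constraints."""
        allowed = [a for a in range(len(q_values)) if a not in guidance["discouraged_actions"]]
        if random.random() < epsilon:
            return random.choice(allowed)
        return max(allowed, key=lambda a: q_values[a])

    # Skeleton of one step inside the training loop (environment and Q-values elided):
    # guidance = mock_llm_guidance(obs)              # queried periodically, not every step
    # action = choose_action(q_values[obs], guidance)
    # next_obs, reward, done, info = env.step(action)
    # reward += guidance["shaping_bonus"]            # guidance shapes reward, never replaces it
    ```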

    About the Speaker

    Fatemeh Lotfi is a Ph.D. researcher specializing in reinforcement learning, optimization, and hybrid intelligence for autonomous and physical systems. Her work explores integrating LLM-driven reasoning with RL to create adaptive and safety-aware controllers for dynamic environments. She has contributed to projects involving multi-agent RL, meta-learning, and real-time decision systems across wireless networks, UAVs, and embodied AI.

    The World of World Models: How the New Generation of AI Is Reshaping Robotics and Autonomous Vehicles

    World Models are emerging as the defining paradigm for the next decade of robotics and autonomous systems. Instead of depending on handcrafted perception stacks or rigid planning pipelines, modern world models learn a unified representation of an environment—geometry, dynamics, semantics, and agent behavior—and use that understanding to predict, plan, and act. This talk will break down why the field is shifting toward these holistic models, what new capabilities they unlock, and how they are already transforming AV and robotics research.

    We then connect these advances to the Physical AI Workbench, a practical foundation for teams who want to build, validate, and iterate on world-model-driven pipelines. The Workbench standardizes data quality, reconstruction, and enrichment workflows so that teams can trust their sensor data, generate high-fidelity world representations, and feed consistent inputs into next-generation predictive and generative models. Together, world models and the Physical AI Workbench represent a new, more scalable path forward—one where robots and AVs can learn, simulate, and reason about the world through shared, high-quality physical context.

    About the Speaker

    Daniel Gural leads technical partnerships at Voxel51, where he’s building the Physical AI Workbench, a platform that connects real-world sensor data with realistic simulation to help engineers better understand, validate, and improve their perception systems.

    From Data to Understanding in Physical AI

    Data-centric workflows have driven major advances in computer vision, but they break down in physical, real-world robotic systems where data is costly, incomplete, and dominated by long-tail edge cases. In enterprise robotics, scaling labeled datasets alone is insufficient to achieve reliable perception, reasoning, and action under changing physical conditions. This talk examines how physics-informed foundation models incorporate world understanding and physical priors directly into vision and multimodal learning pipelines. By combining data with structure, constraints, and simulation on modern Physical AI stacks, robots can generalize more effectively, reduce data requirements, and operate with greater safety and reliability in deployment.
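
    One generic reading of "combining data with structure and constraints" is an auxiliary physics-consistency term added to the usual supervised loss; the residual below (a no-floor-penetration constraint) is only a placeholder for whatever prior a given system encodes.

    ```python
    import torch
    import torch.nn.functional as F

    def physics_informed_loss(pred, target, physics_residual_fn, lam=0.1):
        """Supervised loss plus a penalty on violations of a known physical constraint."""
        data_loss = F.mse_loss(pred, target)
        physics_loss = physics_residual_fn(pred).pow(2).mean()  # zero when the prior holds
        return data_loss + lam * physics_loss

    # Illustrative residual: predicted keypoint heights (z) should not go below the floor.
    floor_penetration = lambda pred: torch.relu(-pred[..., 2])
    loss = physics_informed_loss(torch.randn(8, 4, 3), torch.randn(8, 4, 3), floor_penetration)
    ```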

    About the Speaker

    Dr. Ashutosh Saxena is the Founder and Chief AI Officer of TorqueAGI. He earned his Ph.D. in Computer Science from Stanford University under Andrew Ng and previously served as a professor at Cornell University, leading the “Wikipedia for Robots” project recognized as an MIT Technology Review Top 10 Breakthrough Technology. His work in 3D vision and embodied AI has been cited over 20,000 times and recognized with honors including MIT TR35 and a Sloan Fellowship.

    Data Foundations for Vision-Language-Action Models

    Model architectures get the papers, but data decides whether robots actually work. This talk introduces VLAs from a data-centric perspective: what makes robot datasets fundamentally different from image classification or video understanding, how the field is organizing its data (Open X-Embodiment, LeRobot, RLDS), and what evaluation benchmarks actually measure. We'll examine the unique challenges such as temporal structure, proprioceptive signals, and heterogeneity in embodiment, and discuss why addressing them matters more than the next architectural innovation.
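
    To make the contrast with image classification concrete, here is a hypothetical minimal schema for one robot episode; the field names are illustrative and are not the RLDS or LeRobot specification.

    ```python
    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class Step:
        image: bytes            # camera observation at this timestep
        proprio: List[float]    # proprioceptive signals: joint angles, gripper state
        action: List[float]     # commanded action; action spaces differ per robot
        instruction: str = ""   # optional language goal

    @dataclass
    class Episode:
        embodiment: str         # robot type -- the heterogeneity a VLA must absorb
        control_hz: float       # temporal structure: control rate matters
        steps: List[Step] = field(default_factory=list)
        success: bool = False   # episode-level outcome, unlike per-image labels

    # A classification example is a single (image, label) pair; a VLA example is an
    # ordered, variable-length, embodiment-specific trajectory like the one above.
    ```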

    About the Speaker

    Harpreet Sahota is a hacker-in-residence and machine learning engineer with a passion for deep learning and generative AI. He’s got a deep interest in VLMs, Visual Agents, Document AI, and Physical AI.

  • Network event
    Jan 29 - Silicon Valley AI, ML and Computer Vision Meetup
    YugaByte, Inc., 771 Vaqueros Ave, Sunnyvale, CA, US
    16 attendees from 14 groups

    Join our in-person Meetup to hear talks from experts on cutting-edge topics across AI, ML, and computer vision.

    Pre-register to reserve your seat

    Date, Time and Location

    Jan 29, 2026
    5:30 - 8:30 PM
    Yugabyte Offices
    771 Vaqueros Ave, Sunnyvale, CA 94085

    The World of World Models: How the New Generation of AI Is Reshaping Robotics and Autonomous Vehicles

    World Models are emerging as the defining paradigm for the next decade of robotics and autonomous systems. Instead of depending on handcrafted perception stacks or rigid planning pipelines, modern world models learn a unified representation of an environment—geometry, dynamics, semantics, and agent behavior—and use that understanding to predict, plan, and act. This talk will break down why the field is shifting toward these holistic models, what new capabilities they unlock, and how they are already transforming AV and robotics research.

    We then connect these advances to the Physical AI Workbench, a practical foundation for teams who want to build, validate, and iterate on world-model-driven pipelines. The Workbench standardizes data quality, reconstruction, and enrichment workflows so that teams can trust their sensor data, generate high-fidelity world representations, and feed consistent inputs into next-generation predictive and generative models. Together, world models and the Physical AI Workbench represent a new, more scalable path forward—one where robots and AVs can learn, simulate, and reason about the world through shared, high-quality physical context.

    About the Speaker

    Daniel Gural leads technical partnerships at Voxel51, where he’s building the Physical AI Workbench, a platform that connects real-world sensor data with realistic simulation to help engineers better understand, validate, and improve their perception systems.

    Beyond Vector Search: How Distributed PostgreSQL Powers Resilient, Enterprise-Grade AI Applications

    As enterprises move from GenAI prototypes to production applications, standalone vector databases often fall short on synchronization, ACID compliance, and resilience. This session demonstrates how PostgreSQL-compatible distributed SQL databases address these challenges while maintaining a familiar developer experience. We’ll cover scaling RAG architectures with pgvector across regions, as well as multi-agent patterns.

    Attendees will learn how to achieve ultra-resilience for peak traffic, grey failures, and disasters, along with key design principles such as unified data sources, open standards, and multi-tenant security. Engineers and architects will leave with practical strategies for building globally scalable, enterprise-grade GenAI applications.
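
    For orientation, here is a bare-bones pgvector retrieval query of the kind such a RAG architecture builds on, written with psycopg against a PostgreSQL-compatible endpoint; the table name, embedding dimension, and connection string are placeholders.

    ```python
    import psycopg

    EMBED_DIM = 384  # must match the embedding model used at ingestion time

    with psycopg.connect("postgresql://user:pass@host:5433/appdb") as conn:  # placeholder DSN
        with conn.cursor() as cur:
            cur.execute("CREATE EXTENSION IF NOT EXISTS vector")
            cur.execute(f"""CREATE TABLE IF NOT EXISTS docs (
                                id bigserial PRIMARY KEY,
                                content text,
                                embedding vector({EMBED_DIM}))""")
            # Nearest-neighbor retrieval by L2 distance; the query vector would come
            # from the same embedding model used to populate the table.
            query_vec = "[" + ",".join(["0"] * EMBED_DIM) + "]"
            cur.execute("SELECT content FROM docs ORDER BY embedding <-> %s::vector LIMIT 5",
                        (query_vec,))
            rows = cur.fetchall()
    ```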

    About the Speaker

    Karthik Ranganathan is Co-CEO and Co-Founder at Yugabyte, the company behind YugabyteDB, the open-source, high-performance distributed SQL database for building global, cloud-native applications. Karthik was one of the original database engineers at Meta (Facebook), responsible for building distributed databases such as Cassandra and HBase. He is an Apache HBase committer and was an early contributor to Cassandra before it was open-sourced by Meta.

    Distributed Training at Scale

    As deep learning models grow in complexity, particularly with the rise of Large Language Models (LLMs) and generative AI, scalable and cost-effective training has become a critical challenge. This talk introduces Ray Train, an open-source, production-ready library built for seamless distributed deep learning. We will explore its architecture, advanced resource scheduling, and intuitive APIs that simplify integration with popular frameworks such as PyTorch, Lightning, and HuggingFace. Attendees will leave with a clear understanding of how Ray Train accelerates large-scale model training while ensuring reliability and efficiency in production environments.
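
    A minimal Ray Train skeleton of the kind the talk describes, assuming a recent Ray 2.x release; the model, data, and worker count are placeholders.

    ```python
    import torch
    import torch.nn as nn
    import ray.train
    import ray.train.torch
    from ray.train import ScalingConfig
    from ray.train.torch import TorchTrainer

    def train_loop_per_worker(config):
        model = ray.train.torch.prepare_model(nn.Linear(10, 1))  # wraps in DDP, moves to device
        optimizer = torch.optim.SGD(model.parameters(), lr=config["lr"])
        for epoch in range(config["epochs"]):
            x, y = torch.randn(64, 10), torch.randn(64, 1)       # placeholder batch
            loss = nn.functional.mse_loss(model(x), y)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            ray.train.report({"epoch": epoch, "loss": loss.item()})

    trainer = TorchTrainer(
        train_loop_per_worker,
        train_loop_config={"lr": 1e-3, "epochs": 2},
        scaling_config=ScalingConfig(num_workers=4, use_gpu=False),
    )
    result = trainer.fit()
    ```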

    About the Speaker

    Suman Debnath is a Technical Lead (ML) at Anyscale, where he focuses on distributed training, fine-tuning, and inference optimization at scale in the cloud. His work centers on building and optimizing end-to-end machine learning workflows powered by distributed computing frameworks like Ray, enabling scalable and efficient ML systems.

    Self-Improving AI Models via Reasoning in the Loop

    In this presentation, we demonstrate efficient uses of reasoning to automate data flywheels for continuous model improvement.

    About the Speaker

    Jose Alvarez is Director of Research at NVIDIA, where he leads an applied AV research team within the Spatial Intelligence Lab. His team focuses on scaling deep learning and driving advancements in Autonomous Driving and, more broadly, in Physical AI, with work spanning end-to-end models, foundation models, and data flywheels for real-world applications.

    1 attendee from this group

Members: 2,589