
Sept 12 - Visual AI in Manufacturing and Robotics (Day 3)

Network event
163 attendees from 44 groups
Hosted By
Jimmy G.

Details

Join us for day three in a series of virtual events to hear talks from experts on the latest developments at the intersection of Visual AI, Manufacturing and Robotics.

Date and Time

Sept 12 at 9 AM Pacific

Location

Virtual. Register for the Zoom!

Towards Robotics Foundation Models that Can Reason

In recent years, we have witnessed remarkable progress in generative AI, particularly in language and visual understanding and generation. This leap has been fueled by unprecedentedly large image–text datasets and the scaling of large language and vision models trained on them. Increasingly, these advances are being leveraged to equip and empower robots with open-world visual understanding and reasoning capabilities.

Yet, despite these advances, scaling such models for robotics remains challenging due to the scarcity of large-scale, high-quality robot interaction data, limiting their ability to generalize and truly reason about actions in the real world. Nonetheless, promising results are emerging from using multimodal large language models (MLLMs) as the backbone of robotic systems, especially in enabling the acquisition of low-level skills required for robust deployment in everyday household settings.

In this talk, I will present three recent works that aim to bridge the gap between rich semantic world knowledge in MLLMs and actionable robot control. I will begin with AHA, a vision-language model that reasons about failures in robotic manipulation and improves the robustness of existing systems. Building on this, I will introduce SAM2Act, a 3D generalist robotic model with a memory-centric architecture capable of performing high-precision manipulation tasks while retaining and reasoning over past observations. Finally, I will present MolmoAct, AI2’s flagship robotic foundation model for action reasoning, designed as a generalist system that can be post-trained for a wide range of downstream manipulation tasks.

About the Speaker

Jiafei Duan is a Ph.D. candidate in Computer Science & Engineering at the University of Washington, advised by Professors Dieter Fox and Ranjay Krishna. His research focuses on foundation models for robotics, with an emphasis on developing scalable data collection and generation methods, grounding vision-language models in robotic reasoning, and advancing robust generalization in robot learning. His work has been featured in MIT Technology Review, GeekWire, VentureBeat, and Business Wire.

Beyond Academic Benchmarks: Critical Analysis and Best Practices for Visual Industrial Anomaly Detection

In this talk, I will share our recent research efforts in visual industrial anomaly detection. I will present a comprehensive empirical analysis with a focus on real-world applications, demonstrating that recent SOTA methods perform worse than methods from 2021 when evaluated on a variety of datasets. We will also investigate how different practical aspects, such as input size, distribution shift, data contamination, and the availability of a validation set, affect the results.

About the Speaker

Aimira Baitieva is a Research Engineer at Valeo, where she works primarily on computer vision problems. Her recent work has been focused on deep learning anomaly detection for automating visual inspection, incorporating both research and practical applications in the manufacturing sector.

The Digital Reasoning Thread in Manufacturing: Orchestrating Vision, Simulation, and Robotics

Manufacturing is entering a new phase where AI is no longer confined to isolated tasks like defect detection or predictive maintenance. Advances in reasoning AI, simulation, and robotics are converging to create end-to-end systems that can perceive, decide, and act – in both digital and physical environments.

This talk introduces the Digital Reasoning Thread – a consistent layer of AI reasoning that runs through every stage of manufacturing, connecting visual intelligence, digital twins, simulation environments, and robotic execution. By linking perception with advanced reasoning and action, this approach enables faster, higher-quality decisions across the entire value chain.

We will explore real-world examples of applying reasoning AI in industrial settings, combining simulation-driven analysis, orchestration frameworks, and the foundations needed for robotic execution in the physical world. Along the way, we will examine the key technical building blocks – from data pipelines and interoperability standards to agentic AI architectures – that make this level of integration possible.

Attendees will gain a clear understanding of how to bridge AI-driven perception with simulation and robotics, and what it takes to move from isolated pilots to orchestrated, autonomous manufacturing systems.

About the Speaker

Vlad Larichev is an Industrial AI Lead at Accenture Industry X, specializing in applying AI, generative AI, and agentic AI to engineering, manufacturing, and large-scale industrial operations. With a background as an engineer, solution architect, and software developer, he has led AI initiatives across sectors including automotive, energy, and consumer goods, integrating advanced analytics, computer vision, and simulation into complex industrial environments.

Vlad is the creator of the Digital Reasoning Thread – a framework for connecting AI reasoning across visual intelligence, simulation, and physical execution. He is an active public speaker, podcast host, and community builder, sharing practical insights on scaling AI from pilot projects to enterprise-wide adoption.

The Road to Useful Robots

This talk explores the current state of AI-enabled robots and the issues with deploying more advanced models on constrained hardware, including limited compute and power budgets. It then moves on to what's next for developing useful, intelligent robots.

About the Speaker

Michael Hart, also known as Mike Likes Robots, is a robotics software engineer and content creator. His mission is to share knowledge to accelerate robotics. @mikelikesrobots

Paris AI, Machine Learning and Computer Vision Meetup
FREE