
What we’re about
🖖 This virtual group is for data scientists, machine learning engineers, and open source enthusiasts.
Every month we’ll bring you diverse speakers working at the cutting edge of AI, machine learning, and computer vision.
- Are you interested in speaking at a future Meetup?
- Is your company interested in sponsoring a Meetup?
This Meetup is sponsored by Voxel51, the lead maintainers of the open source FiftyOne computer vision toolset. To learn more, visit the FiftyOne project page on GitHub.
Upcoming events (4+)
- Network event · 272 attendees from 44 groups hosting · Sept 12 - Visual AI in Manufacturing and Robotics (Day 3)
Join us for day three of a series of virtual events to hear talks from experts on the latest developments at the intersection of Visual AI, Manufacturing, and Robotics.
Date and Time
Sept 12 at 9 AM Pacific
Location
Virtual. Register for the Zoom!
Towards Robotics Foundation Models that Can Reason
In recent years, we have witnessed remarkable progress in generative AI, particularly in language and visual understanding and generation. This leap has been fueled by unprecedentedly large image–text datasets and the scaling of large language and vision models trained on them. Increasingly, these advances are being leveraged to equip and empower robots with open-world visual understanding and reasoning capabilities.
Yet, despite these advances, scaling such models for robotics remains challenging due to the scarcity of large-scale, high-quality robot interaction data, limiting their ability to generalize and truly reason about actions in the real world. Nonetheless, promising results are emerging from using multimodal large language models (MLLMs) as the backbone of robotic systems, especially in enabling the acquisition of low-level skills required for robust deployment in everyday household settings.
In this talk, I will present three recent works that aim to bridge the gap between rich semantic world knowledge in MLLMs and actionable robot control. I will begin with AHA, a vision-language model that reasons about failures in robotic manipulation and improves the robustness of existing systems. Building on this, I will introduce SAM2Act, a 3D generalist robotic model with a memory-centric architecture capable of performing high-precision manipulation tasks while retaining and reasoning over past observations. Finally, I will present MolmoAct, AI2’s flagship robotic foundation model for action reasoning, designed as a generalist system that can be post-trained for a wide range of downstream manipulation tasks.
About the Speaker
Jiafei Duan is a Ph.D. candidate in Computer Science & Engineering at the University of Washington, advised by Professors Dieter Fox and Ranjay Krishna. His research focuses on foundation models for robotics, with an emphasis on developing scalable data collection and generation methods, grounding vision-language models in robotic reasoning, and advancing robust generalization in robot learning. His work has been featured in MIT Technology Review, GeekWire, VentureBeat, and Business Wire.
Beyond Academic Benchmarks: Critical Analysis and Best Practices for Visual Industrial Anomaly Detection
In this talk, I will share our recent research efforts in visual industrial anomaly detection, presenting a comprehensive empirical analysis focused on real-world applications and demonstrating that recent SOTA methods perform worse than methods from 2021 when evaluated on a variety of datasets. We will also investigate how different practical aspects, such as input size, distribution shift, data contamination, and the availability of a validation set, affect the results.
About the Speaker
Aimira Baitieva is a Research Engineer at Valeo, where she works primarily on computer vision problems. Her recent work has been focused on deep learning anomaly detection for automating visual inspection, incorporating both research and practical applications in the manufacturing sector.
The Digital Reasoning Thread in Manufacturing: Orchestrating Vision, Simulation, and Robotics
Manufacturing is entering a new phase where AI is no longer confined to isolated tasks like defect detection or predictive maintenance. Advances in reasoning AI, simulation, and robotics are converging to create end-to-end systems that can perceive, decide, and act – in both digital and physical environments.
This talk introduces the Digital Reasoning Thread – a consistent layer of AI reasoning that runs through every stage of manufacturing, connecting visual intelligence, digital twins, simulation environments, and robotic execution. By linking perception with advanced reasoning and action, this approach enables faster, higher-quality decisions across the entire value chain.
We will explore real-world examples of applying reasoning AI in industrial settings, combining simulation-driven analysis, orchestration frameworks, and the foundations needed for robotic execution in the physical world. Along the way, we will examine the key technical building blocks – from data pipelines and interoperability standards to agentic AI architectures – that make this level of integration possible.
Attendees will gain a clear understanding of how to bridge AI-driven perception with simulation and robotics, and what it takes to move from isolated pilots to orchestrated, autonomous manufacturing systems.
About the Speaker
Vlad Larichev is an Industrial AI Lead at Accenture Industry X, specializing in applying AI, generative AI, and agentic AI to engineering, manufacturing, and large-scale industrial operations. With a background as an engineer, solution architect, and software developer, he has led AI initiatives across sectors including automotive, energy, and consumer goods, integrating advanced analytics, computer vision, and simulation into complex industrial environments.
Vlad is the creator of the Digital Reasoning Thread – a framework for connecting AI reasoning across visual intelligence, simulation, and physical execution. He is an active public speaker, podcast host, and community builder, sharing practical insights on scaling AI from pilot projects to enterprise-wide adoption.
The Road to Useful Robots
This talk explores the current state of AI-enabled robots and the issues with deploying more advanced models on constrained hardware, including limited compute and power budgets. It then moves on to what's next for developing useful, intelligent robots.
About the Speaker
Michael Hart, also known as Mike Likes Robots, is a robotics software engineer and content creator. His mission is to share knowledge to accelerate robotics. @mikelikesrobots
- Network event · 109 attendees from 44 groups hosting · Sept 30 - Getting Started with FiftyOne for Manufacturing Use Cases
Date and Time
Sept 30 at 9 AM Pacific
Location
Virtual. Register for the Zoom.
Are you working with computer vision in manufacturing and need deeper visibility into your datasets and models? Join us for a free 90-minute hands-on workshop and learn how to leverage the open-source FiftyOne toolset to optimize your visual AI workflows, from anomaly detection on the production line to worker safety and quality assurance in additive manufacturing.
In this session, you'll learn how to:
- Visualize and audit complex manufacturing datasets.
- Explore visual embeddings for failure mode analysis.
- Identify and fix labeling issues affecting production models.
- Perform advanced data curation for specialized use cases.
- Integrate with annotation tools, model pipelines, and plugins.
We'll take a data-centric approach to computer vision, starting with importing and exploring industrial visual data, including defects, wear patterns, and worker posture. You'll learn to query and filter datasets to surface edge cases, then use plugins and native integrations to streamline workflows.
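As a rough illustration of that query-and-filter step, here is a minimal FiftyOne sketch (not the workshop code; the directory path, the "predictions" field, and the "defect" class are hypothetical placeholders):

```python
import fiftyone as fo
from fiftyone import ViewField as F

# Load a directory of production-line images into FiftyOne
dataset = fo.Dataset.from_dir(
    dataset_dir="/path/to/line_images",
    dataset_type=fo.types.ImageDirectory,
    name="manufacturing-demo",
)

# Suppose a model's detections were added to a "predictions" field;
# surface potential edge cases: defect detections with low confidence
edge_cases = dataset.filter_labels(
    "predictions", (F("label") == "defect") & (F("confidence") < 0.4)
)

# Inspect the filtered view in the FiftyOne App
session = fo.launch_app(edge_cases)
```

Filtered views like this are ordinary FiftyOne views, so they can be tagged, exported, or handed off to annotation tools without duplicating the underlying data.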
We'll walk through generating candidate ground truth labels and evaluating fine-tuned foundation models — particularly relevant to manufacturers using pre-trained models for tasks like defect segmentation or object localization in dynamic environments.
By the end, you'll see how the FiftyOne App and SDK work together to enable more profound insight into visual AI systems. We'll conclude with a demo showcasing 3D view reconstruction for industrial inspection, revealing how Visual AI can bridge physical and digital layers of your production process.
Prerequisites: Basic knowledge of Python and computer vision fundamentals.
Resources Provided: All attendees will receive access to tutorials, videos, and the workshop codebase.
About the Instructor
Paula Ramos has a PhD in Computer Vision and Machine Learning, with more than 20 years of experience in the technological field. She has been developing novel integrated engineering technologies, mainly in Computer Vision, robotics, and Machine Learning applied to agriculture, since the early 2000s in Colombia.
- Network event · 197 attendees from 43 groups hosting · Oct 2 - Women in AI Virtual Event
Hear talks from experts on the latest topics in AI, ML, and computer vision.
Date and Time
Oct 2 at 9 AM Pacific
Location
Virtual. Register for the Zoom.
The Hidden Order of Intelligent Systems: Complexity, Autonomy, and the Future of AI
As artificial intelligence systems grow more autonomous and integrated into our world, they also become harder to predict, control, and fully understand. This talk explores how complexity theory can help us make sense of these challenges, by revealing the hidden patterns that drive collective behavior, adaptation, and resilience in intelligent systems. From emergent coordination among autonomous agents to nonlinear feedback in real-world deployments, we’ll explore how order arises from chaos, and what that means for the next generation of AI. Along the way, we’ll draw connections to neuroscience, agentic AI, and distributed systems that offer fresh insights into designing multi-faceted AI systems.
About the Speaker
Ria Cheruvu is a Senior Trustworthy AI Architect at NVIDIA. She holds a master’s degree in data science from Harvard and teaches data science and ethical AI across global platforms. Ria is passionate about uncovering the hidden dynamics that shape intelligent systems—from natural networks to artificial ones.
Managing Medical Imaging Datasets: From Curation to Evaluation
High-quality data is the cornerstone of effective machine learning in healthcare. This talk presents practical strategies and emerging techniques for managing medical imaging datasets, from synthetic data generation and curation to evaluation and deployment.
We’ll begin by highlighting real-world case studies from leading researchers and practitioners who are reshaping medical imaging workflows through data-centric practices. The session will then transition into a hands-on tutorial using FiftyOne, the open-source platform for visual dataset inspection and model evaluation. Attendees will learn how to load, visualize, curate, and evaluate medical datasets across various imaging modalities.
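To give a flavor of the evaluation step, here is a minimal, hypothetical FiftyOne sketch (the dataset name and label fields are assumptions, not the session's actual code):

```python
import fiftyone as fo
from fiftyone import ViewField as F

# Assumes a dataset with "ground_truth" and "predictions" classification
# fields has already been created and persisted under this name
dataset = fo.load_dataset("chest-xray-demo")

# Compare predicted labels against ground truth and store per-sample results
results = dataset.evaluate_classifications(
    "predictions",
    gt_field="ground_truth",
    eval_key="eval",
)
results.print_report()  # per-class precision, recall, F1

# Review the misclassified studies in the App
mistakes = dataset.match(F("eval") == False)
session = fo.launch_app(mistakes)
```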
Whether you're a researcher, clinician, or ML engineer, this talk will equip you with practical tools and insights to improve dataset quality, model reliability, and clinical impact.
About the Speaker
Paula Ramos has a PhD in Computer Vision and Machine Learning, with more than 20 years of experience in the technological field. She has been developing novel integrated engineering technologies, mainly in Computer Vision, robotics, and Machine Learning applied to agriculture, since the early 2000s in Colombia.
Building Agents That Learn: Managing Memory in AI Agents
In the rapidly evolving landscape of agentic systems, memory management has emerged as a key pillar for building intelligent, context-aware AI Agents. Different types of memory, such as short-term and long-term memory, play distinct roles in supporting an agent's functionality. In this talk, we will explore these types of memory, discuss challenges with managing agentic memory, and present practical solutions for building agentic systems that can learn from their past executions and personalize their interactions over time.
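To make the short-term/long-term distinction concrete, here is an illustrative Python sketch of one possible memory layout for an agent. It is not tied to any particular framework, all names are hypothetical, and in practice long-term recall would typically use embedding search over a vector store rather than keyword overlap:

```python
from collections import deque

class AgentMemory:
    """Toy illustration of separating short-term and long-term memory."""

    def __init__(self, short_term_size=10):
        # Short-term memory: a rolling window of the most recent turns
        self.short_term = deque(maxlen=short_term_size)
        # Long-term memory: everything the agent has been told; in practice
        # this would be a vector store queried by embedding similarity
        self.long_term = []

    def remember(self, role, content):
        """Record a conversation turn in both memory tiers."""
        self.short_term.append({"role": role, "content": content})
        self.long_term.append(content)

    def recall(self, query, k=3):
        """Naive long-term retrieval by keyword overlap (a stand-in for
        embedding similarity search)."""
        query_words = set(query.lower().split())
        scored = sorted(
            self.long_term,
            key=lambda text: len(query_words & set(text.lower().split())),
            reverse=True,
        )
        return scored[:k]

    def build_context(self, query):
        """Combine recent turns with relevant long-term memories to form
        the context passed to the model on the next step."""
        return {"recent": list(self.short_term), "relevant": self.recall(query)}
```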
About the Speaker
Apoorva Joshi is a Data Scientist turned Developer Advocate, with over 7 years of experience applying machine learning to problems in domains such as cybersecurity and mental health. As an AI Developer Advocate at MongoDB, she now helps developers be successful at building AI applications through written content and hands-on workshops.
Human-Centered AI: Soft Skills That Make Visual AI Work in Manufacturing
Visual AI systems can spot defects and optimize workflows—but it’s people who train, deploy, and trust the results. This session explores the often-overlooked soft skills that make Visual AI implementations successful: communication, cross-functional collaboration, documentation habits, and on-the-floor leadership. Sheena Yap Chan shares practical strategies to reduce resistance to AI tools, improve adoption rates, and build inclusive teams where operators, engineers, and executives align. Attendees will leave with actionable techniques to drive smoother, people-first AI rollouts in manufacturing environments.
About the Speaker
Sheena Yap Chan is a Wall Street Journal Bestselling Author, leadership speaker and consultant who helps organizations develop confidence, communication, and collaboration skills that drive innovation and team performance—especially in high-tech, high-change industries. She’s worked with leaders across engineering, operations, and manufacturing to align people with digital transformation goals.
- Network event · 139 attendees from 44 groups hosting · Oct 15 - Visual AI in Agriculture (Day 1)
Join us for day one of a series of virtual events to hear talks from experts on the latest developments at the intersection of Visual AI and Agriculture.
Date and Time
Oct 15 at 9 AM Pacific
Location
Virtual. Register for the Zoom.
Paved2Paradise: Scalable LiDAR Simulation for Real-World Perception
Training robust perception models for robotics and autonomy often requires massive, diverse 3D datasets. But collecting and annotating real-world LiDAR point clouds at scale is both expensive and time-consuming, especially when high-quality labels are needed. Paved2Paradise introduces a cost-effective alternative: a scalable LiDAR simulation pipeline that generates realistic, fully annotated datasets with minimal human labeling effort.
The key idea is to “factor the real world” by separately capturing background scans (e.g., fields, roads, construction sites) and object scans (e.g., vehicles, people, machinery). By intelligently combining these two sources, Paved2Paradise can synthesize a combinatorially large set of diverse training scenes. The pipeline involves four steps: (1) collecting extensive background LiDAR scans, (2) recording high-resolution scans of target objects under controlled conditions, (3) inserting objects into backgrounds with physically consistent placement and occlusion, and (4) simulating LiDAR geometry to ensure realism.
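The core of step (3) is easy to sketch. The following is a heavily simplified, hypothetical illustration of inserting an isolated object scan into a background scan; the actual Paved2Paradise pipeline also accounts for sensor geometry and occlusion, which this sketch omits:

```python
import numpy as np

def compose_scene(background_pts, object_pts, target_xy):
    """Place an isolated object scan into a background scan at target_xy.

    background_pts: (N, 3) LiDAR points of an object-free scene
    object_pts:     (M, 3) points of a standalone object, roughly centered at the origin
    target_xy:      (x, y) location where the object should be placed
    """
    # Estimate the local ground height near the target location
    dists = np.linalg.norm(background_pts[:, :2] - np.asarray(target_xy), axis=1)
    near = background_pts[dists < 1.0]
    ground_z = near[:, 2].min() if len(near) else 0.0

    # Translate the object so it rests on the ground at the target location
    offset = np.array([target_xy[0], target_xy[1], ground_z - object_pts[:, 2].min()])
    placed = object_pts + offset

    # Combine the clouds; per-point labels (1 = object) come for free as ground truth
    scene = np.vstack([background_pts, placed])
    labels = np.concatenate([np.zeros(len(background_pts)), np.ones(len(placed))])
    return scene, labels
```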
Experiments show that models trained on Paved2Paradise-generated data transfer effectively to the real world, achieving strong detection performance with far less manual annotation compared to conventional dataset collection. The approach is not only cost-efficient, but also flexible—allowing practitioners to easily expand to new object classes or domains by swapping in new background or object scans.
For ML practitioners working in robotics, autonomous vehicles, or safety-critical perception, Paved2Paradise highlights a practical path toward scaling training data without scaling costs. It bridges the gap between simulation and real-world performance, enabling faster iteration and more reliable deployment of perception models.
About the Speaker
Michael A. Alcorn is a Senior Machine Learning Engineer at John Deere, where he develops deep learning models for LiDAR and RGB perception in safety-critical, real-time systems. He earned his Ph.D. in Computer Science from Auburn University, with a dissertation on improving computer vision and spatiotemporal deep neural networks, and also holds a Graduate Minor in Mathematics. Michael’s research has been cited by researchers at DeepMind, Google, Meta, Microsoft, and OpenAI, among others, and his (batter|pitcher)2vec paper was a prize-winner at the 2018 MIT Sloan Sports Analytics Conference. He has also contributed machine learning code to scikit-learn and Apache Solr, and his GitHub repositories—which have collectively received over 2,100 stars—have served as starting points for research and production code at many different organizations.
MothBox: inexpensive, open-source, automated insect monitor
Dr. Andy Quitmeyer will talk about the design of an exciting new open source science tool, the Mothbox. The Mothbox is an award-winning project for broad-scale monitoring of insects for biodiversity. It's a low-cost device, developed in harsh Panamanian jungles, that takes super-high-resolution photos and then automatically IDs the levels of biodiversity in forests and agriculture. After thousands of insect observations and hundreds of deployments in Panama, Peru, Mexico, Ecuador, and the US, we are now developing a new, manufacturable version to share this important tool worldwide. We will discuss the development of this device in the jungles of Panama and its importance to studying biodiversity worldwide.
About the Speaker
Dr. Andy Quitmeyer designs new ways to interact with the natural world. He has worked with large organizations like Cartoon Network, IDEO, and the Smithsonian, taught as a tenure-track professor at the National University of Singapore, and even had his research turned into a (silly) television series called “Hacking the Wild,” distributed by Discovery Networks.
Now, he spends most of his time volunteering with smaller organizations, and recently founded the field-station makerspace, Digital Naturalism Laboratories. In the rainforest of Gamboa, Panama, Dinalab blends biological fieldwork and technological crafting with a community of local and international scientists, artists, engineers, and animal rehabilitators. He currently also advises students as an affiliate professor at the University of Washington.
Foundation Models for Visual AI in Agriculture
Foundation models have enabled a new way to address tasks by benefiting from emergent capabilities in a zero-shot manner. In this talk, I will discuss recent research on enabling visual AI in a zero-shot manner and via fine-tuning. Specifically, I will discuss joint work on RELOCATE, a simple training-free baseline designed to perform the challenging task of visual query localization in long videos.
To eliminate the need for task-specific training and efficiently handle long videos, RELOCATE leverages a region-based representation derived from pretrained vision models. I will also discuss joint work on enabling multi-modal large language models (MLLMs) to correctly answer prompts that require a holistic spatio-temporal understanding: MLLMs struggle to answer prompts that refer to 1) the entirety of an environment that an agent equipped with an MLLM can operate in; and simultaneously also refer to 2) recent actions that just happened and are encoded in a video clip.
However, such a holistic spatio-temporal understanding is important for agents operating in the real world. Our solution involves development of a dedicated data collection pipeline and fine-tuning of an MLLM equipped with projectors to improve both spatial understanding of an environment and temporal understanding of recent observations.
About the Speaker
Alex Schwing is an Associate Professor at the University of Illinois at Urbana-Champaign working with talented students on artificial intelligence, generative AI, and computer vision topics. He received his B.S. and diploma in Electrical Engineering and Information Technology from the Technical University of Munich in 2006 and 2008 respectively, and obtained a PhD in Computer Science from ETH Zurich in 2014. Afterwards he joined University of Toronto as a postdoctoral fellow until 2016.
His research interests are in the area of artificial intelligence, generative AI, and computer vision, where he has co-authored numerous papers on topics in scene understanding, inference and learning algorithms, deep learning, image and language processing, and generative modeling. His PhD thesis was awarded an ETH medal and his team’s research was awarded an NSF CAREER award.
Beyond the Lab: Real-World Anomaly Detection for Agricultural Computer Vision
Anomaly detection is transforming manufacturing and surveillance, but what about agriculture? Can AI actually detect plant diseases and pest damage early enough to make a difference? This talk demonstrates how anomaly detection identifies and localizes crop problems using coffee leaf health as our primary example. We'll start with the foundational theory, then examine how these models detect rust and miner damage in leaf imagery.
The session includes a comprehensive hands-on workflow using the open-source FiftyOne computer vision toolkit, covering dataset curation, patch extraction, model training, and result visualization. You'll gain both theoretical understanding of anomaly detection in computer vision and practical experience applying these techniques to agricultural challenges and other domains.
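As a taste of the workflow, here is a minimal, hypothetical FiftyOne sketch for surfacing suspect leaves once a model has assigned per-image anomaly scores (the directory layout, field names, and threshold are placeholders, not the workshop code):

```python
import fiftyone as fo
from fiftyone import ViewField as F

# Load labeled leaf images organized as healthy/, rust/, miner/ subfolders
dataset = fo.Dataset.from_dir(
    dataset_dir="/path/to/coffee_leaves",
    dataset_type=fo.types.ImageClassificationDirectoryTree,
    name="coffee-leaf-health",
)

# Assume a trained model has written a float "anomaly_score" to each sample;
# flag and rank the most anomalous leaves for visual review
suspect = (
    dataset
    .match(F("anomaly_score") > 0.5)
    .sort_by("anomaly_score", reverse=True)
)

# Review the flagged leaves in the FiftyOne App
session = fo.launch_app(suspect)
```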
About the Speaker
Paula Ramos has a PhD in Computer Vision and Machine Learning, with more than 20 years of experience in the technological field. She has been developing novel integrated engineering technologies, mainly in Computer Vision, robotics, and Machine Learning applied to agriculture, since the early 2000s in Colombia.
Past events (64)
- Network event · 445 attendees from 44 groups hosting · Sept 11 - Visual AI in Manufacturing and Robotics (Day 2) · This event has passed