Details

The Best of ICRA is a three-day virtual meetup series featuring researchers presenting their accepted papers from the 2026 International Conference on Robotics and Automation (ICRA).

Date, Time and Location

Jul 22, 2026
9:00 AM - 11:00 AM PST
Online. Register for the Zoom!

Contrastive Learning on 3D Point Clouds for Geometric Defect Detection

Reliable 3D defect detection in manufacturing is hard: the input is a point cloud — an unordered set that standard neural backbones cannot process directly; high-quality training data is scarce; and real scans are noisy and arrive in arbitrary orientations. We address these challenges in COSARAD, a contrastive learning framework that learns highly discriminative representations of object surface geometry under weak supervision.

When a test object arrives, we extract its features and compare them against a library of defect-free reference shapes for precise, interpretable defect localization — achieving state-of-the-art accuracy on industrial benchmarks such as Real3D-AD. In my talk, I'll cover the design choices behind the system, why contrastive representation learning is the right fit for sparse 3D data, and open problems in scaling inspection to production.
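As a rough illustration of the comparison step described above, the sketch below scores each test feature by its distance to the nearest feature in a defect-free reference library and flags points above a threshold. This is not COSARAD's implementation: the abstract does not say how features are compared, so the nearest-neighbor Euclidean distance, the random toy features, and every name here (`anomaly_scores`, `reference_library`, `THRESHOLD`) are assumptions made for the example.

```python
import math
import random


def nearest_reference_distance(feature, reference_library):
    """Euclidean distance from one feature vector to its closest reference vector."""
    return min(math.dist(feature, ref) for ref in reference_library)


def anomaly_scores(test_features, reference_library):
    """One score per test feature; larger means less like any defect-free reference."""
    return [nearest_reference_distance(f, reference_library) for f in test_features]


# Toy stand-ins for learned per-point features (hypothetical data, not real scans).
random.seed(0)
DIM = 8
reference_library = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(200)]

# Test features: lightly perturbed copies of the first 50 reference features.
test_features = [
    [x + random.gauss(0, 0.01) for x in ref] for ref in reference_library[:50]
]
# Simulate a defect by pushing one feature far away from the reference set.
test_features[7] = [x + 5.0 for x in test_features[7]]

scores = anomaly_scores(test_features, reference_library)

THRESHOLD = 1.0  # hypothetical cut-off; a real system would calibrate this
defect_indices = [i for i, s in enumerate(scores) if s > THRESHOLD]
print(defect_indices)
```

Because each score belongs to a specific point, the flagged indices map directly back to locations on the scanned surface, which is one way such a comparison can be made interpretable.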

About the Speaker

Alexander Tarvo is a researcher at the University of Washington's MACS Lab, where he works on computer vision with applications in robotics. He holds a PhD in Software Engineering from Brown University and previously held research and engineering roles at Google, Microsoft, and IBM Research. His current research focuses on 3D vision and reinforcement learning for industrial robotics.

A Semantic and Occlusion-Aware Gaussian Mixture Probability Hypothesis Density Filter

Reliable and resilient multi-target tracking is foundational for safe autonomous driving, yet most perception pipelines struggle with sensor noise, heavy clutter, and severe environmental occlusions. To address these limitations, this talk presents a novel Semantic-Occlusion Aware (S-OA) Gaussian Mixture Probability Hypothesis Density (GM-PHD) filter.

By combining geometric occlusion reasoning with deep learning-derived environmental semantics, the proposed framework adaptively initializes target tracking in regions where new targets are likely to appear. Evaluations demonstrate that this context-aware tracking system minimizes track initiation latency and preserves high tracking precision even under intense clutter.

Ultimately, this work demonstrates how embedding spatial and semantic structure into filtering yields a significantly more robust and resilient perception stack for autonomous navigation.

About the Speaker

Jovan Menezes is a PhD student at Cornell University, advised by Prof. Mark Campbell. His research focuses on developing scalable and resilient perception algorithms for autonomous driving. By leveraging concepts from probabilistic estimation and deep learning-based computer vision, he aims to enable autonomous vehicles to perceive and navigate in challenging environments.

An Annotation-to-Detection Framework for Autonomous and Robust Vine Trunk Localization in the Field by Mobile Agricultural Robots

Autonomous robots struggle to detect objects in unstructured fields, so their detectors typically need in-domain tuning that depends on laborious manual data collection. In this work, we introduce a comprehensive annotation-to-detection framework designed to train a robust multi-modal detector using limited and partially labeled training data.

Our method combines cross-modal annotation transfer, early sensor fusion, and a multi-stage detection architecture to train and enhance multi-modal detection. Validated on vineyard trunk detection and paired with a custom LOAM algorithm, it localized over 70% of trees in one pass with under 0.37 m mean error.

Our system demonstrates that robust detection is achievable even with minimal initial annotations and minimal human intervention.

About the Speaker

Dimitrios Chatziparaschis is a PhD candidate in EE at the University of California, Riverside. His research lies at the intersection of computer vision, machine learning, and robotics. His main topics include 3D perception, multi-modal sensing, landmark detection, and localization in outdoor and dynamic settings.

vS-Graphs: Tightly Coupling Visual SLAM and 3D Scene Graphs Exploiting Hierarchical Scene Understanding

We introduce vS-Graphs, a novel real-time VSLAM framework that integrates vision-based scene understanding with map reconstruction and comprehensible graph-based representation. The framework infers structural elements (i.e., rooms and floors) from detected building components (i.e., walls and ground surfaces) and incorporates them into optimizable 3D scene graphs.

This solution enhances the reconstructed map's semantic richness, comprehensibility, and localization accuracy.

About the Speaker

Ali Tourani is an R&D Specialist and a Senior Software Engineer with 8+ years of experience in practical computer vision and AI system design and deployment. Currently, he holds a Postdoctoral Research Associate position at the University of Luxembourg, where he develops vision-language models and generative AI solutions for real-world robotic applications.

Related topics

Artificial Intelligence
Computer Vision
Machine Learning
Data Science
Open Source