
July 11 - Best of CVPR Virtual Event

Networking event
76 attendees from 38 groups

Details

Join us on July 11 at 9 AM Pacific for the third of several virtual events showcasing some of the most thought-provoking papers from this year’s CVPR conference.

Register for the Zoom

OpenMIBOOD: Open Medical Imaging Benchmarks for Out-Of-Distribution Detection

As AI becomes more prevalent in fields like healthcare, ensuring its reliability under unexpected inputs is essential. We present OpenMIBOOD, a benchmarking framework for evaluating out-of-distribution (OOD) detection methods in medical imaging. It includes 14 datasets across three medical domains and categorizes them into in-distribution, near-OOD, and far-OOD groups to assess 24 post-hoc methods. Results show that OOD detection approaches effective in natural images often fail in medical contexts, highlighting the need for domain-specific benchmarks to ensure trustworthy AI in healthcare.
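To make "post-hoc methods" concrete, here is a minimal sketch of two widely used post-hoc OOD scores (maximum softmax probability and the energy score) computed from a frozen classifier's logits. It illustrates the kind of method OpenMIBOOD benchmarks; the function names and thresholding step are illustrative assumptions, not the benchmark's own API.

```python
# Two common post-hoc OOD scores applied to a frozen classifier's logits.
# Illustrative only; not OpenMIBOOD's API.
import torch
import torch.nn.functional as F

def msp_score(logits: torch.Tensor) -> torch.Tensor:
    """Maximum softmax probability: higher means more in-distribution."""
    return F.softmax(logits, dim=-1).max(dim=-1).values

def energy_score(logits: torch.Tensor, temperature: float = 1.0) -> torch.Tensor:
    """Negative free energy: higher means more in-distribution."""
    return temperature * torch.logsumexp(logits / temperature, dim=-1)

# Usage sketch: score a batch with an already-trained model, then threshold
# (e.g., at the score achieving 95% true-positive rate on in-distribution data).
# with torch.no_grad():
#     logits = model(images)
# scores = energy_score(logits)
# is_ood = scores < threshold
```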

About the Speaker

Max Gutbrod is a PhD student in Computer Science at OTH Regensburg, Germany, with a research focus on medical imaging. He’s working on improving the resilience of AI systems in healthcare, so they can continue performing reliably, even when faced with unfamiliar or unexpected data.

RANGE: Retrieval Augmented Neural Fields for Multi-Resolution Geo-Embeddings

The choice of representation for geographic location significantly impacts the accuracy of models for a broad range of geospatial tasks, including fine-grained species classification, population density estimation, and biome classification. Recent works learn such representations by contrastively aligning geolocation (latitude, longitude) with co-located images.

While these methods work exceptionally well, in this paper we posit that current training strategies fail to fully capture important visual features. We provide an information-theoretic perspective on why the embeddings produced by these methods discard visual information that is crucial for many downstream tasks. To solve this problem, we propose a novel retrieval-augmented strategy called RANGE. We build our method on the intuition that the visual features of a location can be estimated by combining the visual features from multiple similar-looking locations. We show this retrieval strategy outperforms existing state-of-the-art models by significant margins on most tasks.
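The core retrieval-augmented intuition can be sketched in a few lines: approximate a query location's visual features as a similarity-weighted combination of visual features stored for similar reference locations. The array names, top-k retrieval, and softmax-temperature weighting below are illustrative assumptions, not the paper's released code.

```python
# Retrieval-augmented geo-embedding sketch (illustrative, not RANGE's implementation).
import numpy as np

def retrieval_augmented_embedding(
    query_loc_emb: np.ndarray,      # (d,) location embedding of the query point
    ref_loc_embs: np.ndarray,       # (N, d) location embeddings of a reference database
    ref_visual_feats: np.ndarray,   # (N, k) visual features co-located with the references
    top_k: int = 32,
    temperature: float = 0.1,
) -> np.ndarray:
    # Cosine similarity between the query location and every reference location.
    q = query_loc_emb / np.linalg.norm(query_loc_emb)
    r = ref_loc_embs / np.linalg.norm(ref_loc_embs, axis=1, keepdims=True)
    sims = r @ q                                   # (N,)

    # Keep the top-k most similar locations and softmax-weight their visual features.
    idx = np.argsort(sims)[-top_k:]
    w = np.exp(sims[idx] / temperature)
    w /= w.sum()
    return w @ ref_visual_feats[idx]               # (k,) estimated visual feature
```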

About the Speaker

Aayush Dhakal is a Ph.D. candidate in Computer Science at Washington University in St. Louis (WashU), advised by Dr. Nathan Jacobs in the Multimodal Vision Research Lab (MVRL). His work focuses on solving geospatial problems using deep learning and computer vision, often combining computer vision, remote sensing, and self-supervised learning. He enjoys developing methods that allow seamless interaction between multiple modalities, such as images, text, audio, and geocoordinates.

FLAIR: Fine-Grained Image Understanding through Language-Guided Representations

CLIP excels at global image-text alignment but struggles with fine-grained visual understanding. In this talk, I present FLAIR—Fine-grained Language-informed Image Representations—which leverages long, detailed captions to learn localized image features. By conditioning attention pooling on diverse sub-captions, FLAIR generates text-specific image embeddings that enhance retrieval of fine-grained content. Our model outperforms existing methods on standard and newly proposed fine-grained retrieval benchmarks, and even enables strong zero-shot semantic segmentation—despite being trained on only 30M image-text pairs.
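The key mechanism, attention pooling conditioned on a caption, can be illustrated with a short sketch: the text embedding acts as the attention query over the image's patch tokens, yielding a text-specific image embedding. Shapes and module names below are assumptions for illustration, not the released FLAIR implementation.

```python
# Text-conditioned attention pooling sketch (illustrative, not FLAIR's code).
import torch
import torch.nn as nn

class TextConditionedPooling(nn.Module):
    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        # The text embedding is the query; image patch tokens are keys and values.
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, patch_tokens: torch.Tensor, text_emb: torch.Tensor) -> torch.Tensor:
        # patch_tokens: (B, P, D) local image features
        # text_emb:     (B, D)    embedding of one (sub-)caption
        query = text_emb.unsqueeze(1)                    # (B, 1, D)
        pooled, _ = self.attn(query, patch_tokens, patch_tokens)
        return pooled.squeeze(1)                         # (B, D) text-specific image embedding

# Each (sub-)caption produces its own pooled image embedding, which can then be
# contrasted against the caption embedding in CLIP-style training.
```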

About the Speaker

Rui Xiao is a PhD student in the Explainable Machine Learning group, supervised by Zeynep Akata at the Technical University of Munich and Stephan Alaniz at Telecom Paris. His research focuses on learning across modalities and domains, with a particular emphasis on enhancing fine-grained visual capabilities in vision-language models.

DyCON: Dynamic Uncertainty-aware Consistency and Contrastive Learning for Semi-supervised Medical Image Segmentation

Semi-supervised medical image segmentation often suffers from class imbalance and high uncertainty due to pathology variability. We propose DyCON, a Dynamic Uncertainty-aware Consistency and Contrastive Learning framework that addresses these challenges via two novel losses: UnCL and FeCL. UnCL adaptively weights voxel-wise consistency based on uncertainty, initially focusing on uncertain regions and gradually shifting to confident ones. FeCL improves local feature discrimination under imbalance by applying dual focal mechanisms and adaptive entropy-based weighting to contrastive learning.
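A minimal sketch of an uncertainty-aware consistency term in the spirit of UnCL: voxel-wise consistency between student and teacher predictions, re-weighted by the teacher's predictive entropy, with a scheduled coefficient that shifts emphasis from uncertain to confident voxels over training. The exact weighting and schedule here are illustrative assumptions, not the paper's formulation.

```python
# Uncertainty-weighted consistency loss sketch (illustrative, not DyCON's UnCL).
import torch
import torch.nn.functional as F

def uncertainty_weighted_consistency(
    student_logits: torch.Tensor,   # (B, C, D, H, W)
    teacher_logits: torch.Tensor,   # (B, C, D, H, W)
    beta: float,                    # scheduled: large early (emphasize uncertain voxels),
                                    # small or negative later (emphasize confident ones)
) -> torch.Tensor:
    teacher_probs = F.softmax(teacher_logits, dim=1)
    # Per-voxel predictive entropy of the teacher.
    entropy = -(teacher_probs * teacher_probs.clamp_min(1e-8).log()).sum(dim=1)  # (B, D, H, W)

    # Per-voxel consistency: mean squared error between class-probability maps.
    mse = ((F.softmax(student_logits, dim=1) - teacher_probs) ** 2).mean(dim=1)

    # Re-weight voxels by scheduled uncertainty; normalize to keep the loss scale stable.
    weights = torch.exp(beta * entropy)
    return (weights * mse).sum() / weights.sum()
```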

About the Speaker

Maregu Assefa is a postdoctoral researcher at Khalifa University in Abu Dhabi, UAE. His current research focuses on advancing semi-supervised and self-supervised multi-modal representation learning for medical image analysis. Previously, his doctoral studies centered on visual representation learning for video understanding tasks, including action recognition and video retrieval.

Rome AI Machine Learning and Computer Vision Meetup
FREE