This Meetup is sponsored by Voxel51, the lead maintainers of the open source FiftyOne computer vision toolset. To learn more, visit the FiftyOne project page on GitHub.

Upcoming events (4)

See all

Network event
141 attendees from 38 groups hosting
Fri, 11 July 2025, 4:00 pm UTCJuly 11 - Best of CVPR Virtual Event
Link visible for attendees
Join us on July 11 at 9 AM Pacific for the third of several virtual events showcasing some of the most thought-provoking papers from this year’s CVPR conference.

Register for the Zoom

OpenMIBOOD: Open Medical Imaging Benchmarks for Out-Of-Distribution Detection

As AI becomes more prevalent in fields like healthcare, ensuring its reliability under unexpected inputs is essential. We present OpenMIBOOD, a benchmarking framework for evaluating out-of-distribution (OOD) detection methods in medical imaging. It includes 14 datasets across three medical domains and categorizes them into in-distribution, near-OOD, and far-OOD groups to assess 24 post-hoc methods. Results show that OOD detection approaches effective in natural images often fail in medical contexts, highlighting the need for domain-specific benchmarks to ensure trustworthy AI in healthcare.

About the Speaker

Max Gutbrod is a PhD student in Computer Science at OTH Regensburg, Germany, with a research focus on medical imaging. He’s working on improving the resilience of AI systems in healthcare, so they can continue performing reliably, even when faced with unfamiliar or unexpected data.

RANGE: Retrieval Augmented Neural Fields for Multi-Resolution Geo-Embeddings

The choice of representation for geographic location significantly impacts the accuracy of models for a broad range of geospatial tasks, including fine-grained species classification, population density estimation, and biome classification. Recent works learn such representations by contrastively aligning geolocation[lat,lon] with co-located images.

While these methods work exceptionally well, in this paper, we posit that the current training strategies fail to fully capture the important visual features. We provide an information-theoretic perspective on why the resulting embeddings from these methods discard crucial visual information that is important for many downstream tasks. To solve this problem, we propose a novel retrieval-augmented strategy called RANGE. We build our method on the intuition that the visual features of a location can be estimated by combining the visual features from multiple similar-looking locations. We show this retrieval strategy outperforms the existing state-of-the-art models with significant margins in most tasks.

About the Speaker

Aayush Dhakal is a Ph.D. candidate in Computer Science at Washington University in St. Louis (WashU), currently advised by Dr. Nathan Jacobs in the Multimodal Vision Research Lab (MVRL). My work focuses on solving geospatial problems using Deep Learning and Computer Vision. This often involves some combination of computer vision, remote sensing, and self-supervised learning. I love to develop methods that allow seamless interaction of multiple modalities, such as images, text, audio, and geocoordinates.

FLAIR: Fine-Grained Image Understanding through Language-Guided Representations

CLIP excels at global image-text alignment but struggles with fine-grained visual understanding. In this talk, I present FLAIR—Fine-grained Language-informed Image Representations—which leverages long, detailed captions to learn localized image features. By conditioning attention pooling on diverse sub-captions, FLAIR generates text-specific image embeddings that enhance retrieval of fine-grained content. Our model outperforms existing methods on standard and newly proposed fine-grained retrieval benchmarks, and even enables strong zero-shot semantic segmentation—despite being trained on only 30M image-text pairs.

About the Speaker

Rui Xiao is a PhD student in the Explainable Machine Learning group, supervised by Zeynep Akata from Technical University of Munich and Stephan Alaniz from Telecom Paris. His research focuses on learning across modalities and domains, with a particular emphasis on enhancing fine-grained visual capabilities in vision-language models.

DyCON: Dynamic Uncertainty-aware Consistency and Contrastive Learning for Semi-supervised Medical Image Segmentation

Semi-supervised medical image segmentation often suffers from class imbalance and high uncertainty due to pathology variability. We propose DyCON, a Dynamic Uncertainty-aware Consistency and Contrastive Learning framework that addresses these challenges via two novel losses: UnCL and FeCL. UnCL adaptively weights voxel-wise consistency based on uncertainty, initially focusing on uncertain regions and gradually shifting to confident ones. FeCL improves local feature discrimination under imbalance by applying dual focal mechanisms and adaptive entropy-based weighting to contrastive learning.

About the Speaker

Maregu Assefa is a postdoctoral researcher at Khalifa University in Abu Dhabi, UAE. His current research focuses on advancing semi-supervised and self-supervised multi-modal representation learning for medical image analysis. Previously, his doctoral studies centered on visual representation learning for video understanding tasks, including action recognition and video retrieval.
2 attendees from this group
Network event
414 attendees from 39 groups hosting
Thu, 17 July 2025, 5:00 pm UTCJuly 17 - AI, ML and Computer Vision Meetup
Link visible for attendees
When and Where

July 17, 2025 | 10:00 – 11:30 AM Pacific

Virtually over Zoom. Sign up!

Using VLMs to Navigate the Sea of Data

At SEA.AI, we aim to make ocean navigation safer by enhancing situational awareness with AI. To develop our technology, we process huge amounts of maritime video from onboard cameras. In this talk, we’ll show how we use Vision-Language Models (VLMs) to streamline our data workflows; from semantic search using embeddings to automatically surfacing rare or high-interest events like whale spouts or drifting containers. The goal: smarter data curation with minimal manual effort.

About the Speaker

Daniel Fortunato, an AI Researcher at SEA.AI, is dedicated to enhancing efficiency through data workflow optimizations. Daniel’s background includes a Master’s degree in Electrical Engineering, providing a robust framework for developing innovative AI solutions. Beyond the lab, he is an enthusiastic amateur padel player and surfer.

SAMWISE: Infusing Wisdom in SAM2 for Text-Driven Video Segmentation

Referring Video Object Segmentation (RVOS) involves segmenting objects in video based on natural language descriptions. SAMWISE builds on Segment Anything 2 (SAM2) to support RVOS in streaming settings, without fine-tuning and without relying on external large Vision-Language Models. We introduce a novel adapter that injects temporal cues and multi-modal reasoning directly into the feature extraction process, enabling both language understanding and motion modeling. We also unveil a phenomenon we denote tracking bias, where SAM2 may persistently follow an object that only loosely matches the query, and propose a learnable module to mitigate it. SAMWISE achieves state-of-the-art performance across multiple benchmarks with less than 5M additional parameters.

About the Speaker

Claudia Cuttano is a PhD student at Politecnico di Torino (VANDAL Lab), currently on a research visit at TU Darmstadt, where she works with Prof. Stefan Roth in the Visual Inference Lab. Her research focuses on semantic segmentation, with particular emphasis on multi-modal understanding and the use of foundation models for pixel-level tasks.

Building Efficient and Reliable Workflows for Object Detection

Training complex AI models at scale requires orchestrating multiple steps into a reproducible workflow and understanding how to optimize resource utilization for efficient pipelines. Modern MLOps practices help streamline these processes, improving the efficiency and reliability of your AI pipelines.

About the Speaker

Sage Elliott is an AI Engineer with a background in computer vision, LLM evaluation, MLOps, IoT, and Robotics. He’s taught thousands of people at live workshops. You can usually find him in Seattle biking around to parks or reading in cafes, catching up on the latest read for AI Book Club.

Your Data Is Lying to You: How Semantic Search Helps You Find the Truth in Visual Datasets

High-performing models start with high-quality data—but finding noisy, mislabeled, or edge-case samples across massive datasets remains a significant bottleneck. In this session, we’ll explore a scalable approach to curating and refining large-scale visual datasets using semantic search powered by transformer-based embeddings. By leveraging similarity search and multimodal representation learning, you’ll learn to surface hidden patterns, detect inconsistencies, and uncover edge cases. We’ll also discuss how these techniques can be integrated into data lakes and large-scale pipelines to streamline model debugging, dataset optimization, and the development of more robust foundation models in computer vision. Join us to discover how semantic search reshapes how we build and refine AI systems.

About the Speaker

Paula Ramos has a PhD in Computer Vision and Machine Learning, with more than 20 years of experience in the technological field. She has been developing novel integrated engineering technologies, mainly in Computer Vision, robotics, and Machine Learning applied to agriculture, since the early 2000s in Colombia. During her PhD and Postdoc research, she deployed multiple low-cost, smart edge & IoT computing technologies, such as farmers, that can be operated without expertise in computer vision systems. The central objective of Paula’s research has been to develop intelligent systems/machines that can understand and recreate the visual world around us to solve real-world needs, such as those in the agricultural industry.
11 attendees from this group+6
Network event
71 attendees from 39 groups hosting
Wed, 23 July 2025, 4:00 pm UTCJuly 23 - Getting Started with FiftyOne for Healthcare Use Cases
Link visible for attendees
When

Jul 23, 2025 at 9:00 - 10:30 AM Pacific

Where

Online. Register for the Zoom!

About the Workshop

Visual AI is revolutionizing healthcare by enabling more accurate diagnoses, streamlining medical workflows, and uncovering valuable insights across various imaging modalities. Yet, building trustworthy AI in healthcare demands more than powerful models — it requires clean, curated data, strong visualizations, and human-in-the-loop understanding.

Join us for a free, 90-minute, hands-on workshop built for healthcare researchers, medical data scientists, and AI engineers working with real-world imaging data. Whether you're analyzing CT scans, radiology images, or multi-modal patient datasets, this session will equip you with the tools to design robust, transparent, and insight-driven computer vision pipelines — powered by FiftyOne, the open-source platform for Visual AI.

By the end of the workshop, you'll be able to:

Load and organize complex medical datasets (e.g., ARCADE, DeepLesion) with FiftyOne.

Explore medical imaging data using embeddings, patches, and metadata filters.

Curate balanced datasets and fine-tune models using Ultralytics YOLOv8 for tasks like stenosis detection.

Analyze segment CT scans using MedSAM2.

Analyze results from VLMs and foundation models like MedGEMMA, NVIDIA VISTA, and NVIDIA CRADIO.

Evaluate model predictions and uncover failure cases using real-world clinical examples.

Why Attend?

This healthcare edition of our "Getting Started with FiftyOne" workshop connects foundational tools with real-world impact. Through curated datasets and clinical use cases, you'll see how to harness Visual AI responsibly, building data-centric pipelines that promote accuracy, interpretability, and trust in medical AI systems.

Prerequisites

Basic knowledge of Python and computer vision is recommended. No prior experience in healthcare is required — just curiosity and a commitment to building meaningful AI.

All participants will receive access to workshop notebooks, code examples, and extended resources to continue their journey in healthcare AI.
Network event
103 attendees from 39 groups hosting
Thu, 24 July 2025, 4:00 pm UTCJuly 24 - Women in AI
Link visible for attendees
Hear talks from experts on cutting-edge topics in AI, ML, and computer vision!

When

Jul 24, 2025 at 9 - 11 AM Pacific

Where

Online. Register for the Zoom

Exploring Vision-Language-Action (VLA) Models: From LLMs to Embodied AI

This talk will explore the evolution of foundation models, highlighting the shift from large language models (LLMs) to vision-language models (VLMs), and now to vision-language-action (VLA) models. We'll dive into the emerging field of robot instruction following—what it means, and how recent research is shaping its future. I will present insights from my 2024 work on natural language-based robot instruction following and connect it to more recent advancements driving progress in this domain.

About the Speaker

Shreya Sharma is a Research Engineer at Reality Labs, Meta, where she works on photorealistic human avatars for AR/VR applications. She holds a bachelor’s degree in Computer Science from IIT Delhi and a master’s in Robotics from Carnegie Mellon University. Shreya is also a member of the inaugural 2023 cohort of the Quad Fellowship. Her research interests lie at the intersection of robotics and vision foundation models.

Farming with CLIP: Foundation Models for Biodiversity and Agriculture

Using open-source tools, we will explore the power and limitations of foundation models in agriculture and biodiversity applications. Leveraging the BIOTROVE dataset. The largest publicly accessible biodiversity dataset curated from iNaturalist, we will showcase real-world use cases powered by vision-language models trained on 40 million captioned images. We focus on understanding zero-shot capabilities, taxonomy-aware evaluation, and data-centric curation workflows.

We will demonstrate how to visualize, filter, evaluate, and augment data at scale. This session includes practical walkthroughs on embedding visualization with CLIP, dataset slicing by taxonomic hierarchy, identification of model failure modes, and building fine-tuned pest and crop monitoring models. Attendees will gain insights into how to apply multi-modal foundation models for critical challenges in agriculture, like ecosystem monitoring in farming.

About the Speaker

Paula Ramos has a PhD in Computer Vision and Machine Learning, with more than 20 years of experience in the technological field. She has been developing novel integrated engineering technologies, mainly in Computer Vision, robotics, and Machine Learning applied to agriculture, since the early 2000s in Colombia. During her PhD and Postdoc research, she deployed multiple low-cost, smart edge & IoT computing technologies, such as farmers, that can be operated without expertise in computer vision systems. The central objective of Paula’s research has been to develop intelligent systems/machines that can understand and recreate the visual world around us to solve real-world needs, such as those in the agricultural industry.

Multi-modal AI in Medical Edge and Client Device Computing

In this live demo, we explore the transformative potential of multi-modal AI in medical edge and client device computing, focusing on real-time inference on a local AI PC. Attendees will witness how users can upload medical images, such
as X-Rays, and ask questions about the images to the AI model. Inference is executed locally on Intel's integrated GPU and NPU using OpenVINO, enabling developers without deep AI experience to create generative AI applications.

About the Speaker

Helena Klosterman is an AI Engineer at Intel, based in the Netherlands, Helena enables organizations to unlock the potential of AI with OpenVINO, Intel's AI inference runtime. She is passionate about democratizing AI, developer experience, and bridging the gap between complex AI technology and practical applications.

The Business of AI

The talk will focus on the importance of clearly defining a specific problem and a use case, how to quantify the potential benefits of an AI solution in terms of measurable outcomes, evaluating technical feasibility in terms of technical challenges and limitations of implementing an AI solution, and envisioning the future of enterprise AI.

About the Speaker

Milica Cvetkovic is an AI engineer and consultant driving the development and deployment of production-ready AI systems for diverse organizations. Her expertise spans custom machine learning, generative AI, and AI operationalization. With degrees in mathematics and statistics, she possesses a decade of experience in education and edtech, including curriculum design and machine learning instruction for technical and non-technical audiences. Prior to Google, Milica held a data scientist role in biotechnology and has a proven track record of advising startups, demonstrating a deep understanding of AI's practical application.
4 attendees from this group