This Meetup is sponsored by Voxel51, the lead maintainers of the open source FiftyOne computer vision toolset. To learn more, visit the FiftyOne project page on GitHub.

Upcoming events (4+)


Online event
137 attendees from multiple groups (36)
Thu, May 29, 2025, 16:00 UTC
May 29 - Best of WACV 2025
This is a virtual event taking place on May 29, 2025 at 9 AM Pacific.

Register for the Zoom

Welcome to the Best of WACV 2025 virtual series that highlights some of the groundbreaking research, insights, and innovations that defined this year’s conference. Live streaming from the authors to you. IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) is the premier international computer vision event comprising the main conference and several co-located workshops and tutorials.

DreamBlend: Advancing Personalized Fine-tuning of Text-to-Image Diffusion Models

Given a small number of images of a subject, personalized image generation techniques can fine-tune large pre-trained text-to-image diffusion models to generate images of the subject in novel contexts, conditioned on text prompts. In doing so, a trade-off is made between prompt fidelity, subject fidelity and diversity. As the pre-trained model is fine-tuned, earlier checkpoints synthesize images with low subject fidelity but high prompt fidelity and diversity. In contrast, later checkpoints generate images with low prompt fidelity and diversity but high subject fidelity. This inherent trade-off limits the prompt fidelity, subject fidelity and diversity of generated images. In this work, we propose DreamBlend to combine the prompt fidelity from earlier checkpoints and the subject fidelity from later checkpoints during inference. We perform cross-attention-guided image synthesis from a later checkpoint, guided by an image generated by an earlier checkpoint for the same prompt. This enables the generation of images with better subject fidelity, prompt fidelity and diversity on challenging prompts, outperforming state-of-the-art fine-tuning methods.

Paper: DreamBlend: Advancing Personalized Fine-tuning of Text-to-Image Diffusion Models

About the Speaker

Shwetha Ram is an Applied Scientist at Amazon, where she focuses on advancing multimodal capabilities for Rufus, Amazon’s generative AI-powered conversational shopping assistant. Her work has contributed to a range of innovative initiatives across Amazon, including Lab126, Scout (the autonomous sidewalk delivery robot), and M5 (Amazon’s foundation models). Prior to joining Amazon, Shwetha was part of the Image Technology Incubation team at Dolby Laboratories, where she explored emerging opportunities for Dolby in AR/VR and immersive media technologies.

Robust Multi-Class Anomaly Detection under Domain Shift

Robust multi-class anomaly detection under domain shift is a fundamental challenge in real-world scenarios, where detectors must distinguish between different types of anomalies despite significant distribution shifts. Traditional approaches often struggle to generalize across domains and to handle inter-class interference. ROADS addresses these limitations through a prompt-driven framework that combines a hierarchical class-aware prompt mechanism with a domain adapter to jointly encode discriminative, class-specific prompts and learn domain-invariant representations. Extensive evaluations on the MVTec-AD and VisA datasets show that ROADS achieves superior performance in both anomaly detection and localization, particularly in out-of-distribution settings.

Paper: ROADS: Robust Prompt-driven Multi-Class Anomaly Detection under Domain Shift

About the Speaker

Hossein Kashiani is a fourth-year Ph.D. student at Clemson University. His research focuses on developing generalizable and trustworthy AI systems, with publications in top venues such as CVPR (2025), WACV (2025), ICIP, and TBIOM. His work spans diverse applications, including anomaly detection, media forensics, biometrics, healthcare, and visual perception.

What Remains Unsolved in Computer Vision? Rethinking the Boundaries of State-of-the-Art

Despite rapid progress and increasingly powerful models, computer vision still struggles with a range of foundational challenges. This talk revisits the “blind spots” of state-of-the-art vision systems, focusing on problems that remain difficult in real-world applications. I will share insights from recent work on multi-object tracking, specifically cases involving prolonged occlusions, identity switches, and visually indistinguishable subjects such as identical triplets in motion. Through examples from DragonTrack and other methods, I’ll explore why these problems persist and what they reveal about the current limits of our models. Ultimately, this talk invites us to look beyond benchmark scores and rethink how we define progress in visual perception.

About the Speaker

Bishoy Galoaa is an incoming PhD student in Electrical and Computer Engineering at Northeastern University, under the supervision of Prof. Sarah Ostadabbas. His research centers on multi-object tracking and scene understanding in complex environments, with a focus on problems that challenge the assumptions of current deep learning models.

LLAVIDAL: A Large LAnguage VIsion Model for Daily Activities of Living

Current Large Language Vision Models trained on web videos perform well in general video understanding but struggle with fine-grained details, complex human-object interactions (HOI), and the view-invariant representation learning essential for Activities of Daily Living (ADL). In this talk, I will introduce LLAVIDAL, a foundation model tailored to understanding ADL, along with the tricks used to train such models.

Paper: LLAVIDAL: A Large LAnguage VIsion Model for Daily Activities of Living

About the Speaker

Srijan Das is an Assistant Professor in the Department of Computer Science at the University of North Carolina at Charlotte, where his work focuses on video representation learning and robotic vision. He is a member of the AI4Health Center and one of the founding members of the Charlotte Machine Learning Lab (CharMLab) at UNC Charlotte.
3 attendees from this group
Online event
122 attendees from multiple groups (36)
Fri, May 30, 2025, 16:00 UTC
May 30 - Best of WACV 2025
This is a virtual event taking place on May 30, 2025 at 9 AM Pacific.

Register for the Zoom

Welcome to the Best of WACV 2025 virtual series that highlights some of the groundbreaking research, insights, and innovations that defined this year’s conference. Live streaming from the authors to you. IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) is the premier international computer vision event comprising the main conference and several co-located workshops and tutorials.

Iris Recognition for Infants

Identification methods for newborns that are non-invasive, efficient, accurate, stable, and require no physical token may prevent baby swapping at birth, limit baby abductions, and improve postnatal health monitoring across geographies, in both formal (e.g., hospitals) and informal (e.g., humanitarian and fragile settings) health sectors. This talk explores the feasibility of using iris recognition as a biometric identifier for 4- to 6-week-old infants.

About the Speaker

Rasel Ahmed Bhuiyan is a fourth-year PhD student at the University of Notre Dame, supervised by Adam Czajka. His research focuses on iris recognition at life extremes, specifically infants and post-mortem cases.

Advancing Autonomous Simulation with Generative AI

Autonomous vehicle (AV) technology, including self-driving systems, is rapidly advancing but is hindered by the limited availability of diverse and realistic driving data. Traditional data collection methods, which deploy sensor-equipped vehicles to capture real-world scenarios, are costly, time-consuming, and risk-prone, especially for rare but critical edge cases.

We introduce the Autonomous Temporal Diffusion Model (AutoTDM), a foundation model that generates realistic, physics-consistent driving videos. By leveraging natural language prompts and integrating semantic sensory inputs such as depth maps, edge maps, segmentation maps, and camera positions, AutoTDM produces high-quality, consistent driving scenes that are controllable and adaptable to various simulation needs. This capability is crucial for developing robust autonomous navigation systems, as it allows long-duration driving scenarios to be simulated under diverse conditions.

By simulating comprehensive driving scenarios in a controlled virtual environment, AutoTDM offers a scalable, cost-effective way to train and validate autonomous systems, enhancing safety, accelerating industry progress, and marking a significant step forward in autonomous vehicle development.

About the Speaker

Xiangyu Bai is a second-year PhD candidate at ACLab, Northeastern University, specializing in generative AI and computer vision, with a focus on autonomous simulation. His research centers on developing innovative, physics-aware generative vision frameworks that enhance simulation systems to provide realistic, scalable solutions for autonomous navigation. He has authored six papers in top-tier conferences and journals, including three as first author, highlighting his significant contributions to the field.

Classification of Infant Sleep–Wake States from Natural Overnight In-Crib Sleep Videos

Infant sleep plays a vital role in brain development, but conventional monitoring techniques are often intrusive or require extensive manual annotation, limiting their practicality. To address this, we develop a deep learning model that classifies infant sleep–wake states from 90-second video segments using a two-stream spatiotemporal architecture that fuses RGB frames with optical flow features. The model achieves over 80% precision and recall on clips dominated by a single state and demonstrates robust performance on more heterogeneous clips, supporting future applications in sleep segmentation and sleep quality assessment from full overnight recordings.

About the Speaker

Shayda Moezzi is pursuing a PhD in Computer Engineering at Northeastern University in the Augmented Cognition Lab, under the guidance of Professor Sarah Ostadabbas. Her current research focuses on computer vision techniques for video segmentation.

Leveraging Vision Language Models for Specialized Agricultural Tasks

Traditional plant stress phenotyping requires experts to annotate thousands of samples per task, a resource-intensive process that limits agricultural applications. We demonstrate that state-of-the-art Vision Language Models (VLMs) can achieve an F1 score of 73.37% across 12 diverse plant stress tasks using just a handful of annotated examples.
This work shows that general-purpose VLMs combined with strategic few-shot learning can dramatically reduce the annotation burden for specialized agricultural visual tasks while maintaining accuracy.

About the Speaker

Muhammad Arbab Arshad is a Ph.D. candidate in Computer Science at Iowa State University, affiliated with AIIRA. His research focuses on Generative AI and Large Language Models, developing methodologies to leverage state-of-the-art AI models with limited annotated data for specialized tasks.
3 attendees from this group
Online event
90 attendees from multiple groups (37)
Tue, June 17, 2025, 16:00 UTC
June 17 - Databricks Mosaic AI + FiftyOne: Scaling Physical AI
When and Where

June 17, 2025 | 9:00 AM Pacific

Virtually over Zoom. Sign up!

About the Workshop

Ever tried to find something specific in your image or video datasets that wasn’t already labeled? It’s always been a frustrating and time-consuming experience.

Until now.

When you combine the FiftyOne computer vision toolkit with Mosaic AI from Databricks, you unlock lightning-fast vector search for the millions of images and videos in your data lake – to find exactly what you are looking for, even if there’s no label for it.

In this technical session, machine learning engineer Dan Gural will show you how the Mosaic AI integration works inside FiftyOne, featuring real-world mobility and autonomous driving use cases in which you search massive, state-of-the-art datasets in just seconds.

If you’re working with edge cases, building smarter datasets, or just curious about what Mosaic AI vector search can do for you, this one’s for you.
1 attendee from this group
Online event
30 attendees from multiple groups (38)
Wed, June 18, 2025, 16:00 UTC
June 18 - Getting Started with FiftyOne Workshop
When and Where

June 18, 2025 | 9:00 – 10:30 AM Pacific

Virtually over Zoom. Sign up!

About the Workshop

Want greater visibility into the quality of your computer vision datasets and models? Then join us for this free 90-minute, hands-on workshop to learn how to leverage the open source FiftyOne computer vision toolset.
Topics covered in the workshop include:

Object detection

Embeddings

Mistakenness

Deduplication

This workshop will explore the importance of taking a data-centric approach to computer vision workflows. We will start with importing and exploring visual data, then move to querying and filtering. Next, we’ll look at ways to extend FiftyOne’s functionality and simplify tasks using plugins and native integrations. We’ll generate candidate ground truth labels and then wrap things up by evaluating the results of fine-tuning a foundation model.

Prerequisites: working knowledge of Python and basic computer vision concepts.

All attendees will get access to the tutorials, videos, and code examples used in the workshop.

About the Instructor

Antonio Rueda-Toicen, an AI Engineer in Berlin, has extensive experience in deploying machine learning models and has taught over 300 professionals. He is currently a Research Scientist at the Hasso Plattner Institute. Since 2019, he has organized the Berlin Computer Vision Group and taught at Berlin’s Data Science Retreat. He specializes in computer vision, cloud technologies, and machine learning. Antonio is also a certified instructor of deep learning and diffusion models in NVIDIA’s Deep Learning Institute.
1 attendee from this group

Past events (20)


Online event
315 attendees from multiple groups (36)
Thu, May 22, 2025, 17:00 UTC
May 22 - AI, ML and Computer Vision Meetup
This event has passed
6 attendees from this group

Barcelona AI Machine Learning and Computer Vision Meetup
