
Details
This is a virtual event taking place on May 29, 2025 at 9 AM Pacific.
Welcome to the Best of WACV 2025 virtual series, which highlights some of the groundbreaking research, insights, and innovations that defined this year’s conference, live-streamed from the authors to you. The IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) is the premier international computer vision event, comprising the main conference and several co-located workshops and tutorials.
Iris Recognition for Infants
Non-invasive, efficient, accurate, and stable identification methods that require no physical tokens may prevent baby swapping at birth, limit baby abductions, and improve post-natal health monitoring across geographies, within both formal (e.g., hospitals) and informal (e.g., humanitarian and fragile settings) health sectors. This talk explores the feasibility of applying iris recognition as a biometric identifier for infants aged 4 to 6 weeks.
About the Speaker
Rasel Ahmed Bhuiyan is a fourth-year PhD student at the University of Notre Dame, supervised by Adam Czajka. His research focuses on iris recognition at life extremes, specifically infants and post-mortem cases.
Advancing Autonomous Simulation with Generative AI
Autonomous vehicle (AV) technology is advancing rapidly but is hindered by the limited availability of diverse, realistic driving data. Traditional data collection methods, which deploy sensor-equipped vehicles to capture real-world scenarios, are costly, time-consuming, and risk-prone, especially for rare but critical edge cases.
We introduce the Autonomous Temporal Diffusion Model (AutoTDM), a foundation model that generates realistic, physics-consistent driving videos. Conditioned on natural language prompts and semantic sensory inputs such as depth maps, edge maps, segmentation maps, and camera poses, AutoTDM produces high-quality, consistent driving scenes that are controllable and adaptable to various simulation needs. This capability is crucial for developing robust autonomous navigation systems, as it allows long-duration driving scenarios to be simulated under diverse conditions.
AutoTDM offers a scalable, cost-effective solution for training and validating autonomous systems. By simulating comprehensive driving scenarios in a controlled virtual environment, it enhances safety and accelerates industry progress, marking a significant step forward in autonomous vehicle development.
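AutoTDM's actual architecture is the authors' to present, but to make the multi-modal conditioning idea concrete, here is a minimal, hypothetical PyTorch sketch of one denoising step that fuses per-pixel conditions (depth, edges, segmentation) by channel concatenation and injects a text-prompt embedding. All names, shapes, and the fusion scheme are illustrative assumptions, not the paper's implementation.

```python
# Hypothetical sketch: a conditional video-diffusion denoising step that
# combines the inputs the abstract mentions. Not AutoTDM's real code.
import torch
import torch.nn as nn

class ConditionalDenoiser(nn.Module):
    """Toy denoiser: fuses per-pixel conditions by channel concatenation."""
    def __init__(self, frame_channels=4, cond_channels=5, text_dim=768):
        super().__init__()
        # depth (1) + edges (1) + segmentation (3) = 5 conditioning channels
        self.net = nn.Conv3d(frame_channels + cond_channels, frame_channels,
                             kernel_size=3, padding=1)
        self.text_proj = nn.Linear(text_dim, frame_channels)

    def forward(self, noisy_latents, conditions, text_emb):
        # noisy_latents: (B, C, T, H, W); conditions: (B, 5, T, H, W)
        x = torch.cat([noisy_latents, conditions], dim=1)
        eps = self.net(x)
        # inject the text prompt as a per-channel bias (deliberately simplistic)
        return eps + self.text_proj(text_emb)[:, :, None, None, None]

# One denoising step on random tensors, just to show the shapes.
B, C, T, H, W = 1, 4, 8, 32, 32
model = ConditionalDenoiser()
latents = torch.randn(B, C, T, H, W)
conds = torch.randn(B, 5, T, H, W)  # stacked depth/edge/segmentation maps
text = torch.randn(B, 768)          # e.g., a CLIP/T5 prompt embedding
print(model(latents, conds, text).shape)  # torch.Size([1, 4, 8, 32, 32])
```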
About the Speaker
Xiangyu Bai is a second-year PhD candidate at ACLab, Northeastern University, specializing in generative AI and computer vision, with a focus on autonomous simulation. His research centers on developing innovative, physics-aware generative vision frameworks that enhance simulation systems to provide realistic, scalable solutions for autonomous navigation. He has authored six papers in top-tier conferences and journals, including three as first author, highlighting his significant contributions to the field.
Classification of Infant Sleep–Wake States from Natural Overnight In-Crib Sleep Videos
Infant sleep plays a vital role in brain development, but conventional monitoring techniques are often intrusive or require extensive manual annotation, limiting their practicality. To address this, we develop a deep learning model that classifies infant sleep–wake states from 90-second video segments using a two-stream spatiotemporal architecture that fuses RGB frames with optical flow features. The model achieves over 80% precision and recall on clips dominated by a single state and demonstrates robust performance on more heterogeneous clips, supporting future applications in sleep segmentation and sleep quality assessment from full overnight recordings.
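As a rough illustration of the two-stream design, the sketch below assumes the common pattern of separate RGB and optical-flow branches whose pooled features are concatenated before a sleep/wake classification head; the layer sizes and frame counts are placeholders, not the talk's actual model.

```python
# Hedged sketch of a generic two-stream spatiotemporal classifier.
import torch
import torch.nn as nn

class TwoStreamSleepWake(nn.Module):
    def __init__(self, hidden=64):
        super().__init__()
        def branch(in_ch):
            return nn.Sequential(
                nn.Conv3d(in_ch, hidden, kernel_size=3, padding=1),
                nn.ReLU(),
                nn.AdaptiveAvgPool3d(1),  # pool over time and space
                nn.Flatten(),
            )
        self.rgb = branch(3)    # RGB frames
        self.flow = branch(2)   # optical flow (x/y displacement fields)
        self.head = nn.Linear(2 * hidden, 2)  # sleep vs. wake logits

    def forward(self, rgb_clip, flow_clip):
        feats = torch.cat([self.rgb(rgb_clip), self.flow(flow_clip)], dim=1)
        return self.head(feats)

# Shapes only: a 90-second clip subsampled to 32 frames at 64x64 resolution.
model = TwoStreamSleepWake()
rgb = torch.randn(1, 3, 32, 64, 64)
flow = torch.randn(1, 2, 32, 64, 64)
print(model(rgb, flow).shape)  # torch.Size([1, 2])
```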
About the Speaker
Shayda Moezzi is pursuing a PhD in Computer Engineering at Northeastern University in the Augmented Cognition Lab, under the guidance of Professor Sarah Ostadabbas. Her current research focuses on computer vision techniques for video segmentation.
Leveraging Vision Language Models for Specialized Agricultural Tasks
Traditional plant stress phenotyping requires experts to annotate thousands of samples per task – a resource-intensive process that limits agricultural applications. We demonstrate that state-of-the-art Vision Language Models (VLMs) can achieve an F1 score of 73.37%, averaged across 12 diverse plant stress tasks, using just a handful of annotated examples.
This work establishes how general-purpose VLMs with strategic few-shot learning can dramatically reduce the annotation burden while maintaining accuracy, transforming how specialized agricultural visual tasks are approached.
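To show what "a handful of annotated examples" can look like in practice, here is a minimal, hypothetical sketch of few-shot prompting for a chat-style VLM. The query_vlm call, message format, and label set are assumptions, since the paper's exact prompts and models are not reproduced here.

```python
# Hypothetical sketch: assembling a few-shot prompt for a chat-style VLM.
import base64
from pathlib import Path

LABELS = ["healthy", "nutrient deficiency", "disease", "pest damage"]  # illustrative

def encode_image(path: str) -> str:
    return base64.b64encode(Path(path).read_bytes()).decode()

def build_few_shot_prompt(examples: list[tuple[str, str]],
                          query_image: str) -> list[dict]:
    """Interleave (image, label) exemplars, then append the unlabeled query."""
    messages = [{"role": "system",
                 "content": f"Classify the plant image into one of: {', '.join(LABELS)}."}]
    for img_path, label in examples:
        messages.append({"role": "user",
                         "content": [{"type": "image", "data": encode_image(img_path)}]})
        messages.append({"role": "assistant", "content": label})
    messages.append({"role": "user",
                     "content": [{"type": "image", "data": encode_image(query_image)}]})
    return messages

# Usage (paths and query_vlm are placeholders for a real VLM client):
# shots = [("soy_healthy.jpg", "healthy"), ("soy_idc.jpg", "nutrient deficiency")]
# prediction = query_vlm(build_few_shot_prompt(shots, "soy_query.jpg"))
```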
About the Speaker
Muhammad Arbab Arshad is a Ph.D. candidate in Computer Science at Iowa State University, affiliated with AIIRA. His research focuses on Generative AI and Large Language Models, developing methodologies to leverage state-of-the-art AI models with limited annotated data for specialized tasks.
