July 10 - Best of CVPR

Details
Join us for a series of virtual events focused on the most interesting and groundbreaking research presented at this year's CVPR conference!
When
July 10, 2025 at 9 AM Pacific
Where
Online. Register for the Zoom!
OFER: Occluded Face Expression Reconstruction
Reconstructing 3D face models from a single image is an inherently ill-posed problem, which becomes even more challenging in the presence of occlusions where multiple reconstructions can be equally valid. Despite the ubiquity of the problem, very few methods address its multi-hypothesis nature.
In this paper we introduce OFER, a novel approach for single-image 3D face reconstruction that generates plausible, diverse, and expressive 3D faces by training two diffusion models to produce the shape and expression coefficients of a parametric face model, conditioned on the input image. To maintain consistency across diverse expressions, the challenge is to select the best-matching shape. To achieve this, we propose a novel ranking mechanism that sorts the outputs of the shape diffusion network based on predicted shape accuracy scores.
Paper: OFER: Occluded Face Expression Reconstruction
About the Speaker
Pratheba Selvaraju holds a PhD from the University of Massachusetts, Amherst, and is currently a researcher at the Max Planck Institute, Perceiving Systems. Research interests include 3D reconstruction and modeling, geometry processing, and generative modeling, along with ongoing work on virtual try-on that combines vision and 3D techniques.
SmartHome-Bench: Benchmark for Video Anomaly Detection in Smart Homes Using Multi-Modal LMMs
Video anomaly detection is crucial for ensuring safety and security, yet existing benchmarks overlook the unique context of smart home environments. We introduce SmartHome-Bench, a dataset of 1,203 smart home videos annotated according to a novel taxonomy of seven anomaly categories, such as Wildlife, Senior Care, and Baby Monitoring. We evaluate state-of-the-art closed- and open-source multimodal LLMs with various prompting techniques, revealing significant performance gaps. To address these limitations, we propose the Taxonomy-Driven Reflective LLM Chain (TRLC), which boosts detection accuracy by 11.62%.
About the Speaker
Xinyi Zhao is a fourth-year PhD student at the University of Washington, specializing in multimodal large language models and reinforcement learning for smart home applications. This work was conducted during her summer 2024 internship at Wyze Labs, Inc.
Interactive Medical Image Analysis with Concept-based Similarity Reasoning
What if you could tell an AI model exactly "where to focus" and "where to ignore" on a medical image? Our work enables radiologists to interactively guide AI models at test time for more transparent and trustworthy decision-making. This paper introduces the novel Concept-based Similarity Reasoning network (CSR), which offers (i) patch-level prototypes with intrinsic concept interpretation, and (ii) spatial interactivity. First, the proposed CSR provides localized explanations by grounding prototypes of each concept on image regions. Second, our model introduces novel spatial-level interaction, allowing doctors to engage directly with specific image areas, making it an intuitive and transparent tool for medical imaging.
Paper: Interactive Medical Image Analysis with Concept-based Similarity Reasoning
About the Speaker
Huy Ta is a PhD student at the Australian Institute for Machine Learning, The University of Adelaide, specializing in Explainable and Interactive AI for medical imaging. He brings with him four years of industry experience in medical imaging AI prior to embarking on his doctoral studies.
Multi-view Anomaly Detection: From Static to Probabilistic Modelling
The advent of 3D Gaussian Splatting has revitalized interest in multi-view image data. Applying these techniques to fields such as anomaly detection has been a logical next step. However, some limitations of these models may warrant a return to established probabilistic techniques. This talk explores new approaches, difficulties, and possibilities in this field.
About the Speaker
Mathis Kruse is a PhD student in the group of Bodo Rosenhahn in Hanover, Germany, where he studies anomaly detection (especially in images). He has a particular interest in multi-view data and its learning-based representations.

