July 10 - Best of CVPR

Name: July 10 - Best of CVPR
Start: 2025-07-10T12:00:00-04:00
End: 2025-07-10T14:00:00-04:00

Hosted By NYC Computer Vision in Production

Details

Join us for a series of virtual events focused on the most interesting and groundbreaking research presented at this year's CVPR conference!

When

July 10, 2025 at 9 AM Pacific

Where

Online. Register for the Zoom!

OFER: Occluded Face Expression Reconstruction

Reconstructing 3D face models from a single image is an inherently ill-posed problem, which becomes even more challenging in the presence of occlusions where multiple reconstructions can be equally valid. Despite the ubiquity of the problem, very few methods address its multi-hypothesis nature.

In this paper, we introduce OFER, a novel approach for single-image 3D face reconstruction that generates plausible, diverse, and expressive 3D faces by training two diffusion models to generate the shape and expression coefficients of a parametric face model, conditioned on the input image. To maintain consistency across diverse expressions, the challenge is to select the best-matching shape. To achieve this, we propose a novel ranking mechanism that sorts the outputs of the shape diffusion network based on predicted shape accuracy scores.
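As a rough illustration of the ranking step described above, the sketch below sorts a set of candidate shape-coefficient vectors by a predicted accuracy score and keeps the top one. It is not the authors' implementation: `rank_shape_candidates`, `toy_score`, the sample vectors, and the convention that a higher score means a better shape are all assumptions made for this example. In OFER the scores come from a learned ranking network, which is replaced here by a placeholder function.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical type: one candidate is a vector of shape coefficients.
Shape = Sequence[float]


def rank_shape_candidates(
    candidates: List[Shape],
    score_fn: Callable[[Shape], float],
) -> List[Tuple[float, Shape]]:
    """Score every candidate and return (score, candidate) pairs, best first.

    Assumption: a higher score means a more accurate predicted shape.
    """
    scored = [(score_fn(candidate), candidate) for candidate in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored


def toy_score(shape: Shape) -> float:
    """Placeholder for a learned scoring network (illustration only).

    It simply prefers coefficient vectors with a small squared norm.
    """
    return -sum(x * x for x in shape)


# Pretend these are samples drawn from a shape diffusion model.
samples = [[0.9, 0.1], [0.2, 0.3], [0.5, 0.5]]
ranked = rank_shape_candidates(samples, toy_score)
best_score, best_shape = ranked[0]
print(best_shape)
```

Under these assumptions, the selected shape would then be held fixed while different expression samples are applied to it.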

Paper: OFER: Occluded Face Expression Reconstruction

About the Speaker

Pratheba Selvaraju holds a PhD from the University of Massachusetts, Amherst, and is currently a researcher in the Perceiving Systems department at the Max Planck Institute. Research interests include 3D reconstruction and modeling, geometry processing, and generative modeling. Selvaraju is also exploring virtual try-on, combining vision and 3D techniques.

SmartHome-Bench: Benchmark for Video Anomaly Detection in Smart Homes Using Multi-Modal LLMs

Video anomaly detection is crucial for ensuring safety and security, yet existing benchmarks overlook the unique context of smart home environments. We introduce SmartHome-Bench, a dataset of 1,203 smart home videos annotated according to a novel taxonomy of seven anomaly categories, such as Wildlife, Senior Care, and Baby Monitoring. We evaluate state-of-the-art closed- and open-source multimodal LLMs with various prompting techniques, revealing significant performance gaps. To address these limitations, we propose the Taxonomy-Driven Reflective LLM Chain (TRLC), which boosts detection accuracy by 11.62%.

About the Speaker

Xinyi Zhao is a fourth-year PhD student at the University of Washington, specializing in multimodal large language models and reinforcement learning for smart home applications. This work was conducted during her summer 2024 internship at Wyze Labs, Inc.

Interactive Medical Image Analysis with Concept-based Similarity Reasoning

What if you could tell an AI model exactly "where to focus" and "where to ignore" on a medical image? Our work enables radiologists to interactively guide AI models at test time for more transparent and trustworthy decision-making. This paper introduces the novel Concept-based Similarity Reasoning network (CSR), which offers (i) patch-level prototypes with intrinsic concept interpretation, and (ii) spatial interactivity. First, the proposed CSR provides localized explanations by grounding the prototypes of each concept on image regions. Second, our model introduces novel spatial-level interaction, allowing doctors to engage directly with specific image areas, making it an intuitive and transparent tool for medical imaging.
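To make the idea of patch-level prototypes with spatial interaction more concrete, here is a minimal sketch, not the CSR implementation. It assumes that each image patch and each concept prototype is a plain feature vector, that similarity is cosine similarity, and that a concept's score is the best similarity over the patches the user has not excluded. The names `cosine`, `concept_score`, and `keep_mask` are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def concept_score(
    patches: List[Sequence[float]],
    prototype: Sequence[float],
    keep_mask: Optional[List[bool]] = None,
) -> Tuple[float, int]:
    """Return (best similarity, index of best patch) for one concept prototype.

    keep_mask[i] == False means the user asked the model to ignore patch i.
    If every patch is ignored, the result is (-inf, -1).
    """
    best_sim, best_idx = float("-inf"), -1
    for i, patch in enumerate(patches):
        if keep_mask is not None and not keep_mask[i]:
            continue  # the user excluded this region
        sim = cosine(patch, prototype)
        if sim > best_sim:
            best_sim, best_idx = sim, i
    return best_sim, best_idx


# Three toy patch features and one toy concept prototype.
patches = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
prototype = [1.0, 0.0]

# Without guidance, the best-matching patch is patch 0.
free_sim, free_idx = concept_score(patches, prototype)

# The user marks patch 0 as "ignore"; the evidence moves to another patch.
guided_sim, guided_idx = concept_score(patches, prototype, [False, True, True])
print(free_idx, guided_idx)
```

The returned patch index is what makes the explanation localized in this sketch: it points to the region that supports the concept, and masking a region changes which patch the score is grounded on.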

Paper: Interactive Medical Image Analysis with Concept-based Similarity Reasoning

About the Speaker

Huy Ta is a PhD student at the Australian Institute for Machine Learning, The University of Adelaide, specializing in explainable and interactive AI for medical imaging. Before starting his doctoral studies, he spent four years in industry working on medical imaging AI.

Multi-view Anomaly Detection: From Static to Probabilistic Modelling

The advent of 3D Gaussian Splatting has revitalized interest in multi-view image data, and applying these techniques to fields such as anomaly detection has been a logical next step. However, some limitations of these models may warrant a return to previously used probabilistic techniques. This talk explores new approaches, difficulties, and possibilities in this field.

About the Speaker

Mathis Kruse is a PhD student in the group of Bodo Rosenhahn in Hanover, Germany, where he studies anomaly detection (especially in images). He has a particular interest in multi-view data and its learning-based representations.

FREE