July 2023 Computer Vision Meetup (Virtual - APAC)
81 attendees from the 12 host groups
Details
Zoom Link
https://voxel51.com/computer-vision-events/july-20-meetup-apac/
MARLIN: Masked Autoencoder for Facial Video Representation LearnINg
This talk proposes a self-supervised approach to learning universal facial representations from videos that transfer across a variety of facial analysis tasks, such as Facial Attribute Recognition (FAR), Facial Expression Recognition (FER), DeepFake Detection (DFD), and Lip Synchronization (LS). Our proposed framework, named MARLIN, is a facial video masked autoencoder that learns highly robust and generic facial embeddings from abundantly available, non-annotated, web-crawled facial videos. As a challenging auxiliary task, MARLIN reconstructs the spatio-temporal details of the face from densely masked facial regions, mainly the eyes, nose, mouth, lips, and skin. Reconstructing these regions captures both local and global aspects of the face, which in turn helps the model encode generic and transferable features. Through experiments on diverse downstream tasks, we demonstrate that MARLIN is an excellent facial video encoder and feature extractor that performs consistently well across tasks including FAR (1.13% gain over the supervised benchmark), FER (2.64% gain over the unsupervised benchmark), DFD (1.86% gain over the unsupervised benchmark), and LS (29.36% gain in Fréchet Inception Distance), and even in the low-data regime.
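To make the idea of "densely masked facial regions" concrete, here is a minimal sketch of region-prioritized tube masking for a video masked autoencoder. It is an illustration under stated assumptions, not MARLIN's actual implementation: the per-patch `region_flags` (True where a patch covers the eyes, nose, mouth, lips, or skin) are a hypothetical input, for example from a face parser, and the function names and the 0.9 default mask ratio are likewise hypothetical. The same set of patch positions is masked in every frame (a "tube"), so the encoder cannot simply copy a hidden patch from a neighbouring frame.

```python
import random


def facial_tube_mask(region_flags, mask_ratio=0.9, seed=0):
    """Choose patch positions to mask, shared across all frames (a tube mask).

    region_flags: list[bool], one entry per spatial patch; True marks a patch
    covering a facial region. Hypothetical input (e.g. from a face parser).
    """
    rng = random.Random(seed)
    num_masked = round(mask_ratio * len(region_flags))
    facial = [i for i, flag in enumerate(region_flags) if flag]
    other = [i for i, flag in enumerate(region_flags) if not flag]
    rng.shuffle(facial)
    rng.shuffle(other)
    # Spend the masking budget on facial-region patches first, then the rest.
    return set((facial + other)[:num_masked])


def apply_tube_mask(video_patches, masked):
    """Split each frame into visible patches (encoder input) and masked
    patches (reconstruction targets), using the same positions in every frame."""
    visible, targets = [], []
    for frame in video_patches:
        visible.append([p for i, p in enumerate(frame) if i not in masked])
        targets.append([p for i, p in enumerate(frame) if i in masked])
    return visible, targets
```

For example, with 10 patches of which the first 4 are facial and `mask_ratio=0.5`, all 4 facial patches plus one randomly chosen background patch are masked, and the decoder would be trained to reconstruct those 5 positions in every frame.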
Zhixi Cai is a Ph.D. student in the Department of Data Science and Artificial Intelligence, Faculty of Information Technology, Monash University, supervised by Dr. Munawar Hayat, Dr. Kalin Stefanov, and Dr. Abhinav Dhall.
Unleashing the Potential of Visual Data: Vector Databases in Computer Vision
Discover the game-changing role of vector databases in computer vision applications. These specialized databases excel at handling unstructured visual data, thanks to their robust support for embeddings and lightning-fast similarity search. Join us as we explore advanced indexing algorithms and showcase real-world examples in healthcare, retail, finance, and more using the FiftyOne engine combined with the Milvus vector database. See how vector databases unlock the full potential of your visual data.
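The core operation a vector database accelerates is nearest-neighbour search over embeddings. The sketch below is the exact, brute-force baseline: it scores every stored embedding against a query by cosine similarity and keeps the top k. It does not use the FiftyOne or Milvus APIs; the function names and the toy embeddings are hypothetical. Vector databases such as Milvus replace this O(N·d) scan with approximate indexes (for example IVF or HNSW) so that queries stay fast at millions of vectors.

```python
import heapq
import math


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def top_k(query, collection, k=3):
    """Exact nearest neighbours by cosine similarity.

    collection: dict mapping an item id (e.g. an image filename) to its
    embedding. Returns a list of (score, item_id), highest score first.
    Cost is O(N * d) per query, which is what ANN indexes avoid.
    """
    scored = ((cosine(query, vec), item_id) for item_id, vec in collection.items())
    return heapq.nlargest(k, scored)
```

Usage: with `collection = {"cat": [1.0, 0.0], "dog": [0.9, 0.1], "car": [0.0, 1.0]}`, the call `top_k([1.0, 0.0], collection, k=2)` ranks `"cat"` first and `"dog"` second. Zero vectors are not handled and would raise `ZeroDivisionError`.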
Filip Haltmayer is a Software Engineer at Zilliz working in both software and community development.
DreamSim: Learning New Dimensions of Human Visual Similarity using Synthetic Data
Current perceptual similarity metrics compare images in terms of their low-level colors and textures, but fail to capture mid-level similarities in image layout, object pose, and semantic content. To address this gap, we introduce NIGHTS, a synthetic image dataset labeled with human similarity judgments, and DreamSim, a metric tuned to better align with human perception. We analyze how our metric is affected by different visual attributes, and show that it outperforms prior learned metrics and recent large vision models on retrieval and reconstruction tasks.
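One common way to check whether a learned metric "aligns with human perception" is two-alternative forced choice (2AFC): humans see a reference image and two variants and pick the variant that looks more similar, and the metric agrees if it rates that same variant as closer. The sketch below computes that agreement rate using cosine distance between embeddings. It assumes the images have already been embedded by some backbone and that judgments arrive as `(ref, img_a, img_b, human_choice)` triplets; these names and the triplet format are hypothetical, and the abstract does not spell out DreamSim's exact evaluation protocol.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity; smaller means more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def two_afc_agreement(triplets):
    """Fraction of triplets where the metric picks the same variant as humans.

    Each triplet is (ref, img_a, img_b, human_choice): three embedding vectors
    and "a" or "b", the variant humans judged more similar to ref.
    """
    agree = 0
    for ref, img_a, img_b, human_choice in triplets:
        closer_a = cosine_distance(ref, img_a) < cosine_distance(ref, img_b)
        metric_choice = "a" if closer_a else "b"
        agree += metric_choice == human_choice
    return agree / len(triplets)
```

A metric that captures mid-level similarity (layout, pose, semantics) should score higher on this agreement rate than one that only compares low-level colors and textures.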
Stephanie Fu recently graduated from MIT with bachelor's degrees in computer science and music and an M.Eng. in computer science. Her research interests include computer vision, representation learning, and the connections between human and machine perception.
Shobhita Sundaram is a PhD student in computer science at MIT. She is interested in computer vision, particularly generative models and representation learning. Previously, she earned her bachelor's degree in computer science and mathematics from MIT, where she researched biologically inspired models for computer vision.
Netanel Tamir is an MSc student in computer science at the Weizmann Institute of Science. He is interested in computer vision, representation learning, and psychophysics.
