May 1 - Best of WACV (Day 2)
Hosted by BayNode - The Bay Area Node.js Meetup
Details
Welcome to the Best of WACV series, your virtual pass to some of the groundbreaking research, insights, and innovations that defined this year’s conference, streamed live by the authors.
Date, Time and Location
May 01, 2026
9 AM - 11 AM Pacific
Online. Register for the Zoom!
Perceptually Guided 3DGS Streaming and Rendering for Mixed Reality
Recent advances in 3D Gaussian Splatting (3DGS) enable high-quality rendering but fall short of mixed reality's demanding requirements for high refresh rates, stereo viewing, and limited compute budgets. We propose a perception-guided, continuous level-of-detail framework that exploits human visual system limitations through a lightweight, gaze-contingent model to predict and adaptively modulate rendering quality across the visual field, maximizing perceived quality under compute constraints.
Combined with an edge-cloud collaborative rendering framework for untethered MR devices, our method achieves superior computational efficiency with minimal perceptual quality loss compared to vanilla and foveated baselines, validated through objective metrics and user studies.
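To give a rough feel for what "gaze-contingent, continuous level-of-detail" can mean, here is a minimal sketch. It is not the speakers' perceptual model: the function names, the 5-degree foveal radius, the 30-degree falloff, and the linear ramp are all illustrative assumptions.

```python
import math


def eccentricity_deg(gaze, direction):
    """Angle in degrees between the gaze direction and a direction to a scene point."""
    dot = sum(g * d for g, d in zip(gaze, direction))
    norm_gaze = math.sqrt(sum(g * g for g in gaze))
    norm_dir = math.sqrt(sum(d * d for d in direction))
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    cosine = max(-1.0, min(1.0, dot / (norm_gaze * norm_dir)))
    return math.degrees(math.acos(cosine))


def detail_level(ecc_deg, fovea_deg=5.0, falloff_deg=30.0, min_level=0.1):
    """Continuous detail scale in [min_level, 1.0].

    Assumption: full detail inside the fovea, then a linear ramp down to
    min_level over falloff_deg. A real perceptual model would be learned
    or fitted to user data rather than hand-set like this.
    """
    if ecc_deg <= fovea_deg:
        return 1.0
    t = min(1.0, (ecc_deg - fovea_deg) / falloff_deg)
    return 1.0 - t * (1.0 - min_level)


if __name__ == "__main__":
    gaze = (0.0, 0.0, 1.0)  # looking straight ahead
    for point in [(0.0, 0.0, 1.0), (0.3, 0.0, 1.0), (1.0, 0.0, 1.0)]:
        ecc = eccentricity_deg(gaze, point)
        print(f"eccentricity={ecc:5.1f} deg  detail={detail_level(ecc):.2f}")
```

A renderer could use such a scale to decide how many Gaussians to keep in each screen region; how the talk's method actually does this is not specified in the abstract.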
About the Speaker
Sai Harsha Mupparaju is an MS Computer Science student at NYU working in the Immersive Computing Lab with Prof. Qi Sun, where he focuses on 3D Gaussian Splatting, neural rendering, and perceptual VR/MR systems. He previously earned his undergraduate degree from BITS Pilani and conducted research at the Indian Institute of Science (IISc). His research has been published at IEEE WACV 2026, ACM Transactions on Graphics, and ACM SIGGRAPH 2024.
SAVIOR: Sample-efficient Adaptation of Vision-Language Models for OCR Representation
OCR pipelines and vision-language models systematically underperform on document patterns critical to financial workflows, such as vertical text, logo-embedded vendor names, degraded scans, and complex multi-column layouts. While underrepresented in public datasets, these patterns constitute a substantial portion of real-world failure cases.
We introduce SAVIOR, a sample-efficient data curation methodology that targets such high-impact failure scenarios to adapt vision-language models for robust financial OCR, and PaIRS, a structure-aware evaluation metric that measures layout fidelity by comparing pairwise spatial relationships between tokens. When fine-tuned with SAVIOR-Train, Qwen2.5-VL-Instruct demonstrates robust financial OCR performance, outperforming both open and closed-source baselines including GPT-4o, Mistral-OCR, PaddleOCR-VL, and DeepSeek-OCR.
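The idea of scoring layout by pairwise spatial relationships can be illustrated with a small sketch. This is not the PaIRS definition, which the abstract does not spell out: the token representation (an id mapped to an x, y position), the coarse left/right and above/below relation, and the plain agreement ratio are all assumptions made for illustration.

```python
from itertools import combinations


def relation(a, b):
    """Coarse relation of b relative to a as (horizontal sign, vertical sign)."""
    def sign(value):
        return (value > 0) - (value < 0)
    return sign(b[0] - a[0]), sign(b[1] - a[1])


def pairwise_agreement(reference, predicted):
    """Fraction of token pairs whose coarse spatial relation matches.

    Both arguments map a token id to an (x, y) position. Only tokens
    present in both layouts are compared (an assumption of this sketch).
    """
    shared = sorted(set(reference) & set(predicted))
    pairs = list(combinations(shared, 2))
    if not pairs:
        return 0.0
    agree = sum(
        relation(reference[a], reference[b]) == relation(predicted[a], predicted[b])
        for a, b in pairs
    )
    return agree / len(pairs)


if __name__ == "__main__":
    reference = {"Invoice": (0, 0), "Total": (0, 10), "42.00": (50, 10)}
    # The amount has drifted up to the header row in the predicted layout.
    predicted = {"Invoice": (0, 0), "Total": (0, 10), "42.00": (50, 0)}
    print(pairwise_agreement(reference, reference))
    print(pairwise_agreement(reference, predicted))
```

A score of this kind is sensitive to reading-order and column mistakes that a pure text-similarity metric would miss, which is the motivation the abstract gives for a structure-aware metric.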
About the Speaker
Akshata Bhat is an AI/ML Research Engineer at Hyprbots Inc. Her research interests include multimodal learning, vision-language models, and large-scale document understanding systems.
SynthForm: Towards a DLA-free E2E Form understanding model
We present SynthForm-3k, the first large-scale publicly available dataset of synthetically perturbed forms, comprising 3,417 samples across six domains: taxation, immigration, finance, healthcare, dental, and insurance. Ground-truth Markdown is constructed via an intermediate HTML representation generated by GPT-5 under high-reasoning inference, followed by deterministic HTML-to-Markdown conversion and scan-like perturbations (dust, scan lines, blur, rotation) that simulate real-world faxed and scanned documents.
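As a rough illustration of scan-like perturbations, the sketch below applies two of the four listed effects (scan lines and dust) to a grayscale page stored as a list of pixel rows. It is not the dataset's pipeline: the function name, parameters, and default values are hypothetical, and blur and rotation are omitted because they are better handled by an imaging library.

```python
import random


def add_scan_artifacts(image, dust_fraction=0.01, line_every=8,
                       line_darkening=40, seed=0):
    """Return a perturbed copy of a grayscale image (rows of ints in 0..255)."""
    rng = random.Random(seed)  # seeded so the perturbation is reproducible
    out = [row[:] for row in image]
    height = len(out)
    width = len(out[0]) if out else 0

    # Scan lines: darken every line_every-th row by a fixed amount.
    for y in range(0, height, line_every):
        out[y] = [max(0, value - line_darkening) for value in out[y]]

    # Dust: turn a small fraction of randomly chosen pixels black.
    for _ in range(int(dust_fraction * height * width)):
        out[rng.randrange(height)][rng.randrange(width)] = 0

    return out


if __name__ == "__main__":
    page = [[255] * 16 for _ in range(16)]  # blank white page
    noisy = add_scan_artifacts(page)
    for row in noisy:
        print("".join("#" if v == 0 else "-" if v < 255 else "." for v in row))
```

Seeding the generator keeps each perturbed sample reproducible, so a clean page and its ground truth can be regenerated with the same degradation.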
We further introduce SynthForm-VL, a family of 2B, 4B, and 8B models obtained via full-parameter supervised fine-tuning of Qwen3-VL on this dataset. All three variants outperform their respective baselines, yielding ANLS improvements of +5.8, +9.3, and +10.3, with the fine-tuned 2B model exceeding the performance of the 4× larger Qwen3-VL-8B baseline. This suggests that targeted domain adaptation on perturbation-robust data offers a more favorable cost–performance tradeoff than scale alone for structured form understanding.
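For readers unfamiliar with ANLS (Average Normalized Levenshtein Similarity), here is a minimal sketch of the metric as it is commonly defined in document understanding benchmarks: each prediction scores 1 minus its normalized edit distance to the closest reference, and scores at or past a distance threshold (usually 0.5) are set to zero. The lowercasing and whitespace stripping here are assumptions, and the abstract does not say which normalization, threshold, or output granularity the authors used.

```python
def levenshtein(a, b):
    """Edit distance between two strings (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, 1):
        current = [i]
        for j, char_b in enumerate(b, 1):
            current.append(min(
                previous[j] + 1,                      # deletion
                current[j - 1] + 1,                   # insertion
                previous[j - 1] + (char_a != char_b)  # substitution or match
            ))
        previous = current
    return previous[-1]


def anls(predictions, references, threshold=0.5):
    """Average over samples of the best thresholded similarity to any reference.

    predictions: list of strings; references: list of lists of strings.
    """
    scores = []
    for prediction, candidates in zip(predictions, references):
        best = 0.0
        for reference in candidates:
            p, r = prediction.strip().lower(), reference.strip().lower()
            longest = max(len(p), len(r))
            distance = levenshtein(p, r) / longest if longest else 0.0
            similarity = 1.0 - distance if distance < threshold else 0.0
            best = max(best, similarity)
        scores.append(best)
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    predictions = ["Form 1040", "abcd", "xyz"]
    references = [["form 1040"], ["abcf"], ["abc"]]
    print(anls(predictions, references))
```

The threshold means a badly wrong output earns no partial credit, while small character-level OCR slips are only lightly penalized.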
About the Speaker
Andre Fu is an ML researcher and founder whose work spans multimodal learning, GPU inference infrastructure, and document understanding, with prior publications at NeurIPS, ICCV, CVPR, and WACV. He has worked in document processing and ML for the last 4 years, specializing in InsurTech, FinTech & HealthTech use cases.

