New benchmarks in vision-language models for real-world use. Google Research

Hosted By
Sophia A.
Details

In this talk, Yonatan Bitton, a Research Scientist at Google Research, will present VisIT-Bench and WHOOPS!, two benchmarks designed to elevate the field of vision-language models. VisIT-Bench focuses on real-world applications and includes 592 test queries spanning 70 different "instruction families." It goes beyond traditional benchmarks like VQAv2 and COCO by covering tasks ranging from simple object recognition to game playing and creative generation.

Notably, VisIT-Bench features a reference-free auto-evaluation method that aligns closely with human judgments; under this evaluation, even the current top-performing models surpass a GPT-4 reference in only 27% of cases.
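
The abstract doesn't spell out the evaluation protocol, but the core idea behind a win rate like the 27% figure, an automatic judge comparing each candidate response against a GPT-4 reference response, can be sketched as below. This is a hypothetical illustration: the `win_rate` helper and the toy judge are assumptions for exposition, not the benchmark's actual code or judge.

```python
# Minimal sketch of pairwise win-rate computation, assuming a judge
# callable that decides whether the candidate beats the reference.
# The judge here is a toy stand-in, NOT VisIT-Bench's GPT-4-based judge.
from typing import Callable, Iterable, Tuple


def win_rate(
    pairs: Iterable[Tuple[str, str, str]],
    judge: Callable[[str, str, str], bool],
) -> float:
    """Fraction of instructions where the judge prefers the
    candidate model's response over the reference response."""
    wins, total = 0, 0
    for instruction, candidate, reference in pairs:
        total += 1
        if judge(instruction, candidate, reference):
            wins += 1
    return wins / total if total else 0.0


# Toy example: a judge that naively prefers the longer answer.
pairs = [
    ("Describe the image.", "A cat on a mat.", "A tabby cat sits on a woven mat."),
    ("Is this scene unusual?", "Yes, the cat wears a hat, which is odd.", "Yes."),
]
toy_judge = lambda ins, cand, ref: len(cand) > len(ref)
print(f"win rate: {win_rate(pairs, toy_judge):.0%}")  # -> win rate: 50%
```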

Meanwhile, WHOOPS! aims to test visual commonsense through deliberately unusual images. It introduces specialized tasks and evaluates both zero-shot and end-to-end models against these challenges. Both benchmarks are designed to be dynamic and open for participation, encouraging ongoing development and evaluation in the field of vision-language models.

BuzzRobot
Online event