Aug 15 - Visual Agent Workshop Part 1: Navigating the GUI Agent Landscape

Détails
Welcome to the three part Visual Agents Workshop virtual series...your hands on opportunity to learn about visual agents - how they work, how to develop them and how to fine-tune them.
Date and Time
Aug 15, 2025 at 9 AM Pacific
Part 1: Navigating the GUI Agent Landscape
Understanding the Foundation Before Building
The GUI agent field is evolving rapidly, but success requires an understanding of what came before. In this opening session, we'll map the terrain of GUI agent research—from the early days of MiniWoB's simplified environments to today's complex, multimodal systems tackling real-world applications. You'll discover why standard vision models fail catastrophically on GUI tasks, explore the annotation bottlenecks that make GUI datasets so expensive to create, and understand the platform fragmentation that makes "click a button" mean twenty different things across datasets.
We'll dissect the most influential datasets (Mind2Web, AITW, Rico) and models that have shaped the field, examining their strengths, limitations, and the research gaps they reveal. By the end, you'll have a clear picture of where GUI agents excel, where they struggle, and, most importantly, where the opportunities lie for your own contributions.
About the Instructor
Harpreet Sahota is a hacker-in-residence and machine learning engineer with a passion for deep learning and generative AI. He’s got a deep interest in RAG, Agents, and Multimodal AI.

Aug 15 - Visual Agent Workshop Part 1: Navigating the GUI Agent Landscape