Details

This hands-on workshop provides a comprehensive introduction to building and evaluating visual agents for GUI automation using modern tools and techniques.

Date, Time and Location

April 9, 2026 at 9 AM Pacific
Online. Register for the Zoom

Visual agents that can understand and interact with graphical user interfaces represent a transformative frontier in AI automation. These systems combine computer vision, natural language understanding, and spatial reasoning to enable machines to navigate complex interfaces—from web applications to desktop software—just as humans do. However, building robust GUI agents requires careful attention to dataset curation, model evaluation, and iterative improvement workflows.

Participants will learn how to leverage FiftyOne, an open-source toolkit for dataset curation and computer vision workflows, to build production-ready GUI agent systems.

What You'll Learn:

  • Dataset Creation & Management: How to structure, annotate, and load GUI interaction datasets using the COCO4GUI standardized format
  • Data Exploration & Analysis: Using FiftyOne's interactive interface to visualize datasets, analyze action distributions, and understand annotation patterns
  • Multimodal Embeddings: Computing embeddings for screenshots and UI element patches to enable similarity search and retrieval
  • Model Inference: Running state-of-the-art models like Microsoft's GUI-Actor to predict interaction points from natural language instructions
  • Performance Evaluation: Measuring model accuracy using standard metrics and normalized click distance to assess localization precision
  • Failure Analysis: Investigating model failures through attention maps, error pattern analysis, and systematic debugging workflows
  • Data-Driven Improvement: Tagging samples based on error types (attention misalignment vs. localization errors) to prioritize fine-tuning efforts
  • Synthetic Data Generation: Using FiftyOne plugins to augment training data with synthetic task descriptions and variations
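As a rough illustration of the localization metric mentioned above, the sketch below shows one plausible way to compute a normalized click distance. The listing does not define the metric, so the definition here is an assumption: the Euclidean distance between the predicted and ground-truth click points after dividing each axis by the screenshot's width and height. The function names and the `(x, y, width, height)` box layout are hypothetical choices for illustration, not part of FiftyOne or GUI-Actor.

```python
import math


def normalized_click_distance(pred_xy, target_xy, image_size):
    """Distance between two pixel points after scaling each axis by image size.

    Assumed definition: x offsets are divided by the image width and y offsets
    by the image height, so the result is resolution independent.
    """
    width, height = image_size
    if width <= 0 or height <= 0:
        raise ValueError("image_size must contain positive width and height")
    dx = (pred_xy[0] - target_xy[0]) / width
    dy = (pred_xy[1] - target_xy[1]) / height
    return math.hypot(dx, dy)


def click_in_box(pred_xy, box_xywh):
    """Return True if the predicted click lands inside an (x, y, w, h) pixel box."""
    x, y, w, h = box_xywh
    return x <= pred_xy[0] <= x + w and y <= pred_xy[1] <= y + h


# Hypothetical example: a 1920x1080 screenshot with a target button.
prediction = (980, 560)
target_center = (960, 540)
distance = normalized_click_distance(prediction, target_center, (1920, 1080))
hit = click_in_box(prediction, (900, 500, 120, 80))
```

Normalizing by image size keeps scores comparable across screenshots of different resolutions; a hit test against the target element's bounding box is a common companion check, since a small distance does not guarantee the click lands on the element.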

About the Speaker

Harpreet Sahota is a hacker-in-residence and machine learning engineer with a passion for deep learning and generative AI. He has a deep interest in RAG, agents, and multimodal AI.

Related topics

Artificial Intelligence
Computer Vision
Machine Learning
Data Science
Open Source

Sponsors

Versatile

Hosting April 2021 event

Cloudinary

Sponsoring Sep 2018 meetup

Healthy.io

Sponsoring Aug 2018 meetup

LEO pharma

Sponsoring our July 2018 meetup
