Aug 29 - Visual Agents Workshop Part 3: Teaching Machines to See and Click

Details
Welcome to the three part Visual Agents Workshop virtual series...your hands on opportunity to learn about visual agents - how they work, how to develop them and how to fine-tune them.
Date and Time
Aug 29, 2025 at 9 AM Pacific
Part 3: Teaching Machines to See and Click - Model Finetuning
From Foundation Models to GUI Specialists
Foundation models, such as Qwen2.5-VL, demonstrate impressive visual understanding, but they require specialized training to master GUI interactions. In this final session, you'll transform a general-purpose vision-language model into a GUI specialist that can navigate interfaces with human-like precision.
We'll explore modern fine-tuning strategies specifically designed for GUI tasks, from selecting the right architecture to handling the unique challenges of coordinate prediction and multi-step reasoning. You'll implement training pipelines that can handle the diverse formats and platforms in your dataset, evaluate models on metrics that actually matter for GUI automation, and deploy your trained model in a real-world testing environment.
About the Instructor
Harpreet Sahota is a hacker-in-residence and machine learning engineer with a passion for deep learning and generative AI. He’s got a deep interest in RAG, Agents, and Multimodal AI.

Aug 29 - Visual Agents Workshop Part 3: Teaching Machines to See and Click