Aug 29 - Visual Agents Workshop Part 3: Teaching Machines to See and Click

Network event
113 attendees from 44 hosting groups
Details

Welcome to the three-part Visual Agents Workshop virtual series: your hands-on opportunity to learn about visual agents, including how they work, how to develop them, and how to fine-tune them.

Date and Time

Aug 29, 2025 at 9 AM Pacific

Register for the Zoom

Part 3: Teaching Machines to See and Click - Model Finetuning

From Foundation Models to GUI Specialists

Foundation models, such as Qwen2.5-VL, demonstrate impressive visual understanding, but they require specialized training to master GUI interactions. In this final session, you'll transform a general-purpose vision-language model into a GUI specialist that can navigate interfaces with human-like precision.

We'll explore modern fine-tuning strategies specifically designed for GUI tasks, from selecting the right architecture to handling the unique challenges of coordinate prediction and multi-step reasoning. You'll implement training pipelines that can handle the diverse formats and platforms in your dataset, evaluate models on metrics that actually matter for GUI automation, and deploy your trained model in a real-world testing environment.

About the Instructor

Harpreet Sahota is a hacker-in-residence and machine learning engineer with a passion for deep learning and generative AI. He’s got a deep interest in RAG, Agents, and Multimodal AI.

Paris AI, Machine Learning and Computer Vision Meetup
FREE