Aug 29 - Visual Agents Workshop Part 3: Teaching Machines to See and Click

Name: Aug 29 - Visual Agents Workshop Part 3: Teaching Machines to See and Click
Start: 2025-08-29T18:00:00+02:00
End: 2025-08-29T19:00:00+02:00

Network event

356 attendees from 44 hosting groups

Hosted by Paris AI, Machine Learning and Computer Vision Meetup

Details

Welcome to the three-part Visual Agents Workshop virtual series: your hands-on opportunity to learn about visual agents, including how they work, how to develop them, and how to fine-tune them.

Date and Time

Aug 29, 2025 at 9 AM Pacific

Register for the Zoom

Part 3: Teaching Machines to See and Click - Model Finetuning

From Foundation Models to GUI Specialists

Foundation models, such as Qwen2.5-VL, demonstrate impressive visual understanding, but they require specialized training to master GUI interactions. In this final session, you'll transform a general-purpose vision-language model into a GUI specialist that can navigate interfaces with human-like precision.

We'll explore modern fine-tuning strategies specifically designed for GUI tasks, from selecting the right architecture to handling the unique challenges of coordinate prediction and multi-step reasoning. You'll implement training pipelines that can handle the diverse formats and platforms in your dataset, evaluate models on metrics that actually matter for GUI automation, and deploy your trained model in a real-world testing environment.
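To make "coordinate prediction" and "metrics that actually matter" a little more concrete, here is a minimal sketch of one plausible GUI metric: the fraction of predicted clicks that land inside the target element's bounding box. This is an illustration under assumptions, not the workshop's actual evaluation code. The names `Box`, `to_pixels`, and `click_accuracy` are hypothetical. The sketch also assumes the model emits normalized coordinates in [0, 1], which is only one of several possible output formats.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Hypothetical target element bounds, in pixels."""
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, x: float, y: float) -> bool:
        # Inclusive on all edges, so a click on the border counts as a hit.
        return self.left <= x <= self.right and self.top <= y <= self.bottom


def to_pixels(norm_x: float, norm_y: float, width: int, height: int) -> tuple[float, float]:
    """Convert a point in normalized [0, 1] coordinates to pixel coordinates.

    Assumes the model predicts resolution-independent coordinates; datasets
    that store absolute pixels would skip this step.
    """
    return norm_x * width, norm_y * height


def click_accuracy(predictions: list[tuple[float, float]], targets: list[Box]) -> float:
    """Fraction of predicted pixel clicks that fall inside their target box."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    if not predictions:
        return 0.0
    hits = sum(box.contains(x, y) for (x, y), box in zip(predictions, targets))
    return hits / len(predictions)


if __name__ == "__main__":
    # Two predictions on a 1920x1080 screenshot; only the first hits its target.
    preds = [to_pixels(0.5, 0.25, 1920, 1080), to_pixels(0.9, 0.9, 1920, 1080)]
    boxes = [Box(900, 250, 1000, 300), Box(0, 0, 100, 100)]
    print(click_accuracy(preds, boxes))
```

A hit-or-miss metric like this rewards what matters for automation, namely whether the click would have activated the right element, rather than raw pixel distance to the element's center.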

About the Instructor

Harpreet Sahota is a hacker-in-residence and machine learning engineer with a passion for deep learning and generative AI. He is especially interested in RAG, agents, and multimodal AI.

Related topics

Artificial Intelligence

Computer Vision

Machine Learning

Data Science

Open Source
