[AI Alliance] How to Train Your LLM Web Agent: A Statistical Diagnosis
Details
LLM-based web agents have recently made significant progress, but much of it has occurred in closed-source systems, widening the gap with open-source alternatives. Open-source progress has been held back by two key challenges: first, a narrow focus on single-step tasks that overlooks the complexity of multi-step web interactions; and second, the high compute cost of post-training LLM-based web agents.
To address this, we present the first statistically grounded study on compute allocation for LLM web-agent post-training. Our approach uses a two-stage pipeline: a Llama 3.1 8B or Qwen 2.5 7B student is first trained to imitate a Llama 3.3 70B or Qwen 2.5 72B teacher via supervised fine-tuning (SFT), then further trained with on-policy reinforcement learning (GRPO).
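For readers unfamiliar with GRPO, the toy sketch below illustrates the core idea behind the on-policy stage: sample a group of rollouts per prompt, normalize rewards within each group to obtain advantages, and update the policy with a clipped objective. This is a minimal, toy-scale illustration, not the paper's implementation; the policy, reward function, group size, and all hyperparameter values are placeholders.

```python
# Toy illustration of a GRPO-style update (group-relative advantages plus a
# clipped policy-gradient loss). Placeholder policy and rewards; not the
# paper's training code.
import torch
import torch.nn as nn

torch.manual_seed(0)

N_PROMPTS, GROUP_SIZE, N_ACTIONS, HIDDEN = 4, 8, 16, 32  # assumed toy sizes
CLIP_EPS = 0.2

policy = nn.Sequential(nn.Linear(HIDDEN, 64), nn.Tanh(), nn.Linear(64, N_ACTIONS))
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)

prompts = torch.randn(N_PROMPTS, HIDDEN)        # stand-ins for prompt encodings

# --- Rollout phase: sample a group of responses per prompt (here: single actions) ---
with torch.no_grad():
    logits = policy(prompts)                    # (N_PROMPTS, N_ACTIONS)
    dist = torch.distributions.Categorical(logits=logits)
    actions = dist.sample((GROUP_SIZE,)).T      # (N_PROMPTS, GROUP_SIZE)
    old_logp = dist.log_prob(actions.T).T       # log-probs under the sampling policy

# Placeholder reward: pretend higher action indices solve the task better.
rewards = actions.float() / (N_ACTIONS - 1)     # (N_PROMPTS, GROUP_SIZE)

# --- Group-relative advantages: normalize rewards within each prompt's group ---
adv = (rewards - rewards.mean(dim=1, keepdim=True)) / (rewards.std(dim=1, keepdim=True) + 1e-8)

# --- Policy update with a clipped importance-ratio objective ---
new_dist = torch.distributions.Categorical(logits=policy(prompts))
new_logp = new_dist.log_prob(actions.T).T
ratio = (new_logp - old_logp).exp()
unclipped = ratio * adv
clipped = torch.clamp(ratio, 1 - CLIP_EPS, 1 + CLIP_EPS) * adv
loss = -torch.min(unclipped, clipped).mean()

optimizer.zero_grad()
loss.backward()
optimizer.step()
print(f"GRPO-style loss: {loss.item():.4f}")
```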
We find this process highly sensitive to hyperparameter choices, making exhaustive sweeps impractical. To spare others from expensive trial-and-error, we sample 1,370 configurations and use bootstrapping to estimate effective hyperparameters. Our results show that combining SFT with on-policy RL consistently outperforms either approach alone on both WorkArena and MiniWob++. Further, this strategy requires only 55% of the compute to match the peak performance of pure SFT on MiniWob++, effectively pushing the compute-performance Pareto frontier, and is the only strategy that can close the gap with closed-source models.
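The bootstrapping step can be pictured as follows: given scores from the sampled configurations, repeatedly resample them with replacement and ask which hyperparameter choices perform well on average, attaching a confidence interval to each estimate. The sketch below is purely illustrative, with synthetic scores and a made-up hyperparameter; it is not the paper's analysis code.

```python
# Illustrative bootstrap over sampled training configurations (synthetic data;
# the hyperparameter and scores are assumptions, not taken from the paper).
import numpy as np

rng = np.random.default_rng(0)
N_CONFIGS, N_BOOT = 1370, 10_000

# Pretend each sampled configuration has a learning rate and an eval success rate.
learning_rates = rng.choice([1e-6, 5e-6, 1e-5, 5e-5], size=N_CONFIGS)
scores = rng.beta(2, 5, size=N_CONFIGS) + 0.1 * (learning_rates == 5e-6)  # synthetic

for lr in np.unique(learning_rates):
    lr_scores = scores[learning_rates == lr]
    # Bootstrap the mean success rate of configurations that use this learning rate.
    boot_means = np.array([
        rng.choice(lr_scores, size=lr_scores.size, replace=True).mean()
        for _ in range(N_BOOT)
    ])
    lo, hi = np.percentile(boot_means, [2.5, 97.5])
    print(f"lr={lr:.0e}: mean score {lr_scores.mean():.3f}, 95% CI [{lo:.3f}, {hi:.3f}]")
```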
Read the paper on arXiv: How to Train Your LLM Web Agent: A Statistical Diagnosis (PDF)
About the speaker
I’m Massimo Caccia, Senior Research Scientist at ServiceNow Research, specializing in post-training methods for computer-use agents. I see computer use as the ultimate playground for testing agents, thanks to its ubiquity and diversity. My research involves conducting large-scale empirical studies to systematically evaluate trade-offs among different approaches and to develop practical know-how, with reinforcement learning being a particular focus.
As a core contributor to the web-agent research library ecosystem, I actively shape evaluation frameworks (BrowserGym, WorkArena) and development platforms (AgentLab). My goal is to bridge foundational research and scalable tools to advance the field.
Previously, I completed my Ph.D. at the Quebec Artificial Intelligence Institute (Mila) under Professor Laurent Charlin. During my doctoral studies, I collaborated with DeepMind’s Continual Learning team led by Marc’Aurelio Ranzato, Amazon’s team under Alex Smola, and Element AI before its acquisition by ServiceNow.
My Ph.D. research focused on building agents capable of accumulating and transferring knowledge across tasks, drawing from continual learning, transfer learning, and meta-learning. My work explored applications in language, vision, and reinforcement learning, emphasizing improvements in data and compute efficiency.
About the AI Alliance
The AI Alliance is an international community of researchers, developers, and organizational leaders committed to supporting and enhancing open innovation across the AI technology landscape to accelerate progress, improve safety, security, and trust in AI, and maximize benefits to people and society everywhere. Members of the AI Alliance believe that open innovation is essential to developing safe and responsible AI that benefits society rather than a select few big players.
Join the community
Sign up for the AI Alliance newsletter (check the website footer) and join our new AI Alliance Discord.