

What we’re about
Want to learn more about Data, AI & Automation?
This group is for you to share your thoughts on advancing the use of these technologies in the industry and to stay connected with peers and subject matter experts. The group is also committed to cultivating a strong community of practitioners.
This is an IBM sponsored Data, AI & Automation Meetup group.
To see all meetups in this group: https://www.meetup.com/pro/ibm-community/
This IBM-sponsored Meetup group is geared towards business users, data scientists, data engineers, and all Data, AI & Automation enthusiasts who want to interact and share knowledge with experts at IBM and in our extended community.
Upcoming events
- Network event • Online
[AI Alliance] How to Train Your LLM Web Agent: A Statistical Diagnosis
336 attendees from 115 groups
LLM-based web agents have recently made significant progress, but much of it has occurred in closed-source systems, widening the gap with open-source alternatives. Progress has been held back by two key challenges: first, a narrow focus on single-step tasks that overlooks the complexity of multi-step web interactions; and second, the high compute costs required to post-train LLM-based web agents.
To address this, we present the first statistically grounded study on compute allocation for LLM web-agent post-training. Our approach uses a two-stage pipeline: a Llama 3.1 8B or Qwen 2.5 7B student is first trained to imitate a Llama 3.3 70B or Qwen 2.5 72B teacher via supervised fine-tuning (SFT), and is then trained further with on-policy reinforcement learning (GRPO).
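For readers new to GRPO, the group-relative advantage at the heart of the second stage can be sketched in a few lines. This is a minimal illustration, not the speaker's training code: the function name, the `eps` constant, the use of the population standard deviation, and the reward values are all assumptions made for the example.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Score each rollout relative to the other rollouts for the same task.

    `rewards` holds one scalar reward per rollout in a single group.
    `eps` (an assumed constant) avoids division by zero when every
    rollout in the group received the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; implementations differ here
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical group: four rollouts of one web task, reward 1.0 on success.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
```

Rollouts that beat their group's average get a positive advantage and the rest get a negative one, so no separate value network is needed to provide a baseline.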
We find this process highly sensitive to hyperparameter choices, making exhaustive sweeps impractical. To spare others from expensive trial-and-error, we sample 1,370 configurations and use bootstrapping to estimate effective hyperparameters. Our results show that combining SFT with on-policy RL consistently outperforms either approach alone on both WorkArena and MiniWob++. Further, this strategy requires only 55% of the compute to match the peak performance of pure SFT on MiniWob++, effectively pushing the compute-performance Pareto frontier, and is the only strategy that can close the gap with closed-source models.
Read the paper on ArXiv: How to Train Your LLM Web Agent: A Statistical Diagnosis (PDF)
About the speaker
I’m Massimo Caccia, Senior Research Scientist at ServiceNow Research, specializing in post-training methods for computer-use agents. I see computer use as the ultimate playground for testing agents, thanks to its ubiquity and diversity. My research involves conducting large-scale empirical studies to systematically evaluate trade-offs among different approaches and to develop practical know-how, with reinforcement learning being a particular focus.
As a core contributor to the web-agent research library ecosystem, I actively shape evaluation frameworks (BrowserGym, WorkArena) and development platforms (AgentLab). My goal is to bridge foundational research and scalable tools to advance the field.
Previously, I completed my Ph.D. at the Quebec Artificial Intelligence Institute (Mila) under Professor Laurent Charlin. During my doctoral studies, I collaborated with DeepMind’s Continual Learning team led by Marc’Aurelio Ranzato, Amazon’s team under Alex Smola, and ElementAI prior to its integration with ServiceNow.
My Ph.D. research focused on building agents capable of accumulating and transferring knowledge across tasks, drawing from continual learning, transfer learning, and meta-learning. My work explored applications in language, vision, and reinforcement learning, emphasizing improvements in data and compute efficiency.
About the AI Alliance
The AI Alliance is an international community of researchers, developers and organizational leaders committed to supporting and enhancing open innovation across the AI technology landscape to accelerate progress, improve safety, security and trust in AI, and maximize benefits to people and society everywhere. Members of the AI Alliance believe that open innovation is essential to develop and achieve safe and responsible AI that benefits society rather than a select few big players.
Join the community
Sign up for the AI Alliance newsletter (check the website footer) and join our new AI Alliance Discord.
1 attendee from this group