
From Prompt to Production: Smarter AI with Evaluations

Hosted By
Desiree L.

Details

Join host Angel Menendez from n8n and special guest Elvis Saravia as they explore strategies for evaluating AI and for building evaluations into your development process.

In this session, you'll learn how to:

  • Apply industry best practices for evaluating agentic workflows
  • Build evaluation datasets
  • Select and design evaluation metrics
  • Perform evaluations with n8n

One of the more challenging aspects of building production-ready AI solutions is designing them to handle unpredictable real-world inputs and tricky edge cases. Identifying and fixing these can take up the majority of development time, especially when a small prompt tweak might solve one issue but introduce new, unintended behavior elsewhere. And even without prompt changes, model outputs can drift over time, making it hard to know whether your system is improving or quietly degrading.

For development teams, this creates uncertainty and risk, especially when AI outputs are customer-facing or business-critical. That’s where incorporating evaluations into your AI development process can really help. Effective evaluations enable data-driven decisions about whether to adjust a prompt or switch to a new model, and they help catch regressions by monitoring performance over time, giving you peace of mind for AI solutions in production.

Learn more in our blog, Introducing Evaluations for AI workflows.

Guest: Elvis Saravia is a co-founder of DAIR.AI, where he leads all AI research, education, and engineering efforts. Elvis holds a Ph.D. in computer science, specializing in NLP and language models. His primary interests are training and evaluating LLMs and developing scalable applications with LLMs. He co-created the Galactica LLM at Meta AI and supported and advised world-class teams like FAIR, PyTorch, and Papers with Code. Prior to this, he was an education architect at Elastic, where he developed technical curriculum and courses on solutions such as Elasticsearch, Kibana, and Logstash.

***

This webinar is ideal for:

  • Enterprise teams looking to make data-driven decisions about model and prompt changes.
  • Developers and engineers seeking to iterate faster while increasing AI reliability.
  • n8n users at any level who are building and deploying AI solutions.

***

Format:
This session will include a shared presentation with demos, along with live Q&A in the chat. Live video Q&A will be available at the end of the presentation.

n8n Global Meetup
Online event
Link visible for attendees
FREE