
Details

******************************************************

Please do not forget to register, or you might not be able to join the meetup. Here is the Teams link to register for the webinar:

https://teams.microsoft.com/meet/374337781753797?p=rzsUKoyhYAeFTGPtRr

******************************************************

# Webinar Series Structure

### Part 1 — Architecture, Setup & Observability (this webinar)

How to instrument Agentic AI applications with MLflow and create the right foundations.

### Part 2 — Evaluation Frameworks for LLM Systems

How to evaluate prompts, RAG pipelines, tools, and agent performance.

### Part 3 — Prompt Management & Optimization

How to improve prompts safely, continuously, and at scale.

As Agentic AI applications become more complex — combining LLMs, tools, RAG pipelines, multi-agent workflows, and business logic — maintaining quality and reliability becomes significantly harder.
Teams quickly face critical questions:

  • Why did the agent fail on this request?
  • Which prompt version performed best?
  • How do we evaluate RAG quality consistently?
  • How can we trace tool usage and reasoning paths?
  • How do we move from experimentation to reliable SDLC operations?

Building Agentic AI systems is no longer just about creating good prompts — it requires proper observability, evaluation frameworks, prompt lifecycle management, and production-grade governance.
This is where MLflow for GenAI and Agentic AI systems becomes a game changer.
In this 60-minute interactive webinar, we’ll walk through how to use MLflow to structure prompt management, evaluation pipelines, tracing, and observability for Agentic AI applications built with LLMs.
This session is the first in a three-part series, focusing on the architecture and setup foundations required before evaluation and optimization can happen effectively.
We will cover how to instrument applications built with LangChain, LangGraph, OpenAI, and RAG pipelines, and how MLflow helps teams improve quality while making SDLC operations significantly smoother.

***

# You’ll learn

### 🧠 Why Agentic AI Needs More Than Prompt Engineering

Understand why production-ready AI systems require:

  • Prompt versioning and lifecycle management
  • Evaluation frameworks for LLM outputs
  • Observability across tools and reasoning chains
  • Governance and reproducibility for production systems
  • Faster debugging and safer deployments

***

### 🏗️ Setting Up MLflow for Agentic AI Applications

Learn how to structure MLflow for the following (a minimal setup sketch follows the list):

  • Prompt tracking and version control
  • Experiment management for LLM workflows
  • Managing prompt variants across environments
  • Integration with LangChain and LangGraph pipelines
  • Supporting collaborative AI development across teams
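
To make this concrete, here is a minimal setup sketch. The tracking URI and experiment name are placeholders, and `mlflow.langchain.autolog()` assumes a recent MLflow release with GenAI tracing support:

```python
# Minimal sketch of an MLflow setup for an agentic LLM app.
# The tracking URI and experiment name below are hypothetical.
import mlflow

# Point the client at a shared tracking server so the whole team
# sees the same experiments, runs, and traces
mlflow.set_tracking_uri("http://localhost:5000")

# Group all prompt and agent experiments for this app in one place
mlflow.set_experiment("agentic-rag-app")

# Enable automatic tracing for LangChain / LangGraph components;
# chain, tool, and LLM calls are then captured without code changes
mlflow.langchain.autolog()
```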

***

### 🔍 Tracing & Observability for LLM Systems

How to trace complex agentic workflows:

  • Capturing prompts, responses, and tool calls
  • Understanding reasoning paths across agents
  • Tracking failures and hallucination sources
  • Debugging RAG retrieval issues
  • Monitoring performance across production workloads

Tracing becomes the foundation for reliable AI engineering.
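
As an illustration of what that looks like in code, the sketch below combines autologging with a manually traced retrieval step. The function name, span name, and stand-in documents are illustrative, not from the webinar:

```python
# Sketch: tracing a custom agent step alongside autologged calls.
import mlflow

mlflow.langchain.autolog()  # also capture any LangChain calls made inside

@mlflow.trace  # record this function as a span in the trace tree
def answer_question(question: str) -> str:
    # A manual child span for the retrieval step, so RAG issues
    # (empty or irrelevant context) are visible in the trace UI
    with mlflow.start_span(name="retrieve_context") as span:
        context = ["doc-1 snippet", "doc-2 snippet"]  # stand-in retrieval
        span.set_inputs({"question": question})
        span.set_outputs({"context": context})
    # ... call the LLM with the question plus context here ...
    return f"Answer based on {len(context)} documents"

answer_question("What does MLflow tracing capture?")
```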

***

### 📊 Evaluation Foundations for RAG, Prompts & Tools

A first look at what will be covered in more depth in Part 2:

  • Evaluating prompt quality
  • Measuring RAG retrieval effectiveness
  • Assessing tool-calling reliability
  • LLM-as-a-Judge approaches
  • Human evaluation vs automated evaluation

We’ll focus on the architecture needed to support these evaluations.
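
For orientation, here is the general shape of an MLflow evaluation over a static dataset of logged outputs. The data below is made up, and the built-in question-answering metrics may require optional dependencies; Part 2 goes deeper, including LLM-as-a-Judge metrics:

```python
# Sketch: evaluating a static dataset of logged Q&A outputs with MLflow.
import mlflow
import pandas as pd

# Made-up example rows; in practice these come from logged traces or runs
eval_data = pd.DataFrame({
    "inputs": ["What is MLflow Tracing?"],
    "predictions": ["MLflow Tracing records spans for LLM and tool calls."],
    "ground_truth": ["MLflow Tracing captures spans across an LLM workflow."],
})

results = mlflow.evaluate(
    data=eval_data,
    predictions="predictions",        # column holding model outputs
    targets="ground_truth",           # column holding reference answers
    model_type="question-answering",  # adds built-in QA metrics
)
print(results.metrics)
```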

***

### ⚙️ Prompt Management & Optimization Foundations

A first look at what will be covered in more depth in Part 3 (a prompt-registry sketch follows the list):

  • Prompt registries and governance
  • A/B testing prompt variants
  • Continuous prompt optimization workflows
  • Regression testing for prompts
  • Production-safe rollout strategies
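
As a preview, the sketch below shows prompt versioning with the MLflow Prompt Registry. In MLflow 3 the registry API lives under `mlflow.genai`; the prompt name and template here are illustrative:

```python
# Sketch: versioning a prompt in the MLflow Prompt Registry.
import mlflow

# Registering with a changed template creates a new immutable version
prompt = mlflow.genai.register_prompt(
    name="qa-agent-prompt",  # hypothetical registry name
    template="Answer using only the given context.\n\n{{context}}\n\nQ: {{question}}",
    commit_message="Tighten grounding instruction",
)

# Load a pinned version at serving time, so rollouts stay explicit
pinned = mlflow.genai.load_prompt("prompts:/qa-agent-prompt/1")
print(pinned.format(context="...", question="..."))
```

Pinning a specific version rather than "latest" is what makes A/B tests and production-safe rollouts tractable.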

***

### 💻 Live Demo: Instrumenting an Agentic AI App with MLflow

Step-by-step implementation (sketched in condensed form after the list):

  • Connecting MLflow to LangChain and LangGraph
  • Logging prompts, traces, and tool execution
  • Visualizing runs and debugging agent behavior
  • Structuring prompt experiments
  • Preparing the system for evaluation and optimization
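
In condensed form, the demo will follow roughly this shape. The model name, prompt, and run name are placeholders, and the sketch assumes `langchain-openai` is installed with `OPENAI_API_KEY` set:

```python
# Sketch of the demo's overall shape: an autologged LangChain call
# inside a named MLflow run.
import mlflow
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate

mlflow.set_experiment("agentic-rag-app")
mlflow.langchain.autolog()  # capture traces for the chain and LLM calls

prompt = ChatPromptTemplate.from_template("Summarize: {text}")
chain = prompt | ChatOpenAI(model="gpt-4o-mini")  # placeholder model

with mlflow.start_run(run_name="demo-instrumentation"):
    mlflow.log_param("prompt_variant", "v1")  # tag the experiment setup
    result = chain.invoke({"text": "MLflow adds tracing to GenAI apps."})
    print(result.content)
# The run and its trace are then visible in the MLflow UI
```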

***

📅 Duration: 60 minutes
🧠 Level: Intermediate to Advanced
🛠️ Tech Stack: MLflow, LangChain, LangGraph, OpenAI, RAG Pipelines

Related topics

Artificial Intelligence
Big Data
Data Science
Python
Software Development
