Overview of Hosting LLMs and RAG Engine on Azure Kubernetes with KAITO
Details
KAITO RAG Engine on Azure Kubernetes Service is a Kubernetes-native way to run a complete Retrieval-Augmented Generation (RAG) backend hosted entirely inside your own Azure environment.
This session demonstrates an application running against LLMs and a RAG system on Azure Kubernetes Service (AKS) using the KAITO RAG Engine (https://github.com/kaito-project/kaito).
You’ll see how to:
- Review the high-level architecture and configuration of KAITO on AKS
- Understand the RAG architecture (ingestion, embedding, retrieval, inference)
- Ingest and index real documents into the RAG Engine
- Run a chatbot UI against both the LLM and the RAG Engine
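The ingestion, embedding, retrieval, and inference steps above can be sketched in a few lines of Python. This is a toy stand-in, not KAITO's actual implementation: the word-count "embedding" and in-memory list replace the neural embedding model and vector index a real RAG Engine uses, and the assembled prompt is what would be sent to the hosted LLM.

```python
# Toy sketch of the RAG request path: ingest -> embed -> retrieve -> prompt.
# The embedding is a bag-of-words Counter (a stand-in for a real model).
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': word counts instead of a neural vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(query: str, index: list, k: int = 2) -> list:
    """Return the k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(index, key=lambda doc: cosine(q, doc[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


# Ingestion: chunk documents and store (chunk, vector) pairs in an index.
docs = [
    "KAITO provisions GPU nodes for model inference on AKS",
    "The RAG engine indexes documents into a vector store",
    "AKS supports autoscaling of node pools",
]
index = [(d, embed(d)) for d in docs]

# Retrieval + inference: retrieved chunks become the LLM's grounding context.
context = retrieve("how does KAITO run inference on AKS", index)
prompt = "Answer using this context:\n" + "\n".join(context)
```

In the real system, each of these toy pieces maps to a managed component: the index lives in a vector store, embedding and generation are served by models KAITO deploys onto AKS GPU nodes, and the engine handles chunking and prompt assembly for you.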
The session is practical, code-driven, and demo-heavy, with a focus on how these components fit together in a cloud-native, scalable architecture.
Whether you’re a cloud engineer, platform engineer, or AI developer, you’ll walk away with a clear mental model and a working reference implementation of a RAG system on Kubernetes.
Bio:
Roy Kim is an architect and engineer focused on Azure, AI, and Microsoft 365. In recent years he has been supporting Azure and AI solutions for large enterprises. He has been a Microsoft MVP award recipient since 2017. He holds a BS in computer science from the University of Toronto. See his blog at roykim.ca and https://youtube.com/RoyKimYYZ
