Production GenAI with RAG on Multi-cluster Cloud Kubernetes


Details
Co-hosted by Kobie Crawford of MosaicML/Databricks.
GenAI models with RAG have demonstrated high-quality results for a variety of use cases. Companies putting such models into production are finding that self-hosting them offers control, privacy, performance, and cost advantages, but that it requires effective infrastructure management. Kubernetes clusters support orchestration and management across cloud-based computing resources and can provide a flexible platform for hosting production GenAI models with RAG. Join us to learn how to use a resource-aware, policy-based approach across multiple cloud Kubernetes clusters to host a set of GenAI models with RAG in production.
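To give a rough sense of what "resource-aware, policy-based" placement can mean in practice, here is a minimal hypothetical sketch (not the presenters' implementation): it filters candidate clusters by free GPU capacity and a cost ceiling, then picks the cheapest fit for a model-serving workload. All names (ClusterInfo, choose_cluster, the sample clusters) are illustrative assumptions.

# Hypothetical sketch of resource-aware, policy-based cluster selection.
# Names and policy are illustrative only, not the talk's actual system.
from dataclasses import dataclass

@dataclass
class ClusterInfo:
    name: str
    free_gpus: int            # GPUs currently unallocated in this cluster
    cost_per_gpu_hour: float  # blended price for this cloud/region

def choose_cluster(clusters, gpus_needed, max_cost_per_gpu_hour):
    """Return the cheapest cluster that satisfies the placement policy, or None."""
    candidates = [
        c for c in clusters
        if c.free_gpus >= gpus_needed and c.cost_per_gpu_hour <= max_cost_per_gpu_hour
    ]
    if not candidates:
        return None  # no cluster fits; caller may scale up a cluster or queue the workload
    return min(candidates, key=lambda c: c.cost_per_gpu_hour)

if __name__ == "__main__":
    clusters = [
        ClusterInfo("gpu-east", free_gpus=2, cost_per_gpu_hour=2.10),
        ClusterInfo("gpu-west", free_gpus=8, cost_per_gpu_hour=1.75),
    ]
    target = choose_cluster(clusters, gpus_needed=4, max_cost_per_gpu_hour=2.00)
    print(target.name if target else "no placement found")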
5:30-6:00pm: Mingle, food, drinks
6:00-6:10pm: Welcome message from Madhuri and Kobie Crawford (MosaicML/Databricks)
6:10-6:40pm: Resource-Aware Scheduling for Production GenAI with RAG running on Multi-cluster Cloud Kubernetes - Anne Holler (Elotl), David Southwell (DataStax)
6:40-7:10pm: Scaling to the Future: What It Really Takes to Train Your Own LLM from Scratch - Ajay Saini (MosaicML/Databricks)
7:10-7:30pm: LLM evals and Cloud Native AI - Ricardo Aravena (TruLens)
___________
Zoom link: https://databricks.zoom.us/j/83436194536?pwd=bFFvV1R5b1BUR0IvQjUrekcybVBBQT09
