Traffic Management using Gateway API: From Core Routing to AI Inference


Details
Gateway API is an official Kubernetes project offering a role-oriented successor to traditional Ingress, covering both L4 and L7 routing, and a new foundation for service mesh deployments (the GAMMA Initiative). This talk will dive into its generic and expressive design, demonstrating how it can help simplify complex routing challenges. We'll then explore the evolving Inference Extension, an alpha project bringing specialized capabilities for serving generative AI models on Kubernetes. Discover how this extension enables advanced features such as model-aware routing and priority queuing/shedding, designed to optimize your inference workloads for performance and cost. Join us to understand how these cutting-edge APIs are helping shape the future of Kubernetes and AI infrastructure.
Bio:
Greg Bray is a Customer Engineer at Google Cloud, specializing in designing GKE, Service Mesh, and Serverless deployments. Previously, Greg worked as an SRE at Reddit, Walmart Labs, and Stack Overflow.
In person at Weave and online. Link: https://meet.google.com/tto-pnic-qau
Forge Slack Link: http://bit.ly/forgeut
