Zero to GPU: Building a Prod-Grade ML Infra Platform on EKS from Scratch
We are excited to announce the May meetup scheduled for May 21st.
Zero to GPU: Building a Production-Grade ML Infrastructure Platform on EKS from Scratch
Most GPU clusters are over-provisioned, under-secured, and invisible. In this talk, I walk through building a production-grade ML infrastructure platform on Amazon EKS from scratch — covering every layer from VPC design to live GPU metrics in Grafana.
I'll show how Karpenter eliminates idle GPU costs by provisioning nodes automatically in 90 seconds and terminating them 30 seconds after jobs complete — reducing GPU spend by 89% for the same workload. I'll demonstrate IRSA-scoped IAM that enforces least-privilege access at the pod level, proven live with an AccessDenied demo. And I'll show a DCGM-powered observability stack that takes GPU utilisation from 17% to 95% with full metrics visibility.
Every component is infrastructure-as-code in Terraform. Every demo runs live. Real cluster, real numbers, real failures — and what I learned from each one.
Speaker Bio: Damian Igbe, PhD
Damian builds the infrastructure that makes AI work in production, not just in notebooks. With a PhD in Computer Science, a Kubestronaut certification, and 20+ years of hands-on systems experience, Damian operates at the intersection of AI platform engineering, GPU orchestration, and cloud-native security. He has delivered mission-critical infrastructure for clients including Pentagon staff, in settings where reliability and security are mandatory rather than aspirational. His core focus is the layer most AI teams underestimate: the infrastructure underneath the model. Getting LLMs to production requires GPU-aware Kubernetes scheduling, high-throughput inference pipelines, zero-trust security, and the operational discipline to keep it all running at scale.
6:30 - 6:45 - Social
6:45 - 6:55 - Club Business
6:55 - 8:30 - Zero to GPU: Building a Production-Grade ML Infrastructure Platform on EKS from Scratch
8:30 - 8:35 - Social/Wrap-up
