GPU Sharing for AI at Enterprise Scale


Details
Abstract
In this talk, we show you how we optimize GPU usage at scale with Hopsworks on Kubernetes. A single Hopsworks cluster can contain 1000s of CPUs and GPUs, and Hopsworks builds on Kueue to enable fair sharing of resources for training (Ray, PyTorch, etc.), inference (KServe/vLLM), and scalable compute workloads (Spark, Flink, etc.) across teams and projects. We show how we simplify Kueue's abstractions (cohorts, cluster queues, local queues) using Hopsworks' project-based multi-tenancy security model.
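For readers unfamiliar with Kueue, the abstractions the talk refers to can be sketched as plain Kueue resources: a ClusterQueue holding quota inside a shared cohort, and a per-project LocalQueue that workloads submit to. The names (`shared-gpus`, `project-a`, the quota figures) are illustrative assumptions, not Hopsworks defaults:

```yaml
# Illustrative Kueue setup (assumed names/quotas): a ClusterQueue in a
# shared cohort with CPU/GPU quota, plus a per-project LocalQueue.
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: team-a-queue            # hypothetical queue name
spec:
  cohort: shared-gpus           # queues in the same cohort can borrow unused quota
  namespaceSelector: {}         # admit workloads from any namespace
  resourceGroups:
  - coveredResources: ["cpu", "memory", "nvidia.com/gpu"]
    flavors:
    - name: default-flavor
      resources:
      - name: "cpu"
        nominalQuota: 100
      - name: "memory"
        nominalQuota: 400Gi
      - name: "nvidia.com/gpu"
        nominalQuota: 8
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: LocalQueue
metadata:
  namespace: project-a          # e.g. one namespace per Hopsworks project (assumed)
  name: project-a-queue
spec:
  clusterQueue: team-a-queue    # routes the project's jobs to the shared quota
```

In this model, fair sharing across teams comes from placing each team's ClusterQueue in a common cohort, while a project only ever sees its own LocalQueue.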
About the Speaker
Jim Dowling is CEO of Hopsworks and a former Associate Professor at KTH Royal Institute of Technology. He is the lead architect of the open-source Hopsworks Feature Store platform. He is currently writing a book for O'Reilly on "Building ML Systems: batch, real-time, and LLMs".