
GPU Sharing for AI at Enterprise Scale

Hosted by Doron C.

Details

Abstract
In this talk, we show you how we optimize GPU usage at scale with Hopsworks on Kubernetes. A single Hopsworks cluster can contain thousands of CPUs and GPUs. Hopsworks builds on Kueue to enable fair sharing of resources across teams and projects for training (Ray, PyTorch, etc.), inference (KServe/vLLM), and scalable compute workloads (Spark, Flink, etc.). We show how we simplify Kueue's abstractions (cohorts, cluster queues, local queues) using Hopsworks' project-based multi-tenancy security model.
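For readers new to Kueue, the sketch below shows one plausible way a project could map onto Kueue's objects. It is a hypothetical illustration, not Hopsworks' actual implementation: it assumes one Kubernetes namespace per project, one ClusterQueue per project that admits only that namespace, a shared cohort so projects can borrow each other's idle quota, and a pre-existing ResourceFlavor. The function `project_queues` and all names and quota values are made up for the example; the object kinds and field names (`cohort`, `namespaceSelector`, `resourceGroups`, `nominalQuota`, `clusterQueue`) follow Kueue's `kueue.x-k8s.io/v1beta1` API.

```python
import json

KUEUE_API = "kueue.x-k8s.io/v1beta1"


def project_queues(project: str, cohort: str, flavor: str,
                   cpu: str, memory: str, gpus: str) -> list:
    """Hypothetical mapping of one project to a Kueue ClusterQueue + LocalQueue.

    Assumes one namespace per project, named after the project, and that a
    ResourceFlavor called `flavor` already exists in the cluster.
    """
    namespace = project
    cq_name = f"{project}-cq"
    cluster_queue = {
        "apiVersion": KUEUE_API,
        "kind": "ClusterQueue",
        "metadata": {"name": cq_name},
        "spec": {
            # Only workloads from this project's namespace are admitted here.
            "namespaceSelector": {
                "matchLabels": {"kubernetes.io/metadata.name": namespace}
            },
            # ClusterQueues in the same cohort can borrow each other's unused quota.
            "cohort": cohort,
            # Allow reclaiming lent quota by preempting borrowers in the cohort.
            "preemption": {"reclaimWithinCohort": "Any"},
            "resourceGroups": [{
                "coveredResources": ["cpu", "memory", "nvidia.com/gpu"],
                "flavors": [{
                    "name": flavor,
                    "resources": [
                        {"name": "cpu", "nominalQuota": cpu},
                        {"name": "memory", "nominalQuota": memory},
                        {"name": "nvidia.com/gpu", "nominalQuota": gpus},
                    ],
                }],
            }],
        },
    }
    # The namespaced LocalQueue that the project's jobs actually submit to.
    local_queue = {
        "apiVersion": KUEUE_API,
        "kind": "LocalQueue",
        "metadata": {"name": f"{project}-lq", "namespace": namespace},
        "spec": {"clusterQueue": cq_name},
    }
    return [cluster_queue, local_queue]


if __name__ == "__main__":
    for obj in project_queues("fraud-detection", "ml-platform",
                              "default-flavor", "32", "128Gi", "4"):
        print(json.dumps(obj, indent=2))
```

In this sketch a team never sees cohorts or ClusterQueues directly: its jobs only need the `kueue.x-k8s.io/queue-name` label pointing at the project's LocalQueue, which is the kind of simplification the abstract alludes to.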

About the Speaker
Jim Dowling is CEO of Hopsworks and a former Associate Professor at KTH Royal Institute of Technology. He is the lead architect of the open-source Hopsworks Feature Store platform and is currently writing a book for O'Reilly, "Building ML Systems: batch, real-time, and LLMs".

Cloud Technology in the North
FREE