Running Machine Learning workloads on Azure Kubernetes Service (AKS)


Details
Kubernetes simplifies building and operating coordinated distributed systems. Often, this means scaling stateless web applications across nodes and cloud providers.
In this talk, we will discuss how to take advantage of the distributed primitives provided by Kubernetes to build a massively scalable online machine learning service on Azure.
Topics covered:
- Fundamental Kubernetes constructs
- Scheduling of scarce compute resources (GPUs)
- Autoscaling based on application metrics
- Implications of Kubernetes on service availability
Jordan Olshevski works on Azure Kubernetes Service at Microsoft, and has years of experience in the field of distributed systems and containers. Before coming to Microsoft, he helped build an enterprise-scale application runtime platform at Nike.
Door open at 5:30 for networking and pizza, and the presentation will begin at 6:00.

Running Machine Learning workloads on Azure Kubernetes Service (AKS)