Deploying Interactive Machine-Learning Applications with Clipper

12 people went

Clara Labs

Sacramento and Montgomery · San Francisco, CA


Join Clara Labs for an engaging discussion on continuous integration and deployment for machine learning pipelines, led by Dan Crankshaw of the RISELab at UC Berkeley. If you are interested in machine learning in practical systems, this talk is for you. Drinks and pizza will be provided.

Machine learning is being deployed in a growing number of applications that demand real-time, accurate, and robust predictions under heavy serving loads. However, most machine learning frameworks and systems address only model training, not deployment.

Clipper is an open-source, general-purpose model-serving system that addresses these challenges. Interposing between applications that consume predictions and the machine-learning models that produce them, Clipper simplifies the model deployment process by adopting a modular serving architecture and isolating models in their own containers, allowing them to be evaluated using the same runtime environment as the one used during training. Clipper's modular architecture provides simple mechanisms for scaling out models to meet increased throughput demands and for performing fine-grained physical resource allocation for each model. Further, by abstracting models behind a uniform serving interface, Clipper allows developers to compose many machine-learning models within a single application to support increasingly common techniques such as ensemble methods, multi-armed bandit algorithms, and prediction cascades.
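To make the idea of composing models behind a uniform interface concrete, here is a minimal Python sketch. It is not Clipper's API: the `Model` type, the function names, and the toy models are all hypothetical and chosen only for illustration. It assumes every model can be called with an input and returns a (label, confidence) pair, which is enough to express a simple ensemble and a two-stage prediction cascade.

```python
from collections import Counter
from typing import Callable, List, Sequence, Tuple

# Hypothetical uniform interface (an assumption, not Clipper's API):
# every model maps an input vector to a (label, confidence) pair.
Model = Callable[[Sequence[float]], Tuple[str, float]]


def ensemble_predict(models: List[Model], x: Sequence[float]) -> str:
    """Return the label chosen by majority vote across all models."""
    votes = Counter(model(x)[0] for model in models)
    # On a tie, most_common keeps the label that was seen first.
    return votes.most_common(1)[0][0]


def cascade_predict(fast: Model, slow: Model, x: Sequence[float],
                    threshold: float = 0.8) -> str:
    """Use the fast model when it is confident; otherwise ask the slow one."""
    label, confidence = fast(x)
    if confidence >= threshold:
        return label
    return slow(x)[0]


# Toy stand-in models, used only to exercise the sketch.
def mean_threshold_model(x: Sequence[float]) -> Tuple[str, float]:
    score = sum(x) / len(x)
    label = "positive" if score > 0.5 else "negative"
    return label, abs(score - 0.5) * 2


def always_positive(x: Sequence[float]) -> Tuple[str, float]:
    return "positive", 1.0


def always_negative(x: Sequence[float]) -> Tuple[str, float]:
    return "negative", 1.0


if __name__ == "__main__":
    models = [mean_threshold_model, always_positive, always_negative]
    print(ensemble_predict(models, [1.0, 1.0]))
    print(cascade_predict(mean_threshold_model, always_negative, [0.6, 0.6]))
```

Because both helpers depend only on the shared call signature, any model that honors it can be swapped in without changing the application code, which is the property the uniform serving interface is meant to provide.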

In this talk I will provide an overview of the Clipper serving system and discuss how to get started using Clipper to serve Apache Spark and TensorFlow models on Kubernetes. I will then discuss some recent work on end-to-end cost-aware resource allocation and scheduling for multi-model applications.


Dan Crankshaw is a PhD student in the UC Berkeley CS department working in the RISELab. After cutting his teeth doing large-scale data analysis on cosmology simulation data and building systems for distributed graph analysis, he turned his attention to machine learning systems. His current research interests include systems and techniques for serving and deploying machine learning, with a particular emphasis on low-latency and interactive applications.