Building End-to-End Data Science Platforms


Learn about the latest news on JupyterLab, TensorFlow, Spark, PipelineAI, DC/OS, and Machine Learning.


Talk 0: Meetup Announcements and Updates

Talk 1: Building End-to-End Data Science Platforms (Chris Fregly, Founder @ PipelineAI & Jörg Schad @ Mesosphere)

Ever wondered how to set up a complete end-to-end data science pipeline, starting with interactive notebooks and moving on to continuous model training, continuous model optimizations, and lightning-fast model experiments directly in production?
In this talk we will focus on two aspects in particular:

Data Scientist Interaction
As a Data Scientist, I really just want to focus on building great models, without having to worry too much about Cluster Infrastructure, deploying distributed Spark/TensorFlow, or accessing remote datasets.
For this purpose, we will look at the challenges involved and how the DC/OS JupyterLab service helps to solve them. We will also explore how a Data Scientist can interact with HDFS, Spark, and Cassandra from their notebook without even having to be aware of the underlying Cluster/Infrastructure.

Continuous Model Training and Serving

Traditional machine learning pipelines end with lifeless models sitting on disk in the research lab. These traditional models are typically trained on stale, offline, historical batch data.
Static models and stale data are not sufficient to power today's modern, AI-first Enterprises that require continuous model training, continuous model optimizations, and lightning-fast model experiments directly in production.
Through a series of open source, hands-on demos and exercises, we will show how PipelineAI helps to solve these challenges.


Chris Fregly is Founder at PipelineAI, a Real-Time Machine Learning and Artificial Intelligence Startup based in San Francisco.
He is also an Apache Spark Contributor, a Netflix Open Source Committer, founder of the Global Advanced Spark and TensorFlow Meetup, and author of the O'Reilly Training and Video Series titled "High Performance TensorFlow in Production with Kubernetes and GPUs."
Previously, Chris was a Distributed Systems Engineer at Netflix, a Data Solutions Engineer at Databricks, and a Founding Member and Principal Engineer at the IBM Spark Technology Center in San Francisco.

Jörg is the technical lead for Data Science at Mesosphere in San Francisco. In his previous life, he implemented distributed and in-memory databases and conducted research in the Hadoop and Cloud area during his PhD. His speaking experience includes various Meetups, international conferences, and lecture halls.