Enterprise Jupyter Deployment and Airflow In Production


Details
Talk 1: The Five Stages of Enterprise Jupyter Deployment
Jupyter notebooks are an important tool for data science. For a single user on a laptop, these notebooks are a simple, straightforward tool. But Jupyter in the enterprise is a much more complex affair. Enterprises have large teams of data scientists who need to run their notebooks atop scalable compute infrastructure with secure, audited access to massive, proprietary data sets, all while keeping hardware costs down.
Here at IBM’s Center for Open-Source Data and AI Technologies, we’ve seen multiple enterprise rollouts of Jupyter notebooks, both first-hand, in IBM products and services, and second-hand, in our discussions with other members of the Jupyter community.
In this talk, we merge the stories of these projects and walk through the process of deploying high-performance, secure, multitenant Jupyter notebooks in an enterprise setting. Our goal is to inform others who may be at the beginning of this journey of what is coming and how to navigate the challenges ahead. Along the way, we answer five important questions: What are Jupyter notebooks? What makes Jupyter so attractive to data scientists? Why is deploying Jupyter in the enterprise difficult? What are your deployment options today? And what are the tradeoffs of those approaches? We’ll finish with a description of how IBM and other members of the Jupyter community are working towards reducing those tradeoffs with the Jupyter Enterprise Gateway project. Finally, we’ll give a demonstration of multitenant Jupyter notebooks in action. This talk is aimed at enterprise architects who need to support growing data science teams with multi-user deployments of Jupyter. No knowledge of data science is required.
Speaker: Fred Reiss
Fred Reiss is the Chief Architect at IBM's Center for Open-Source Data and AI Technologies in San Francisco. Fred received his Ph.D. from UC Berkeley in 2006, then worked for IBM Research Almaden for the next nine years. At Almaden, Fred worked on the SystemML and SystemT projects, as well as on the research prototype of DB2 with BLU Acceleration. Fred has over 25 peer-reviewed publications and six patents.
Talk 2: Airflow - 2 Years in Production
We will introduce Airflow, an Apache project for scheduling and workflow orchestration. We will discuss use cases, applicability, and how best to use Airflow, mainly in the context of building data engineering pipelines. We have been running Airflow in production for about 2 years, so we will also go over some lessons learned, best practices, and some tools we have built around it.
Speakers: Robert Sanders, Shekhar Vemuri
Robert Sanders is a Big Data Manager, Engineer, and Architect at Clairvoyant. He primarily works with clients to build out Big Data solutions on the Hadoop ecosystem. Robert has a deep background in enterprise systems, working on full-stack implementations and then focusing on data management platforms.
Shekhar Vemuri is CTO at Clairvoyant. Shekhar works with clients across various industries, helps define data strategy, and leads the implementation of data engineering and data science efforts.
Agenda:
6:00 pm -- 6:30 pm Check-in/networking
6:30 pm -- 6:40 pm Introduction
6:40 pm -- 7:40 pm Talk 1 + Q&A (Fred Reiss)
6:40 pm -- 7:40 pm Talk 2 + Q&A (Robert Sanders, Shekhar Vemuri)
7:40 pm -- 8:10 pm Closing
8:30 pm -- Office closed
