PyData Montreal #19


Details
Agenda:
(All times in EST)
6:00 pm — Introductions
6:10 pm — "Text Extensions for Pandas" by Frederick Reiss
7:10 pm — "Introducing Elyra: Extending JupyterLab for AI" by Luciano Resende
8:00 pm — Wrap-up
----------------------------------------------------------------------------------
"Text Extensions for Pandas"
Abstract:
Most areas of Python data science have standardized on Pandas DataFrames for representing and manipulating structured data in memory.
Natural Language Processing, not so much.
In this presentation, we'll explain why you should be using Pandas for NLP. DataFrames make every phase of NLP easier, from creating new models, to evaluating their effectiveness, to building applications that integrate those models. We'll talk about our open source library, Text Extensions for Pandas (https://ibm.biz/text-extensions-for-pandas), which adds special data types and library integrations specifically geared to NLP use cases. We'll explain how these extensions connect to some basic NLP concepts, and then we'll finish with an example of using Pandas to build an NLP application.
About Fred:
Fred Reiss is a Principal Research Staff Member at IBM Research and Chief Architect at IBM's Center for Open-Source Data and AI Technologies (CODAIT). He is also one of the authors of the Text Extensions for Pandas library. Fred received his Ph.D. from U.C. Berkeley in 2006 and immediately joined IBM Research, moving to the CODAIT center in 2015. Fred has written multiple peer-reviewed papers in the areas of natural language processing, database systems, and machine learning.
"Introducing Elyra: Extending JupyterLab for AI"
Abstract:
Creating an AI pipeline often involves learning/writing another layer of code to orchestrate the flow of information. Managing environments, artifact handling and system resources can feel daunting for those unfamiliar with the infrastructure side of AI. Elyra's pipeline editor abstracts patterns in workflow development to provide a friendly and familiar interface in JupyterLab in a NoCode/LowCode fashion and integrates with workflow orchestrators like Kubeflow Pipelines and Apache Airflow.
This presentation will detail how Elyra, an open source project, creates AI pipelines and executes them locally or in external runtimes such as Kubeflow Pipelines and Apache Airflow, all without leaving your JupyterLab development environment. We will also look at other useful Elyra functionality that helps data scientists overcome the day-to-day complexities of model development, with live demos throughout the presentation.
About Luciano:
Luciano Resende is an Open Source AI Platform Architect with IBM's CODAIT group. He's a high-performing technical leader who embraces challenges and complex problems to drive breakthrough innovations. Luciano's expertise is in open source and enterprise-grade AI platform technologies, with about 20 years of experience successfully designing, building and delivering complex software in Fortune 500 companies and open source. He has a strong background in open source big data platforms such as Apache Spark, and data science building blocks such as the Jupyter Notebook Stack and the Apache Toree Scala kernel.
