PyData Montreal #16: online event with James Lamb and Arnab Biswas

Hosted By
Maria K. and Alex K.

Details

Agenda:
(All times in EST)
6:00 pm — Introductions
6:10 pm — "Scaling Machine Learning with Python and Dask" by James Lamb
6:50 pm — Q&A

7:05 pm — 5 min break

7:10 pm — "Automated Feature Engineering on large scale Time Series data using tsfresh & Dask" by Arnab Biswas
7:50 pm — Q&A
8:00 pm — Wrap-up

----------------------------------------------------------------------------
"Scaling Machine Learning with Python and Dask"

Abstract:
In this talk, attendees will get an introduction to Dask, a distributed computing framework in the PyData ecosystem. The first half of the talk will describe the current state of the project and its ecosystem including distributed data collections, cloud deployment options, distributed machine learning projects, and workflow orchestration. The second half of the talk will be a live demo showing the programming model for machine learning on Dask, with specific examples showing how to do distributed LightGBM training with Dask.

About James Lamb:
James Lamb is an engineer at Saturn Cloud, where he works on a team building a managed Dask + Kubernetes product. He is a maintainer of LightGBM and has made many contributions to other open source data science projects, including Prefect. He holds master's degrees in Applied Economics (2014) and Data Science (2018). Before joining Saturn Cloud, he worked as an IoT Data Scientist at Amazon Web Services and Uptake.

----------------------------------------------------------------------------
"Automated Feature Engineering on large scale Time Series data using tsfresh & Dask"

Abstract:
The Internet of Things, digitized health care systems, financial markets, smart cities, and similar sources continuously generate time series data of different types, sizes, and complexities. Time series data differs from non-temporal data: depending on the underlying process, the observation at any instant depends on past observations. It often contains noise and redundant information. To complicate matters, most traditional machine learning algorithms were developed for non-temporal data, so extracting meaningful features from raw time series plays a major role. While some features are generic across different types of time series, others are specific to particular domains. As a result, feature engineering often demands familiarity with domain-specific and/or signal processing algorithms, which complicates the process.

The first half of the presentation will introduce tsfresh, a Python library that accelerates feature engineering by automatically generating hundreds of features for time series data. The second half will describe the challenges that arise when the data is large and how they can be addressed by running tsfresh on top of Dask, a parallel computing framework.
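For readers new to the topic, the idea of per-series feature extraction can be sketched with the Python standard library alone. This is a hand-rolled illustration and not tsfresh's API: the function names (`extract_features`, `lag1_autocorrelation`), the `(series_id, time, value)` row layout, and the sample sensor data are all assumptions made for this example. It computes five summary features per series, whereas the abstract describes a library that generates hundreds automatically.

```python
from collections import defaultdict
from statistics import mean, pstdev


def lag1_autocorrelation(values):
    """Correlation between each observation and the one just before it."""
    if len(values) < 2:
        return 0.0
    mu = mean(values)
    denom = sum((v - mu) ** 2 for v in values)
    if denom == 0:
        # A constant series has no variance, so report 0.0 by convention.
        return 0.0
    num = sum(
        (values[i] - mu) * (values[i - 1] - mu) for i in range(1, len(values))
    )
    return num / denom


def extract_features(rows):
    """Turn (series_id, time, value) rows into one feature dict per series."""
    grouped = defaultdict(list)
    for series_id, t, value in rows:
        grouped[series_id].append((t, value))

    features = {}
    for series_id, points in grouped.items():
        # Order each series by time before computing order-dependent features.
        values = [v for _, v in sorted(points)]
        features[series_id] = {
            "mean": mean(values),
            "std": pstdev(values),
            "min": min(values),
            "max": max(values),
            "lag1_autocorr": lag1_autocorrelation(values),
        }
    return features


# Hypothetical sample data: two short sensor series.
rows = [
    ("sensor_a", 0, 1.0), ("sensor_a", 1, 2.0),
    ("sensor_a", 2, 3.0), ("sensor_a", 3, 4.0),
    ("sensor_b", 0, 5.0), ("sensor_b", 1, 5.0),
    ("sensor_b", 2, 5.0), ("sensor_b", 3, 5.0),
]
features = extract_features(rows)
```

In this sketch each series is processed independently of the others, which is the kind of work that can be split across workers when the data grows large.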

About Arnab Biswas:
Arnab Biswas is a Data Scientist at EcoEnergy, Carrier Global Corporation. He started his career as a software developer about 15 years ago. Over the years, he has worked at various organizations in the telecom and networking domain, including Cisco Systems and Nokia Siemens Networks. His current focus is machine learning for predictive maintenance in the HVAC industry. He has also worked and volunteered for nonprofit organizations in India, helping them address their data science needs.
