
PyData Cambridge - 25th Meetup

Hosted By
Federico and 3 others

Details

We are happy to announce the 25th PyData Cambridge meetup!

IMPORTANT

This edition will be hosted online.

Agenda

19:00 - Introduction
19:10 - "Community Building through Documentation" -- Melissa Weber Mendonça (Quansight)
19:45 - "Optimising feature engineering pipelines with Feature-engine" -- Soledad Galli (Train in Data)
20:30 - End

Code of Conduct

PyData is dedicated to providing a harassment-free event experience for everyone, regardless of gender, sexual orientation, gender identity and expression, disability, physical appearance, body size, race, or religion. We do not tolerate harassment of participants in any form.

This meetup is governed by the PyData Code of Conduct (http://pydata.org/code-of-conduct.html). To discuss any issues or concerns relating to the code of conduct or the behavior of anyone at a PyData meetup, please contact NumFOCUS Executive Director Leah Silen (leah@numfocus.org) or the organizers.

Talks

Community Building through Documentation

Abstract:
In this talk, we'll discuss a few concepts about documentation, including reference documentation (docstrings, API documentation) and narrative/educational content. The talk draws on my own experience in the NumPy Documentation Team and on the steps the community is currently taking to create an environment where such contributions are valued and the community grows beyond code.

We'll talk about a few tools for documenting projects (Sphinx/reST, Markdown, jupytext, Read the Docs), their advantages and disadvantages, and how to make it easier for community members to start contributing to, creating, and improving open source projects.

Bio:
Melissa is an applied mathematician and former university professor turned software engineer. Nowadays she works at Quansight, developing open source software and leading the Documentation Team for NumPy. She is also a LaTeX, Fortran and free software enthusiast.

----

Optimising feature engineering pipelines with Feature-engine

Abstract:
Feature engineering is the process of using domain knowledge of the data to transform existing features or to create new variables from existing ones, for use in machine learning.

Feature-engine is an open source Python library with the most exhaustive battery of transformers to engineer features for use in machine learning models. Feature-engine simplifies and streamlines the implementation of an end-to-end feature engineering pipeline by allowing the selection of feature subsets within its transformers and by returning dataframes for easy data exploration. Feature-engine’s transformers preserve Scikit-learn functionality, with the methods fit() and transform() to learn parameters from and then transform data.
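The abstract describes transformers that learn parameters with fit(), apply them with transform(), and act only on a chosen subset of variables. The sketch below illustrates that pattern in plain Python. It assumes rows stored as a list of dicts rather than pandas DataFrames, and the classes MedianImputer, RatioCreator and SimplePipeline are hypothetical illustrations, not Feature-engine's API.

```python
import statistics
from typing import Dict, List, Optional, Sequence

Row = Dict[str, Optional[float]]


class MedianImputer:
    """Hypothetical transformer (not Feature-engine's API): fills missing values with learned medians."""

    def __init__(self, variables: Sequence[str]):
        # Only these variables are learned from and transformed (feature subset selection).
        self.variables = list(variables)
        self.medians_: Dict[str, float] = {}

    def fit(self, rows: List[Row]) -> "MedianImputer":
        # Learn one parameter per variable: the median of its non-missing values.
        # statistics.median raises StatisticsError if a variable has no observed values.
        for var in self.variables:
            observed = [row[var] for row in rows if row.get(var) is not None]
            self.medians_[var] = statistics.median(observed)
        return self

    def transform(self, rows: List[Row]) -> List[Row]:
        if not self.medians_:
            raise RuntimeError("call fit() before transform()")
        out = []
        for row in rows:
            new_row = dict(row)  # copy so the caller's data is not modified
            for var in self.variables:
                if new_row.get(var) is None:
                    new_row[var] = self.medians_[var]
            out.append(new_row)
        return out


class RatioCreator:
    """Hypothetical transformer: creates a new variable as numerator / denominator."""

    def __init__(self, numerator: str, denominator: str, new_name: str):
        self.numerator = numerator
        self.denominator = denominator
        self.new_name = new_name

    def fit(self, rows: List[Row]) -> "RatioCreator":
        return self  # nothing to learn for a fixed ratio

    def transform(self, rows: List[Row]) -> List[Row]:
        out = []
        for row in rows:
            new_row = dict(row)
            den = row[self.denominator]
            # A zero or missing denominator yields None instead of raising.
            new_row[self.new_name] = row[self.numerator] / den if den else None
            out.append(new_row)
        return out


class SimplePipeline:
    """Hypothetical pipeline: fits each step on the output of the previous one."""

    def __init__(self, steps):
        self.steps = steps

    def fit(self, rows: List[Row]) -> "SimplePipeline":
        for step in self.steps:
            rows = step.fit(rows).transform(rows)
        return self

    def transform(self, rows: List[Row]) -> List[Row]:
        for step in self.steps:
            rows = step.transform(rows)
        return rows


train = [
    {"income": 3000.0, "debt": 600.0},
    {"income": None, "debt": 300.0},
    {"income": 5000.0, "debt": None},
]
pipe = SimplePipeline([
    MedianImputer(variables=["income", "debt"]),
    RatioCreator("debt", "income", "debt_to_income"),
])
pipe.fit(train)
print(pipe.transform([{"income": None, "debt": 200.0}]))
# [{'income': 4000.0, 'debt': 200.0, 'debt_to_income': 0.05}]
```

Fitting on the training rows and then reusing the learned medians on new rows is the same separation that fit() and transform() provide in Scikit-learn-compatible transformers: parameters are learned once and applied consistently at deployment time.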

In this talk, I will give an overview of commonly used techniques to engineer features for machine learning and highlight how open source packages like Feature-engine help us streamline our feature engineering pipelines, speeding up development and reducing deployment time.

Bio:
Soledad Galli is a Lead Data Scientist with 10+ years of experience in world-class academic institutions and renowned organisations. She has experience in finance and insurance, received a Data Science Leaders Award in 2018 and was selected as “LinkedIn’s voice” in data science and analytics in 2019.

Sole has researched, developed and put into production machine learning models for insurance claims, credit risk assessment and fraud prevention. She founded Train in Data with the aim of bringing practical knowledge of machine learning and AI software engineering to the community, and created online courses on these topics that have enrolled about 20,000 students worldwide. Sole also created Feature-engine, an open source Python package to streamline feature engineering pipelines.
