Scikit-learn Sprint (to contribute to open source)

NYC Women in Machine Learning & Data Science

Event space sponsored by: Stack Exchange

Refreshments sponsored by: Bloomberg

Event agenda:
9:30am - 10:00am: arrive early for technical support
10:00am - 1:00pm: Sprint
1:00pm - 2:00pm: Lunch will be provided
2:00pm - 4:00pm: Sprint

“Software development as a whole is male-dominated, but the world of open-source software is even more so. Data from the Bureau of Labor Statistics from 2015 puts the percentage of computer and mathematical occupations filled by women, a group that includes Web and software development, at 24.7 percent. Of the 3 million code changes reviewed in this study, fewer than 150,000 were by women. A 2013 survey found that only 11 percent of open-source contributors were women.”
Read more in this article on Women & Coding.

We would like to increase the number of women in open source, particularly for the Python machine learning library scikit-learn. The scikit-learn repository has over 700 open issues. By organizing and offering this workshop, we hope to increase women’s participation in open source as well as advance the scikit-learn library.

The plan is to work in pairs. The goal is that each participant will be able to resolve one trivial fix and one actual fix.

Please bring a laptop with Python and Jupyter Notebook installed (the Anaconda distribution includes Jupyter). We will review the basics of Git at the beginning of the workshop so that attendees are able to submit pull requests.

1) Python
- be comfortable with Python
- familiarity with Jupyter Notebook

2) Git
- should have a GitHub account
- some familiarity with Git / GitHub
- review some Git resources prior to the event
- we'll go over pull requests at the beginning of the event

3) Read through the "Contributing" documentation
- it is approximately 16 pages

4) Review Open Issues
- go through some issues and become familiar with them

Instructor Bio:
This event will be led by Andreas Mueller. Andreas is a Lecturer in Data Science at Columbia University. Previously he worked as a Machine Learning Scientist at Amazon, working on computer vision and forecasting problems. He is one of the core developers of the scikit-learn machine learning library, and has maintained it for several years.
His mission is to create open tools that lower the barrier to entry for machine learning applications, promote reproducible science, and democratize access to high-quality machine learning algorithms.

Andreas’ twitter: @amuellerml
Andreas’ github: @amueller