Data Scientist Workshop Using Python
This afternoon of talks will cover some basic libraries for Python data science. The session is part of the data science weekend that includes R-Bootcamps and the theoretical and R-oriented machine learning session on Sunday morning. The Python topics covered here are stand-alone modules, chosen to both complement and extend the prior sessions.
Prerequisites: Basic Python knowledge is ideal, although if you're just interested in seeing what Python can do for you, it's a pretty easy language to read. If you want to install and try to follow along, the simplest method is to install the free Enthought distribution, which will solve many installation problems you might have with the necessary libraries.
You will then want to install pandas, scikit-learn, statsmodels, and patsy (links below). If you want to go it alone, you should at minimum get Python 2.7 and then install numpy, IPython, matplotlib, scipy, and then the other libraries.
[masked] Intro to IPython Notebook & Pandas (Lynn Cherny)
The IPython notebook is a browser-based workspace for exploring and recording your Python actions. Because the notebook is an increasingly popular way to share Python code demos, Lynn will walk you through the basics (including how to view one if someone sends you one), and then move on to an introduction and demo of the Pandas library (http://pandas.pydata.org/). Pandas (built on numpy) provides a convenient data frame-like environment for manipulating data in Python, making the transition for R users even easier. The popular new book by Wes McKinney, Python for Data Analysis, uses Pandas as its primary example tool.
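To give a rough flavor of the data frame idea before the session, here is a minimal sketch. It is not taken from Lynn's talk; the column names and values are made up for illustration, and it only assumes pandas is installed.

```python
import pandas as pd

# Hypothetical toy data, invented for this illustration only.
df = pd.DataFrame({
    "group": ["a", "a", "b", "b"],
    "value": [1.0, 3.0, 2.0, 6.0],
})

# Group-wise mean of "value", similar in spirit to aggregate() in R.
group_means = df.groupby("group")["value"].mean()
print(group_means)
```

The result is a pandas Series indexed by group label, so `group_means["a"]` looks up the mean for group "a".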
[masked] Intro to Statsmodels and Patsy (Thomas Wiecki)
Statsmodels (http://statsmodels.sourceforge.net/) and Patsy (http://patsy.readthedocs.org/en/latest/overview.html) allow an R-style description of models in Python, and support a growing number of basic statistical models, from GLMs to time series and discrete choice methods. Thomas will illustrate the basics of these tools and their capabilities.
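As a small illustration of what an R-style model description looks like, the sketch below fits an ordinary least squares line using the statsmodels formula interface, which relies on Patsy to parse the formula string. This is not from Thomas's talk; the data are invented, and it assumes pandas, statsmodels, and patsy are installed.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical toy data, invented for this illustration only.
df = pd.DataFrame({
    "x": [0.0, 1.0, 2.0, 3.0, 4.0],
    "y": [1.1, 2.9, 5.2, 6.8, 9.1],
})

# "y ~ x" is the same formula syntax an R user would pass to lm();
# an intercept term is included by default.
model = smf.ols("y ~ x", data=df).fit()

# Fitted coefficients, indexed by term name ("Intercept" and "x").
print(model.params)
```

Calling `model.summary()` gives a regression table in the style R users will recognize.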
3.30 - 4.30 Intro to Scikit-Learn (Ryan Feather)
The workhorse tool for machine learning in Python is scikit-learn (http://scikit-learn.org/stable), an actively developed and rich collection of machine learning algorithms (with excellent documentation!). Ryan will illustrate the basics of the scikit-learn interface to these algorithms, and show you applications of supervised and unsupervised techniques, including Random Forests.
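For a sense of the scikit-learn interface, here is a minimal sketch of fitting a Random Forest classifier. It is not from Ryan's talk; it assumes scikit-learn is installed and uses the small iris dataset bundled with the library. The parameter values are arbitrary choices for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

# Bundled example dataset: 150 flowers, 4 features, 3 classes.
iris = load_iris()
X, y = iris.data, iris.target

# Most scikit-learn estimators share the same pattern:
# construct, fit(X, y), then predict(X).
clf = RandomForestClassifier(n_estimators=10, random_state=0)
clf.fit(X, y)
predictions = clf.predict(X)

# Accuracy on the training data itself, which is optimistic;
# a held-out test set gives a fairer estimate.
train_accuracy = clf.score(X, y)
print(train_accuracy)
```

Swapping in a different algorithm usually means changing only the estimator line, since the `fit` and `predict` calls stay the same.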