Python Data Analysis I Workshop

Name: Python Data Analysis I Workshop
Start: 2014-02-22T09:00:00-05:00
End: 2014-02-22T13:00:00-05:00
Location: Metro Offices - Ballston Office Center

Hosted By

Abhijit

Details

Overview:

Data Community DC (http://datacommunitydc.org/) and District Data Labs (http://www.districtdatalabs.com/) are excited to be offering two Python Data Analysis workshops to kick off 2014.

Python is probably the most popular general-purpose scripting language in use today. It comes with "batteries included" and has an ecosystem of over 38,000 packages.

Several user-contributed packages have been developed over the years to provide scientific computing capabilities in line with Matlab. These include Numpy, Scipy, Sympy, and Matplotlib. They are wonderful packages. However, until the advent of pandas, it was not possible to easily ingest, transform, and clean data as you could with domain-specific languages like R and SAS.

Today, the scientific stack of pandas, Numpy, Scipy and Matplotlib, along with IPython's ability to "glue" different languages and allow parallel computing, forms a sound platform for using Python as a primary data analytics tool. Python's capabilities are quickly moving forward to include functionality comparable to specialized data processing languages. It is also attractive for integrating data analytic processing into existing Python-based web frameworks like Django and Flask, as well as other Python-based software development. According to a recent KDNuggets poll (http://www.kdnuggets.com/polls/2013/languages-analytics-data-mining-data-science.html), Python is the second most commonly used computer language for data analysis.

This workshop will provide an introduction to working with Python in a data analysis context. You will learn how to use Python and its packages to read data from different data sources, how to munge and summarize data, and how to visualize data.

The price per attendee for this workshop is $150.

What to Bring:

We will use the Python distribution Anaconda (https://store.continuum.io/cshop/anaconda/) provided by Continuum Analytics. Anaconda is free to use, works on Windows, Mac OSX and Linux, and includes the data analysis stack. Anaconda installation does not require administrative privileges, so it has the lowest barrier to use among the available scientific Python distributions that include pandas, Numpy, Scipy and matplotlib. There are also several other packages for data analysis, visualization and scientific computing included as part of this distribution. Installation instructions can be found here (http://docs.continuum.io/anaconda/install.html), and all the packages we will use in this workshop are installed by default.

It is expected that you will come with a laptop with Anaconda already installed. We will provide you with a link to the Github site where all code for the workshop will be available, as well as the workshop presentation, which will be provided as an IPython notebook (if you don't know what this is, you will find out at the workshop).

Outline:

Python primer

"Hello World!"
Python as a calculator
Object types
List comprehensions
Basic data manipulation

Python tools to use

IPython
Pandas
Numpy

Importing data

Text files (tab-delimited, comma-delimited)
SQL databases
Web pages
Idiosyncratic data

Storing data in Python

Pandas (Series and DataFrame)
Numpy arrays

Cleaning data

Missing data
Data summaries
Data imputation

Data manipulation and munging

Merging datasets
Subsetting data
Grouping and summarizing
Split-apply-combine
Pivot tables

Basic graphics

Histograms and bar plots
2D plots
Visualizing bivariate patterns
Boxplots
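
For a flavor of the topics in the outline above, here is a minimal sketch. It is not taken from the workshop materials; the data, column names, and variable names are made up for illustration, and it assumes only that pandas is installed (as it is with Anaconda).

```python
import io

import pandas as pd

# Hypothetical comma-delimited data, inlined so the sketch is self-contained.
raw = io.StringIO(
    "city,year,sales\n"
    "Arlington,2012,10\n"
    "Arlington,2013,\n"
    "Bethesda,2012,7\n"
    "Bethesda,2013,9\n"
)

# Python primer: a list comprehension.
squares = [n * n for n in range(5)]

# Importing data: read comma-delimited text into a pandas DataFrame.
df = pd.read_csv(raw)

# Cleaning data: count missing values, then impute with the column mean.
n_missing = int(df["sales"].isna().sum())
df["sales"] = df["sales"].fillna(df["sales"].mean())

# Split-apply-combine: split by city, sum sales, combine into a Series.
totals = df.groupby("city")["sales"].sum()

# Pivot table: cities as rows, years as columns.
pivot = df.pivot_table(index="city", columns="year", values="sales", aggfunc="sum")

print(totals)
print(pivot)
```

Mean imputation is used here only because it is the simplest option to show; whether it is appropriate depends on the data.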

Instructor:
Abhijit Dasgupta is a data consultant working in the greater DC-Maryland-Virginia area, with several years of experience in biomedical consulting, business analytics, bioinformatics, and bioengineering consulting.

He has a PhD in Biostatistics from the University of Washington and over 40 collaborative peer-reviewed manuscripts, with strong interests in bridging the statistics/machine learning divide. He is always on the lookout for interesting and challenging projects, and is an enthusiastic speaker and discussant on new and better ways to look at and analyze data. He is a member of Data Community DC and a founding member and co-organizer of Statistical Programming DC (formerly R Users DC).

Other Info:

District Data Labs (http://www.districtdatalabs.com/) is composed of several Data Community DC members focused on providing data science educational offerings to help others in our community enhance and expand their existing technical and analytical skills.

For those who are driving, the best parking option we have found in the area is the garage behind the SunTrust building on the southeast corner of Glebe Rd. and Fairfax Dr.

Data Community DC (DC2)

Saturday, February 22, 2014
9:00 AM to 1:00 PM EST

Metro Offices - Ballston Office Center

4601 N Fairfax Drive · Arlington, VA
