Python Data Analysis I Workshop


Data Community DC and District Data Labs are excited to be offering two Python Data Analysis workshops to kick off 2014.  

Python is probably the most popular general-purpose scripting language in use today. It comes with "batteries included" and has an ecosystem of over 38,000 packages.

Several user-contributed packages have been developed over the years to provide scientific computing capabilities in line with MATLAB. These include Numpy, Scipy, Sympy and Matplotlib. They are wonderful packages. However, until the advent of pandas, it was not possible to easily ingest, transform and clean data the way you could with domain-specific languages like R and SAS.

Today, the scientific stack of pandas, Numpy, Scipy and Matplotlib, along with IPython's ability to "glue" different languages together and support parallel computing, forms a sound platform for using Python as a primary data analytics tool. Python's capabilities are quickly moving forward to include functionality comparable to specialized data processing languages. It is also attractive for integrating data analytic processing into existing Python-based web frameworks like Django and Flask, as well as other Python-based software development. According to a recent KDNuggets poll, Python is the second most commonly used computer language for data analysis.

This workshop will provide an introduction to working with Python in a data analysis context. You will learn how to use Python and its packages to read data from different data sources, how to munge and summarize data, and how to visualize data.

The price per attendee for this workshop is $150. 

What to Bring: 

We will use the Python distribution Anaconda provided by Continuum Analytics. Anaconda is free to use, works on Windows, Mac OSX and Linux, and includes the data analysis stack. Anaconda installation does not require administrative privileges, so it has the lowest barrier to use among the available scientific Python distributions that include pandas, Numpy, Scipy and matplotlib. There are also several other packages for data analysis, visualization and scientific computing included as part of this distribution. Installation instructions can be found here, and all the packages we will use in this workshop are installed by default.

You are expected to bring a laptop with Anaconda already installed. We will provide a link to the GitHub site where all code for the workshop will be available, along with the workshop presentation, which will be provided as an IPython notebook (if you don't know what this is, you will find out at the workshop).



Python primer
- "Hello World!"
- Python as a calculator
- Object types
- List comprehensions
- Basic data manipulation
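
As a rough taste of the primer topics listed above, here is a minimal sketch. It is not taken from the workshop materials; the variable names and values are made up for illustration, and it assumes Python 3 (where `/` is true division).

```python
# "Hello World!"
print("Hello World!")

# Python as a calculator
total = 3 * (4 + 5)
ratio = 7 / 2  # true division in Python 3

# A few built-in object types (illustrative values)
readings = [12.5, 14.0, 9.5, 20.0]      # list of floats
station = {"name": "DCA", "elev_m": 5}  # dict mapping keys to values

# List comprehension: keep readings above 10 and truncate them to integers
rounded = [int(r) for r in readings if r > 10]

# Basic data manipulation: sort a copy and compute a simple average
ordered = sorted(readings)
average = sum(readings) / len(readings)

print(total, ratio, rounded, ordered, average)
```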

Python tools to use
- IPython
- Pandas
- Numpy

Importing data
- Text files (tab-delimited, comma-delimited)
- SQL databases
- Web pages
- Idiosyncratic data
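
The text-file and SQL cases above might look something like the following sketch. It is an illustration under stated assumptions, not the workshop's own code: the data is invented, in-memory strings stand in for files on disk, and a throwaway in-memory SQLite database stands in for a real database server.

```python
import io
import sqlite3

import pandas as pd

# Comma-delimited text (in-memory stand-in for a .csv file)
csv_text = "city,temp\nArlington,41\nBethesda,39\n"
csv_df = pd.read_csv(io.StringIO(csv_text))

# Tab-delimited text: same reader, different separator
tsv_text = "city\ttemp\nArlington\t41\nBethesda\t39\n"
tsv_df = pd.read_csv(io.StringIO(tsv_text), sep="\t")

# SQL database: build a small hypothetical table, then query it into a DataFrame
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE temps (city TEXT, temp INTEGER)")
conn.executemany(
    "INSERT INTO temps VALUES (?, ?)",
    [("Arlington", 41), ("Bethesda", 39)],
)
sql_df = pd.read_sql_query("SELECT city, temp FROM temps ORDER BY city", conn)
conn.close()

print(csv_df)
print(sql_df)
```

With a real file, the `io.StringIO(...)` argument would simply be replaced by the file path.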

Storing data in Python
- Pandas (Series and DataFrame)
- Numpy arrays
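
For orientation, the three containers named above can be sketched as follows. The values and labels are invented for illustration and are not part of the workshop materials.

```python
import numpy as np
import pandas as pd

# Numpy array: a homogeneous, fixed-type grid of numbers
arr = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
col_means = arr.mean(axis=0)  # mean of each column

# pandas Series: a one-dimensional array with labels (an index)
s = pd.Series([41, 39, 44], index=["Mon", "Tue", "Wed"], name="temp")

# pandas DataFrame: a table of named columns, built here from the array
df = pd.DataFrame(arr, columns=["x", "y"])
df["total"] = df["x"] + df["y"]  # column-wise arithmetic

print(col_means)
print(s["Tue"])
print(df)
```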

Cleaning data
- Missing data
- Data summaries
- Data imputation
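
A minimal sketch of the cleaning steps above, using pandas. The data is made up, and mean imputation is chosen only because it is the simplest option to show; the workshop may well cover other approaches.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value per column
df = pd.DataFrame({
    "age": [34.0, np.nan, 51.0, 29.0],
    "weight": [70.0, 82.0, np.nan, 64.0],
})

# Missing data: count missing values in each column
missing_per_column = df.isnull().sum()

# Data summaries: count, mean, std, min, quartiles, max per column
summary = df.describe()

# Data imputation: fill each gap with that column's mean
imputed = df.fillna(df.mean())

print(missing_per_column)
print(summary)
print(imputed)
```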

Data manipulation and munging
- Merging datasets
- Subsetting data
- Grouping and summarizing
- Split-apply-combine
- Pivot tables
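
The manipulation topics above map onto a handful of pandas calls. The sketch below uses invented sales data and hypothetical column names purely for illustration.

```python
import pandas as pd

# Two small hypothetical tables
sales = pd.DataFrame({
    "store": ["A", "A", "B", "B"],
    "quarter": ["Q1", "Q2", "Q1", "Q2"],
    "revenue": [100, 120, 80, 90],
})
stores = pd.DataFrame({"store": ["A", "B"], "region": ["North", "South"]})

# Merging datasets: attach each store's region to its sales rows
merged = sales.merge(stores, on="store", how="left")

# Subsetting data: keep only first-quarter rows
q1 = merged[merged["quarter"] == "Q1"]

# Grouping and summarizing (split-apply-combine): total revenue per region
by_region = merged.groupby("region")["revenue"].sum()

# Pivot table: stores as rows, quarters as columns
pivot = merged.pivot_table(
    index="store", columns="quarter", values="revenue", aggfunc="sum"
)

print(by_region)
print(pivot)
```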

Basic graphics
- Histograms and bar plots
- 2D plots
- Visualizing bivariate patterns
- Boxplots

Abhijit Dasgupta is a data consultant working in the greater DC-Maryland-Virginia area, with several years of experience in biomedical consulting, business analytics, bioinformatics, and bioengineering consulting.

He has a PhD in Biostatistics from the University of Washington and over 40 collaborative peer-reviewed manuscripts, with strong interests in bridging the statistics/machine learning divide. He is always on the lookout for interesting and challenging projects, and is an enthusiastic speaker and discussant on new and better ways to look at and analyze data. He is a member of Data Community DC and a founding member and co-organizer of Statistical Programming DC (formerly R Users DC).

Other Info: 

District Data Labs is made up of several Data Community DC members focused on providing data science educational offerings to help others in our community enhance and expand their existing technical and analytical skills.

For those who are driving, the best parking option we have found in the area is the garage behind the SunTrust building on the southeast corner of Glebe Rd. and Fairfax Dr.

  • Rob R.

    Very glad I took the class! Thank you!

    March 3, 2014

  • Benjamin B.

    If you guys would like to ask questions and not interrupt the whole group, you can chat with me on Google Hangouts - [masked] is my Google+ account:

    February 22, 2014

  • Patrick P.

    Running 30 min late, but will be there!

    February 22, 2014

  • Chris G.

    Would love to get in on this. I'm on the waitlist so if anyone knows they can't make it, please let me know!

    February 19, 2014

29 went

Your organizer's refund policy for Python Data Analysis I Workshop

Refunds offered if:

  • the Meetup is cancelled
  • you cancel at least 7 days before the Meetup

Payments you make go to the organizer, not to Meetup. You must make refund requests to the organizer.
