From Excel to Python via Pandas: Data acquisition, munging and visualization

Details

What You'll Learn

Microsoft Excel is perhaps the world’s most popular “database” and “analytics platform”, thanks to its presence on almost all personal computers. It can be quite useful for quick analyses, but as a platform for regular and repeated analytics it can be dangerous! A programmatic approach that keeps the original data intact makes data manipulations safe, reproducible, and easy to fix and modify.

In this talk Abhijit will discuss how to use the popular language Python, and in particular the pandas package for data analysis, to perform several tasks that Excel users commonly perform. These will include basic data transformation and data munging, summaries, pivot tables, merging data sets and lookups, as well as beginning and intermediate graphics to understand your data. He’ll also talk about handling missing data and imputation.
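As a rough illustration of the kinds of tasks listed above (this is not material from the talk itself), here is a minimal pandas sketch of imputing a missing value, doing a lookup-style merge, and building a pivot table. The `sales` and `prices` tables, their column names, and the choice of median imputation are all hypothetical and made up for this example.

```python
import numpy as np
import pandas as pd

# Hypothetical sales records; in practice these might be loaded from a spreadsheet.
sales = pd.DataFrame({
    "region": ["East", "East", "West", "West", "West"],
    "product": ["A", "B", "A", "B", "A"],
    "units": [10, np.nan, 7, 3, 5],
})

# Hypothetical lookup table mapping each product to a unit price.
prices = pd.DataFrame({"product": ["A", "B"], "unit_price": [2.5, 4.0]})

# Work on a copy so the original data stays intact.
clean = sales.copy()

# Simple imputation (an assumption for this sketch): fill missing units
# with the median of the observed values.
clean["units"] = clean["units"].fillna(clean["units"].median())

# Lookup, roughly what VLOOKUP does in Excel: a left merge on the key column.
merged = clean.merge(prices, on="product", how="left")
merged["revenue"] = merged["units"] * merged["unit_price"]

# Pivot table: total revenue by region (rows) and product (columns).
pivot = merged.pivot_table(
    index="region",
    columns="product",
    values="revenue",
    aggfunc="sum",
    fill_value=0,
)
print(pivot)
```

Because every step is written down as code and `sales` is never modified, the whole analysis can be rerun or corrected later without touching the source data.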

Why Python? It is free, relatively easy to learn, has good documentation, is able to handle large datasets and can be integrated into larger software infrastructure. The PyData stack (NumPy, SciPy, pandas, Matplotlib, IPython, SymPy) provides a rather comprehensive ecosystem for scientific and data computing and manipulation, and leads to other modeling tools like statsmodels and scikit-learn.

In this talk Abhijit will be using the Anaconda distribution of Python (https://store.continuum.io/cshop/anaconda/) since it comes with all the tools and packages needed for this presentation.

If there are particular topics related to data munging and manipulation you’d like Abhijit to address, please add them to the comments before March 29. No promises, but he'll try to at least comment on the topics.

Our Speaker

Dr. Abhijit Dasgupta is a consulting statistician/data scientist in the Washington DC metropolis. He is fascinated by the innovative use of data analytic methods to gain insight and tell substantive stories, using modern methods for modeling, simulation and visualization.

A PhD-level biostatistician by training, Dr. Dasgupta currently works to bridge the divide between statistics and machine learning by merging machine learning methods with sound statistical thinking and principles to help enhance our ability to derive intelligence from data. He now has over 45 peer-reviewed research and collaborative papers in areas ranging from bioinformatics, cancer research and operations research to methods for cluster analysis and intelligent robust modeling. He also consults for local startups doing bioengineering, business analytics and bioinformatics. Dr. Dasgupta also helps train companies in using R for their analytics needs, leveraging over 20 years of experience using R and Python for data analyses and reporting. He is co-author of the book "Practical Data Science Cookbook".

Dr. Dasgupta also works to build the local data community, organizing the Statistical Programming DC meetup and serving on the Board of Data Community DC.

Agenda

6:30 - Food and beverages

7:00 - Intro and announcements

7:15 - Talk

8:30 - Head to Tonic for drinks

Data Engineers DC
GWU, Funger Hall, Room 108
2201 G St. NW · Washington, DC