Python has been a great platform for munging and analyzing numeric data (numpy, scipy) as well as textual data (NLTK). For general data analysis, however, something was left to be desired. The advent of pandas has made importing and handling data much easier, especially data with heterogeneous types, and has made data munging faster and more efficient.
Data modeling in Python is still not as comprehensive as in other software ecosystems, but great strides have been made toward building a strong one. I'll survey two of the most useful packages: scikit-learn and statsmodels.
Abhijit Dasgupta is a data scientist and biostatistician in the DC metro area. He is a high-level consultant for NIH, with several years of experience in bioinformatics and over 40 peer-reviewed articles. He also works with multiple local companies on their data science needs. He organizes Statistical Programming DC, a meetup dedicated to statistical programming in R, Python and other platforms, and sits on the board of Data Community DC, a local non-profit dedicated to creating and strengthening links among data-oriented individuals, groups and companies in the greater DC area.
(This talk was postponed from the October DCPython meeting.)