6:30 PM - 7:00 PM - Food, Networking, and a word from our sponsors (SunGard and AWeber)
7:00 PM - 7:30 PM - scikit-learn, by Michael Becker
7:30 PM - 8:00 PM - NLTK, by Chris Brown
8:00 PM - 8:30 PM - Lightning Talks
8:30 PM - Leave for Prohibition Taproom
scikit-learn, by Michael Becker
In this talk, Michael will lead us through section 2 of the scikit-learn tutorial (http://scikit-learn.github.com/scikit-learn-tutorial). Topics covered will include feature extraction, classification, regression, principal component analysis (PCA), clustering, detecting and avoiding overfitting, and measuring performance. Most of the examples in the tutorial use the iris dataset (https://en.wikipedia.org/wiki/Iris_flower_data_set). Participants who wish to follow along should complete the setup steps listed at http://scikit-learn.github.com/scikit-learn-tutorial/setup.html before coming to the talk.
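For a taste of what the talk covers, here is a minimal sketch of classification and performance measurement on the iris dataset. It is not taken from the tutorial itself: the choice of a k-nearest-neighbors classifier, the 70/30 split, and the random seed are assumptions made for illustration, and the imports assume a recent scikit-learn release (older versions kept train_test_split in a different module).

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Load the iris dataset: 150 flowers, 4 measurements each, 3 species.
iris = load_iris()

# Hold out 30% of the samples so performance is measured on unseen data.
# (The split size and random_state are illustrative choices.)
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=0, stratify=iris.target
)

# Fit a simple classifier on the training portion only.
clf = KNeighborsClassifier(n_neighbors=5)
clf.fit(X_train, y_train)

# A large gap between these two scores is one sign of overfitting.
train_accuracy = clf.score(X_train, y_train)
test_accuracy = clf.score(X_test, y_test)
print("train accuracy: %.2f" % train_accuracy)
print("test accuracy:  %.2f" % test_accuracy)
```

Comparing the score on the training data with the score on the held-out data is a quick, rough way to spot overfitting before moving on to more careful evaluation.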
Michael Becker is the Senior Data Engineer at AWeber and founder of the DataPhilly Meetup group. On a day-to-day basis, he spends the majority of his time acquiring, scrubbing, exploring, and visualizing data. He loves machine learning and gets his kicks out of clustering, regression, and classification algorithms.
NLTK (Natural Language Toolkit), by Chris Brown
This talk will be modeled after the NLTK talk given at PyData NYC 2012 (http://vimeopro.com/continuumanalytics/pydata-nyc-2012/video/53062324). The talk covers out-of-the-box features available in NLTK, which include stemming, tokenization, stripping HTML, WordNet integration, named entity recognition, quickly building a corpus of data, and basic classifiers. Chris will motivate the use of these tools with a practical tutorial on building a topic classifier for news articles and then classifying political news according to political sentiment (conservative or liberal).
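As a rough preview, the sketch below combines three of the features mentioned above: tokenization, stemming, and a basic classifier. It is not Chris's code. The four training headlines, the topic labels, and the helper name featurize are all made up for illustration, and a regular-expression tokenizer is used because it needs no extra NLTK data downloads.

```python
from nltk.classify import NaiveBayesClassifier
from nltk.stem import PorterStemmer
from nltk.tokenize import RegexpTokenizer

tokenizer = RegexpTokenizer(r"\w+")  # split on runs of word characters
stemmer = PorterStemmer()


def featurize(text):
    """Hypothetical helper: map a text to a bag-of-stems feature dict."""
    tokens = tokenizer.tokenize(text)
    return {stemmer.stem(token): True for token in tokens}


# Toy, hand-written training data (illustrative only, not a real corpus).
training_texts = [
    ("The senate passed the budget bill", "politics"),
    ("Congress votes on the new tax law", "politics"),
    ("The team won the championship game", "sports"),
    ("The striker scored two goals", "sports"),
]
training_set = [(featurize(text), label) for text, label in training_texts]

# Train a naive Bayes topic classifier on the stemmed features.
classifier = NaiveBayesClassifier.train(training_set)

stems = featurize("Congress passed the bills")
label = classifier.classify(featurize("Senators debate the tax bill"))
print(sorted(stems))
print(label)
```

A real topic or sentiment classifier would need a far larger labeled corpus and more careful feature selection; the point here is only the shape of the pipeline.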
Chris Brown is a PhD candidate in political science at the University of Pennsylvania. He has worked on research projects that include determining how personal characteristics of leaders increase the likelihood of international conflict, measuring political polarization across different issue areas in Congress using roll call votes, and assigning political partisanship scores to blogs using natural language processing. He uses Python almost every day while working on his dissertation, which uses natural language processing to test theories about political parties and polarization with Congressional speeches. In his free time, Chris has been developing StateRep.me, a Django-based website to help Pennsylvanians monitor and track their state legislators.