Presentation/Discussion: Maximal Information Coefficient


Details
For the January Meetup, we're pleased to have Sean Murphy lead a discussion of this new paper:
Detecting Novel Associations in Large Data Sets, David N. Reshef et al., Science 334 (6062), 16 December 2011: 1518-1524. (http://www.sciencemag.org/content/334/6062/1518.full)
Here's a paragraph from the accompanying Perspective (http://www.sciencemag.org/content/334/6062/1502.summary):
Most scientists will be familiar with the use of Pearson's correlation coefficient r to measure the strength of association between a pair of variables: for example, between the height of a child and the average height of their parents (r ≈ 0.5; see the figure, panel A), or between wheat yield and annual rainfall (r ≈ 0.75, panel B). However, Pearson's r captures only linear association, and its usefulness is greatly reduced when associations are nonlinear. What has long been needed is a measure that quantifies associations between variables generally, one that reduces to Pearson's in the linear case, but that behaves as we'd like in the nonlinear case. On page 1518 of this issue, Reshef et al. (1) introduce the maximal information coefficient, or MIC, that can be used to determine nonlinear correlations in data sets equitably.
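To make the limitation described above concrete, here's a minimal sketch in Python (ours, not from the paper or the Perspective) showing Pearson's r on a perfectly linear relationship versus a perfectly deterministic but nonlinear one. The data are made up for illustration, the helper name pearson_r is our own, and the sketch assumes NumPy is installed. It does not compute MIC itself.

```python
import numpy as np


def pearson_r(x, y):
    """Pearson's correlation coefficient r between two 1-D arrays."""
    # np.corrcoef returns the 2x2 correlation matrix; the off-diagonal entry is r.
    return float(np.corrcoef(x, y)[0, 1])


# Illustrative (made-up) data: evenly spaced points, symmetric about zero.
x = np.linspace(-1.0, 1.0, 201)

# A perfectly linear relationship: r is 1 up to floating-point error.
y_linear = 2.0 * x + 1.0
r_linear = pearson_r(x, y_linear)

# A perfectly deterministic but nonlinear relationship (a parabola):
# y is fully determined by x, yet r is essentially zero.
y_parabola = x ** 2
r_parabola = pearson_r(x, y_parabola)

print(f"linear:   r = {r_linear:.3f}")
print(f"parabola: r = {r_parabola:.3f}")
```

In the second case y is completely determined by x, but r comes out at essentially zero. That gap is what a general-purpose measure of association like MIC is meant to close.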
After presenting a short overview of the paper and why it is important to statistical practitioners, Sean will lead a discussion of MIC. We urge everyone to at least skim the paper before attending. (If you don't have free access to Science, let us know and we'll hook you up...)
As always, mingling, food & drink start at 6:30pm, the presentation starts at 7pm, and we try to find a bar for Data Drinks around 8:30pm!
Sean Murphy is an entrepreneur, educator, Senior Research Scientist at Johns Hopkins and Director of Research at Manhattan GMAT. His research has spanned anomaly detection in time series data, agent-based models for disease simulation, and prediction techniques for viral evolution. In his spare time, he analyzes large data sets to design, build and deliver better educational web applications.
