Past Meetup

The medcouple — a robust measure of skewness

24 people went

Details

Jordi Gutiérrez Hermoso has kindly offered to present Brys, Hubert, and Struyf's paper "A Robust Measure of Skewness" (https://wis.kuleuven.be/stat/robust/papers/2004/medcouple.pdf), which introduces the statistic known as the medcouple (https://en.wikipedia.org/wiki/Medcouple), with a focus on the algorithm for computing it.

If you can, it's worth tracking down the cited article by Johnson and Mizoguchi, which describes a selection algorithm used in the medcouple algorithm. Unfortunately, it does not appear to be freely available on the web. However, the Wikipedia article on the medcouple (https://en.wikipedia.org/wiki/Medcouple#Fast_algorithm) has an excellent exposition that includes a good description of Johnson and Mizoguchi's algorithm, which the medcouple paper itself leaves out.

Jordi has provided the following description for the talk:

Identifying outliers is a delicate but common statistical problem. One of the most basic definitions of an outlier is given by familiar box and whisker plots, as defined by Tukey: an outlier is anything outside of the boxplot's whiskers. A problem with this simple definition is that it tends to give too many outliers for data that is very heavily skewed towards one side or the other.
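
To make Tukey's rule concrete, here is a minimal Python sketch. The function names are made up for illustration, and the quartile convention (the "inclusive" method of the standard library) is an assumption; other conventions move the fences slightly.

```python
from statistics import quantiles


def tukey_fences(data, k=1.5):
    """Return the (lower, upper) whisker limits Q1 - k*IQR and Q3 + k*IQR."""
    # quantiles() sorts its input; n=4 yields the three quartile cut points.
    q1, _, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def tukey_outliers(data, k=1.5):
    """Return the points that fall outside the whiskers."""
    lower, upper = tukey_fences(data, k)
    return [x for x in data if x < lower or x > upper]


sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
print(tukey_fences(sample))    # (-3.5, 14.5)
print(tukey_outliers(sample))  # [100]
```

The whiskers extend the same distance on both sides of the box no matter how lopsided the data are, which is why a long but legitimate tail gets flagged.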

A convenient, nonparametric way to handle skewed data is the so-called adjusted boxplot, as defined by Hubert and Vandervieren [1], which adjusts the whisker lengths of the boxplot according to the skewness of the distribution. This adjustment requires a nonparametric measure of skewness that is robust to outliers. The statistic that Hubert and Vandervieren recommend is the medcouple, introduced by Brys, Hubert, and Struyf [2].
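
As a rough illustration of the adjustment, the sketch below scales each whisker by an exponential factor of the medcouple. The constants -4 and 3 are quoted from memory of [1] and should be checked against the paper; the function name, the `mc` parameter (a medcouple value computed elsewhere), and the quartile convention are assumptions made for this example.

```python
from math import exp
from statistics import quantiles


def adjusted_fences(data, mc, k=1.5):
    """Skewness-adjusted whisker limits for a given medcouple value mc.

    Assumed form (to be verified against Hubert and Vandervieren):
      mc >= 0: [Q1 - k*exp(-4*mc)*IQR, Q3 + k*exp(3*mc)*IQR]
      mc <  0: [Q1 - k*exp(-3*mc)*IQR, Q3 + k*exp(4*mc)*IQR]
    """
    q1, _, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    if mc >= 0:
        # Right skew: shorten the lower whisker, lengthen the upper one.
        return q1 - k * exp(-4 * mc) * iqr, q3 + k * exp(3 * mc) * iqr
    # Left skew: the mirror image.
    return q1 - k * exp(-3 * mc) * iqr, q3 + k * exp(4 * mc) * iqr


sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
print(adjusted_fences(sample, 0.0))  # same as the ordinary boxplot fences
print(adjusted_fences(sample, 0.5))  # upper whisker reaches further out
```

With `mc = 0` the limits reduce to the ordinary boxplot whiskers, so the adjustment only has an effect when the data are skewed.
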
While the medcouple has many interesting statistical properties, it comes at a price: a naïve medcouple algorithm for a sample of size n takes O(n^2) time. Using Johnson and Mizoguchi's method for finding the median of a matrix with sorted rows and sorted columns [3], we can improve upon the naïve algorithm with a fast O(n log n) medcouple algorithm.
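
For reference, the naïve approach can be sketched in a few lines of Python. This follows the definition as presented in the Wikipedia article: the medcouple is the median of the kernel h(a, b) = (a + b) / (a - b) over all pairs of median-centred values with a >= 0 >= b, with a sign rule for values tied with the median. The function name is illustrative, and because this sketch sorts the kernel values instead of using a linear-time selection, it actually runs in O(n^2 log n) time with O(n^2) memory.

```python
from statistics import median


def naive_medcouple(data):
    """Quadratic-size sketch of the medcouple of a sample."""
    xs = sorted(data)
    m = median(xs)
    plus = [x - m for x in xs if x >= m]   # ascending, zeros (ties) first
    minus = [x - m for x in xs if x <= m]  # ascending, zeros (ties) last
    k = sum(1 for x in xs if x == m)       # number of values tied with m
    q = len(minus)
    kernel = []
    for i, a in enumerate(plus):
        for j, b in enumerate(minus):
            if a == b:
                # Only possible when both are zero. The k*k tied pairs get
                # -1, 0 or +1 by index; only the counts affect the median.
                s = i + (j - (q - k)) + 1 - k
                kernel.append((s > 0) - (s < 0))
            else:
                kernel.append((a + b) / (a - b))
    return median(kernel)


print(naive_medcouple([1, 2, 3, 4, 5]))                  # symmetric sample
print(naive_medcouple([1, 2, 4, 8, 16, 32, 64]))         # right-skewed, positive
print(naive_medcouple([-64, -32, -16, -8, -4, -2, -1]))  # left-skewed, negative
```

The symmetric sample gives a medcouple of 0, and mirroring a sample flips the sign of the result. The kernel values always lie between -1 and 1.
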
This talk will briefly touch upon these three papers and how they work together as a method for defining outliers in skewed distributions. The presentation should appeal to anyone with a knowledge of basic statistics. The discussion of finding the median of a sorted matrix is also interesting in its own right to anyone who cares about classical problems in computer science.
---
[1] M. Hubert; E. Vandervieren (2008). "An adjusted boxplot for skewed distributions". Computational Statistics and Data Analysis 52 (12): 5186–5201. doi:[masked]/j.csda[masked].
[2] G. Brys; M. Hubert; A. Struyf (November 2004). "A Robust Measure of Skewness". Journal of Computational and Graphical Statistics 13 (4): 996–1017. doi:[masked]/[masked]X12632.

[3] Donald B. Johnson; Tetsuo Mizoguchi (May 1978). "Selecting the Kth Element in X + Y and X1 + X2 + ... + Xm". SIAM Journal on Computing 7 (2): 147–153. doi:[masked]/[masked]