If you can, it's worth tracking down the cited article by Johnson and Mizoguchi, which describes the selection algorithm used in the fast medcouple algorithm. Unfortunately, it does not appear to be freely available on the web. However, the Wikipedia article on the medcouple (https://en.wikipedia.org/wiki/Medcouple#Fast_algorithm) has an excellent exposition that includes a good description of the Johnson and Mizoguchi algorithm, which is elided from the paper.
Jordi has provided the following description for the talk:
Identifying outliers is a delicate but common statistical problem. One of the most basic definitions of an outlier is given by the familiar box-and-whisker plot, as defined by Tukey: an outlier is anything outside of the boxplot's whiskers. A problem with this simple definition is that it tends to flag too many points as outliers when the data is heavily skewed towards one side or the other.
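As an aside (not part of Jordi's description), Tukey's rule can be sketched in a few lines of Python. The function names here are made up for illustration, the multiplier 1.5 is the conventional choice, and the quartile method is an assumption, since different software computes quartiles in slightly different ways.

```python
import statistics


def tukey_fences(data, k=1.5):
    """Return the (lower, upper) whisker limits of a standard boxplot."""
    # Quartile convention is an assumption; "inclusive" treats the data
    # as the whole population rather than a sample.
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def tukey_outliers(data, k=1.5):
    """Return every point that falls outside the whiskers."""
    lower, upper = tukey_fences(data, k)
    return [x for x in data if x < lower or x > upper]
```

For example, `tukey_outliers([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])` flags only the value 100. Note that both whiskers extend the same distance from the box, which is exactly what goes wrong on skewed data.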
A convenient, nonparametric way to handle skewed data is the so-called adjusted boxplot, as defined by Hubert and Vandervieren [1], which adjusts the whisker lengths of the boxplot according to the skewness of the distribution. This adjustment requires a nonparametric measurement of skewness that is robust to outliers. The statistic that Hubert and Vandervieren recommend is the medcouple, introduced by Brys, Hubert, and Struyf [2].
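As a rough illustration (again, not part of Jordi's description), the adjusted whiskers might be computed as follows. The exponential factors and the constants -4, 3, -3 and 4 are the ones I recall from Hubert and Vandervieren's paper, so please check them against [1] before relying on them. The function name is hypothetical, and the medcouple `mc` is assumed to be computed elsewhere and passed in.

```python
import math
import statistics


def adjusted_fences(data, mc, k=1.5):
    """Return skewness-adjusted (lower, upper) whisker limits.

    `mc` is the medcouple of `data`, a value in [-1, 1].
    """
    # Quartile convention is an assumption, as with any boxplot code.
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    if mc >= 0:
        # Right skew: shorten the lower whisker, lengthen the upper one.
        lower = q1 - k * math.exp(-4 * mc) * iqr
        upper = q3 + k * math.exp(3 * mc) * iqr
    else:
        # Left skew: the mirror image of the case above.
        lower = q1 - k * math.exp(-3 * mc) * iqr
        upper = q3 + k * math.exp(4 * mc) * iqr
    return lower, upper
```

When `mc` is zero, both exponential factors equal one and the limits reduce to the ordinary Tukey whiskers, so the adjustment only kicks in for asymmetric data.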
While the medcouple has many interesting statistical properties, it comes with a price: a naïve medcouple algorithm for a sample of size n takes O(n^2) time. Using Johnson and Mizoguchi's method for finding the median of a matrix with sorted rows and sorted columns [3], we can improve upon the naïve algorithm with a fast O(n log n) medcouple algorithm.
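To make the quadratic cost concrete, here is a minimal sketch of the naïve approach (not Jordi's code and not the fast algorithm). It evaluates the medcouple kernel on every pair of points straddling the median and then takes the median of all those values. The function name is made up, and the sign rule for points tied with the median follows the convention described in the Wikipedia article, as I understand it.

```python
import statistics


def naive_medcouple(data):
    """O(n^2) medcouple: the median of the kernel over all pairs."""
    xs = sorted(data, reverse=True)
    m = statistics.median(xs)
    # Both lists are decreasing; any zeros sit at the end of z_plus
    # and at the start of z_minus.
    z_plus = [x - m for x in xs if x >= m]
    z_minus = [x - m for x in xs if x <= m]
    p = len(z_plus)

    def kernel(i, j):
        a, b = z_plus[i], z_minus[j]
        if a == b:
            # Only possible when both are zero, i.e. both points tie
            # with the median; fall back to a sign rule on the indices.
            s = p - 1 - i - j
            return (s > 0) - (s < 0)
        return (a + b) / (a - b)

    # This full table of kernel values is the quadratic part.
    values = [kernel(i, j)
              for i in range(p)
              for j in range(len(z_minus))]
    return statistics.median(values)
```

A symmetric sample such as `[1, 2, 3, 4, 5]` gives a medcouple of 0, while a sample with a long right tail gives a positive value. The table of kernel values has sorted rows and sorted columns, which is what the fast algorithm exploits to avoid building it in full.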
This talk will briefly touch upon these three papers and how they work together as a method for defining outliers for skewed distributions. The presentation should appeal to anyone with a knowledge of basic statistics. The discussion of finding the median of a sorted matrix is interesting in its own right to anyone who cares about classical problems in computer science.