Below is a standard dataset used to illustrate the fundamental weakness of the standard K-means clustering algorithm: that it assumes incoming data lie in convex globular clusters.
You see two subgroups of data - the two concentric rings - and so if we were to cluster this set we would want to recover those two rings as our clusters. But that's not what K-means with K = 2 will do if we apply it to this dataset. Below is the result of applying K-means with K = 2 to the dataset (on the left), and the correct separation given by spectral clustering - the topic of this journal club meeting - (on the right).
Now this concentric ring set may be somewhat extreme, but we're all adults here - do we really expect that the data we want to cluster in our various walks of life naturally lie in convex globular clusters (as K-means assumes)? Not likely.
So in this journal club meeting we're going to learn about a clustering approach that throws out that assumption and does something different: spectral clustering. Spectral clustering treats data points not as samples from distinct convex globular clusters, as K-means does, but as nodes of a graph, so that clusters correspond to connected subgraphs. This change in perspective allows separation of non-convex clusters like the concentric rings example above - something K-means just can't do.
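To make the contrast concrete, here is a minimal sketch of the rings experiment in Python using scikit-learn (an assumption on my part - the images in this post were generated with the MATLAB toys linked below, not this code). K-means carves the plane into convex regions and so mixes the rings, while spectral clustering, built on a nearest-neighbor graph, recovers them:

```python
import numpy as np
from sklearn.cluster import KMeans, SpectralClustering
from sklearn.datasets import make_circles
from sklearn.metrics import adjusted_rand_score

# Two concentric rings: the classic non-convex clustering example
X, y_true = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=0)

# K-means with K = 2: its cluster boundary is a straight line,
# so each cluster mixes points from both rings
km_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# Spectral clustering: connect each point to its nearest neighbors and
# cluster the resulting graph - the two rings form two connected subgraphs
sc_labels = SpectralClustering(
    n_clusters=2, affinity="nearest_neighbors", n_neighbors=10, random_state=0
).fit_predict(X)

# Agreement with the true ring labels (1.0 = perfect, ~0 = chance)
km_score = adjusted_rand_score(y_true, km_labels)
sc_score = adjusted_rand_score(y_true, sc_labels)
```

On this dataset `sc_score` sits near 1 while `km_score` hovers near 0, mirroring the left/right comparison in the figure above.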
Here's the tutorial paper on spectral clustering that we'll read and discuss:
You can also find a set of notes I wrote explaining spectral clustering - as well as a few cool MATLAB toys for playing around with different clustering algorithms (used to generate the images in this post) - below.