Ted Dunning to discuss Online Super-fast and High Quality Clustering

Two-time LA-HUG speaker Ted Dunning will join us September 25th to discuss Super-fast and High Quality Clustering.

 

Recent algorithmic developments [1] have enabled dramatic improvements in performance for clustering applications.  Previously, the workhorse clustering algorithm was k-means, whose cost scales linearly with the desired number of clusters times the data size times the number of iterations required.  The number of iterations itself depends on the number of clusters, and in map-reduce implementations such as the one in Mahout [2], the required iterative implementation is exceedingly painful.
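To make the cost of classic k-means concrete, here is a minimal sketch of the textbook batch algorithm (Lloyd's iteration) in plain Python. It is not Mahout's implementation; the function names and the max_iterations and seed parameters are assumptions made for illustration. The function counts distance evaluations so that the clusters times data size times iterations product is visible directly.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def lloyd_kmeans(points, k, max_iterations=20, seed=0):
    """Textbook batch k-means (illustrative only).

    Returns (centroids, iterations_run, distance_evaluations).
    """
    rng = random.Random(seed)
    # Start from k distinct data points chosen at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    dims = len(points[0])
    iterations = 0
    distance_evaluations = 0
    for iteration in range(1, max_iterations + 1):
        iterations = iteration
        sums = [[0.0] * dims for _ in range(k)]
        counts = [0] * k
        # Assignment step: every point is compared against every centroid.
        for p in points:
            best = min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            distance_evaluations += k
            counts[best] += 1
            for d, x in enumerate(p):
                sums[best][d] += x
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = [
            [s / counts[j] for s in sums[j]] if counts[j] else centroids[j]
            for j in range(k)
        ]
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return centroids, iterations, distance_evaluations
```

Every iteration is a full pass over the data, so distance_evaluations always equals k * len(points) * iterations. In a map-reduce setting each of those passes would be a separate job, which is where the pain comes from.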

These new algorithms require only a single pass over the data, and that pass has a cost of roughly O(log k) per data point, where k is the desired number of clusters.  The resulting implementation [3], which is being ported into Mahout, has demonstrated some stunning speed.  In one test, a uni-processor threaded implementation demonstrated the ability to cluster data points in just 20 microseconds per data point.  Moreover, this algorithm is easily ported to map-reduce with essentially perfect linear scaling.  This implies we should be able to cluster hundreds of millions of data points in minutes on a moderate-sized cluster.  Even more exciting, these algorithms are online algorithms, so it is possible to build a real-time clustering engine that clusters data points as they arrive and never needs to look back at old data.
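The single-pass, online idea can be sketched roughly as follows. This is a simplified illustration in the spirit of [1], not the code in [3]: the names absorb and streaming_cluster, and the parameters initial_cutoff, growth and seed, are assumptions. Each arriving point either merges into its nearest centroid or, with probability proportional to its squared distance, starts a new one. When the sketch grows past max_centroids, the distance cutoff is raised and the centroids themselves are re-clustered. The sketch uses a brute-force nearest-centroid search, which costs O(k) per point; reaching roughly O(log k) would require an approximate nearest-neighbor search, which is omitted here.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def absorb(centroids, point, weight, cutoff, rng):
    """Add one weighted point: start a new centroid or merge into the nearest."""
    if not centroids:
        centroids.append([list(point), weight])
        return
    # Brute-force search, O(k) per point (a simplification, see lead-in).
    nearest = min(centroids, key=lambda c: squared_distance(point, c[0]))
    d2 = squared_distance(point, nearest[0])
    if rng.random() < weight * d2 / cutoff:
        # Far-away points are likely to seed a new centroid.
        centroids.append([list(point), weight])
    else:
        # Otherwise fold the point into the nearest centroid's weighted mean.
        total = nearest[1] + weight
        nearest[0] = [
            (m * nearest[1] + x * weight) / total
            for m, x in zip(nearest[0], point)
        ]
        nearest[1] = total


def streaming_cluster(stream, max_centroids, initial_cutoff=1e-3,
                      growth=1.5, seed=0):
    """Single pass over `stream`; returns a list of [mean, weight] centroids."""
    rng = random.Random(seed)
    centroids = []
    cutoff = initial_cutoff
    for point in stream:
        absorb(centroids, point, 1.0, cutoff, rng)
        # Too many centroids: raise the cutoff and re-cluster the centroids
        # themselves. Old data points are never revisited.
        while len(centroids) > max_centroids:
            cutoff *= growth
            old, centroids = centroids, []
            rng.shuffle(old)
            for mean, weight in old:
                absorb(centroids, mean, weight, cutoff, rng)
    return centroids
```

The result is a small weighted summary of the data, usually with more than k centroids; a final in-memory clustering of that summary would typically reduce it to the k clusters wanted. Because only the centroids and their weights are kept, the same loop can run on data that arrives in real time.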

I will talk about the basic intuitions behind these algorithms, how they are implemented, their limitations and how to use them.  I will also talk about some of the very exciting practical implications of having a super-fast clustering algorithm.

[1] http://web.engr.oregonstate.edu/~shindler/papers/FastKMeans_nips11.pdf

[2] http://mahout.apache.org/

[3] https://github.com/tdunning/knn
