Ted Dunning to discuss Online Super-fast and High Quality Clustering

Two-time LA-HUG speaker Ted Dunning will join us September 25th to discuss Super-fast and High Quality Clustering.

Recent algorithmic developments [1] have enabled dramatic improvements in performance for clustering applications.  Previously, the workhorse clustering algorithm was k-means, whose cost scales linearly with the product of the desired number of clusters, the data size, and the number of iterations required.  The number of iterations itself depends on the number of clusters, and in map-reduce implementations such as the one in Mahout [2], the required iteration is exceedingly painful.
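To make the cost of classic k-means concrete, here is a minimal sketch of the batch (Lloyd-style) algorithm in plain Python. It is an illustration only, not the Mahout code; the names lloyd_kmeans and squared_distance are hypothetical. It counts distance evaluations so the "clusters times data size times iterations" growth is visible.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def lloyd_kmeans(points, k, max_iterations=100, seed=0):
    """Batch k-means. Returns (centroids, iterations, distance_evaluations)."""
    rng = random.Random(seed)
    # Assumption: initialize from k randomly chosen data points.
    centroids = [list(p) for p in rng.sample(points, k)]
    distance_evaluations = 0
    iterations = 0
    assignment = None
    for _ in range(max_iterations):
        iterations += 1
        # Assignment step: every point is compared with every centroid,
        # so each iteration costs len(points) * k distance evaluations.
        new_assignment = []
        for p in points:
            distances = [squared_distance(p, c) for c in centroids]
            distance_evaluations += k
            new_assignment.append(distances.index(min(distances)))
        if new_assignment == assignment:
            break  # assignments are stable, so the centroids would not move
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, iterations, distance_evaluations


points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
centroids, iterations, evaluations = lloyd_kmeans(points, k=2)
print(iterations, evaluations)
```

Every iteration is a full pass over the data, which is why each one becomes a separate job in a map-reduce setting.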

These new algorithms require only a single pass over the data, and the cost per data point is roughly O(log k), where k is the desired number of clusters.  The resulting implementation [3], which is being ported into Mahout, has demonstrated some stunning speed.  In one test, a uni-processor threaded implementation demonstrated the ability to cluster data points in just 20 microseconds per data point.  Moreover, this algorithm is easily ported to map-reduce with essentially perfect linear scaling.  This implies we should be able to cluster hundreds of millions of data points in minutes on a moderate-sized cluster.  Even more exciting, these algorithms are online algorithms, so it is possible to build a real-time clustering engine that clusters data points as they arrive and never needs to look back at old data.
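The single-pass, online idea can be sketched roughly as follows. This is a simplified illustration loosely in the spirit of [1], not the implementation in [3]: the class OnlineClusterer and its parameters (max_centroids, distance_cutoff, growth) are hypothetical, and the nearest-centroid lookup is a brute-force O(k) scan. Reaching roughly O(log k) per point would need an approximate nearest-neighbor search, which this sketch omits.

```python
import math
import random


class OnlineClusterer:
    """Hypothetical single-pass clusterer that keeps only weighted centroids."""

    def __init__(self, max_centroids, distance_cutoff=1.0, growth=1.5, seed=0):
        if max_centroids < 1:
            raise ValueError("max_centroids must be at least 1")
        self.max_centroids = max_centroids
        self.distance_cutoff = distance_cutoff  # assumed starting scale
        self.growth = growth                    # assumed cutoff growth factor
        self.rng = random.Random(seed)
        self.centroids = []                     # entries are [mean, weight]

    def _nearest(self, centroids, point):
        # Brute-force scan; a fast implementation would replace this step.
        best_index, best_distance = -1, math.inf
        for i, (mean, _) in enumerate(centroids):
            d = math.dist(mean, point)
            if d < best_distance:
                best_index, best_distance = i, d
        return best_index, best_distance

    def _insert(self, centroids, point, weight):
        index, distance = self._nearest(centroids, point)
        # The farther a point is from every centroid, the more likely it
        # is to start a new centroid instead of being merged.
        if index < 0 or self.rng.random() < distance / self.distance_cutoff:
            centroids.append([list(point), weight])
        else:
            mean, w = centroids[index]
            total = w + weight
            merged = [m + (x - m) * weight / total for m, x in zip(mean, point)]
            centroids[index] = [merged, total]

    def add(self, point):
        """Consume one data point; old points are never revisited."""
        self._insert(self.centroids, point, 1.0)
        while len(self.centroids) > self.max_centroids:
            # Too many centroids: raise the cutoff and re-cluster the
            # centroids themselves, which stands in for the raw data.
            self.distance_cutoff *= self.growth
            old, self.centroids = self.centroids, []
            for mean, weight in old:
                self._insert(self.centroids, mean, weight)


clusterer = OnlineClusterer(max_centroids=20)
data_rng = random.Random(42)
for _ in range(1000):
    clusterer.add((data_rng.gauss(0, 1), data_rng.gauss(0, 1)))
print(len(clusterer.centroids))
```

Because the state is just a bounded list of weighted centroids, points can be fed in as they arrive, and sketches built on separate data splits could in principle be combined by feeding one sketch's centroids into another.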

I will talk about the basic intuitions behind these algorithms, how they are implemented, their limitations and how to use them.  I will also talk about some of the very exciting practical implications of having a super-fast clustering algorithm.

[1] http://web.engr.oregonstate.edu/~shindler/papers/FastKMeans_nips11.pdf

[2] http://mahout.apache.org/

[3] https://github.com/tdunning/knn
