This month we have Jeroen Janssens from YPlan presenting "Outlier Selection and One-Class Classification". Jeroen's abstract and bio are below.
What do a terrorist attack, a forged painting, and a rotten apple have in common? The answer is: all three are anomalies; they are real-world observations that deviate from what is considered normal. Detecting anomalies is of utmost importance because an undetected anomaly can be dangerous or expensive. A human domain expert may suffer from three cognitive limitations: fatigue, information overload, and emotional bias. These limitations hamper the detection of anomalies. Outlier-selection and one-class classification algorithms can automatically classify data points as outliers in large amounts of data. During my Ph.D. I studied to what extent outlier-selection and one-class classification algorithms can support domain experts with real-world anomaly detection.
In this talk, I first introduce both the outlier-selection and the one-class classification setting. Then, I present a novel algorithm called Stochastic Outlier Selection (SOS). The SOS algorithm computes an outlier probability for each data point. These probabilities are more intuitive than the unbounded outlier scores computed by existing outlier-selection algorithms. I have evaluated SOS on a variety of real-world and synthetic datasets, and compared it to four state-of-the-art outlier-selection algorithms. The results show that SOS performs better than these algorithms while being more robust to data perturbations and parameter settings.
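To make the idea of a per-point outlier probability concrete, here is a minimal sketch in Python. It is not the implementation from the technical report. It assumes the affinity-based formulation (each point "binds" to its neighbours with a probability proportional to a Gaussian affinity, and a point's outlier probability is the chance that no other point binds to it), and it simplifies matters by using one fixed bandwidth for all points, whereas SOS sets a separate variance per point through a perplexity parameter. The function name `outlier_probabilities` and the `bandwidth` parameter are hypothetical names chosen for this illustration.

```python
import math


def outlier_probabilities(points, bandwidth=1.0):
    """Simplified SOS-style outlier probabilities.

    Assumption: a single fixed bandwidth for every point. The real SOS
    algorithm instead tunes a per-point variance from a perplexity value.
    """
    n = len(points)
    probs = [1.0] * n
    for i in range(n):
        # Gaussian affinity of point i towards every other point
        # (a point has zero affinity with itself).
        affinities = [
            0.0 if i == j
            else math.exp(-math.dist(points[i], points[j]) ** 2
                          / (2 * bandwidth ** 2))
            for j in range(n)
        ]
        total = sum(affinities)
        if total == 0.0:
            # All affinities underflowed; point i binds to nobody.
            continue
        for j in range(n):
            if j == i:
                continue
            # Probability that point i chooses point j as its neighbour.
            binding = affinities[j] / total
            # Point j stays an outlier only if i does NOT bind to it.
            probs[j] *= 1.0 - binding
    return probs


# Four clustered points and one point far away from them.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (10.0, 10.0)]
probs = outlier_probabilities(points)
for point, prob in zip(points, probs):
    print(point, round(prob, 3))
```

Because every value lies between 0 and 1, a threshold such as 0.5 has a direct interpretation, which is the sense in which probabilities are easier to read than unbounded scores. In this toy data set the isolated point receives the highest outlier probability.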
This talk is largely based on chapters 1, 2, and 4 of my Ph.D. thesis (see https://github.com/jeroenjanssens/phd-thesis). If you are only interested in the SOS algorithm itself, you can download the technical report, which corresponds to chapter 4 (see https://github.com/jeroenjanssens/sos). I will soon add a Python implementation of the SOS algorithm to the latter repository.
Jeroen Janssens is a senior data scientist at YPlan, tonight's going out app, where he's responsible for making event recommendations more personal. Jeroen holds an M.Sc. in Artificial Intelligence from Maastricht University and a Ph.D. in Machine Learning from Tilburg University. He is authoring a book called "Data Science at the Command-line", which will be published by O'Reilly in summer 2014. Jeroen enjoys biking the Brooklyn Bridge, building tools, and blogging at http://jeroenjanssens.com.