
Outlier Selection and One-Class Classification

This month we have Jeroen Janssens from YPlan presenting "Outlier Selection and One-Class Classification". Jeroen's abstract and bio are below.

Abstract:

What is common to a terrorist attack, a forged painting, and a rotten apple? The answer is: all three are anomalies; they are real-world observations that deviate from what is considered to be normal. Detecting anomalies is of utmost importance because an undetected anomaly can be dangerous or expensive. A human domain expert may suffer from three cognitive limitations: fatigue, information overload, and emotional bias. These cognitive limitations hamper the detection of anomalies. Outlier-selection and one-class classification algorithms are capable of automatically classifying data points as outliers in large amounts of data. During my Ph.D. I studied to what extent outlier-selection and one-class classification algorithms can support domain experts with real-world anomaly detection.

In this talk, I first introduce both the outlier selection and the one-class classification setting. Then, I present a novel algorithm called Stochastic Outlier Selection (SOS). The SOS algorithm computes for each data point an outlier probability. These probabilities are more intuitive than the unbounded outlier scores computed by existing outlier-selection algorithms. I have evaluated SOS on a variety of real-world and synthetic datasets, and compared it to four state-of-the-art outlier-selection algorithms. The results show that SOS has a superior performance while being more robust to data perturbations and parameter settings.
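For readers curious how SOS arrives at an outlier probability, here is a minimal NumPy sketch. This is an illustrative simplification, not the reference implementation: the published algorithm sets a per-point variance via a perplexity parameter, whereas this sketch uses a single fixed `sigma`.

```python
import numpy as np

def sos_outlier_probabilities(X, sigma=1.0):
    """Simplified sketch of Stochastic Outlier Selection (SOS).

    The published algorithm chooses a per-point variance via a
    perplexity parameter; a fixed sigma is used here for brevity.
    """
    X = np.asarray(X, dtype=float)
    # Pairwise squared Euclidean dissimilarities.
    diff = X[:, None, :] - X[None, :, :]
    d2 = (diff ** 2).sum(axis=-1)
    # Affinity decays exponentially with dissimilarity.
    a = np.exp(-d2 / (2.0 * sigma ** 2))
    np.fill_diagonal(a, 0.0)  # a point has no affinity with itself
    # Binding probabilities: normalize each row to sum to one.
    b = a / a.sum(axis=1, keepdims=True)
    # Outlier probability of point i: the chance that no other
    # point binds to it, i.e. the product over j of (1 - b[j, i]).
    phi = np.prod(1.0 - b, axis=0)
    return phi

# A point far from a tight cluster should receive a high outlier probability.
phi = sos_outlier_probabilities([[0, 0], [0.1, 0], [0, 0.1], [5, 5]])
```

Because each row of the binding matrix sums to one, every `phi` lands in [0, 1], which is what makes the output more interpretable than an unbounded outlier score.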

This talk is largely based on chapters 1, 2, and 4 of my Ph.D. thesis (see https://github.com/jeroenjanssens/phd-thesis). In case you are just interested in the SOS algorithm itself, you can download the technical report, which corresponds to chapter 4 (see https://github.com/jeroenjanssens/sos). I will soon add a Python implementation of the SOS algorithm to the latter repository.

Bio:

Jeroen Janssens is a senior data scientist at YPlan, tonight's going out app, where he's responsible for making event recommendations more personal. Jeroen holds an M.Sc. in Artificial Intelligence from Maastricht University and a Ph.D. in Machine Learning from Tilburg University. He is writing a book called "Data Science at the Command Line", which will be published by O'Reilly in summer 2014. Jeroen enjoys biking the Brooklyn Bridge, building tools, and blogging at http://jeroenjanssens.com.


  • Wei W.

    Is there a video and slides link for this talk?

    December 10, 2013

    • Wei W.

      Thanks a lot.

      December 11, 2013

  • Ronan M.

    Great talk. Any chance you can share the code for the US voting demo using CoffeeScript and D3? It was pretty cool.

    Thanks.

    2 · November 22, 2013

    • Jeroen J.

      Sure thing! As soon as I have changed the scale of the demo and cleaned up the code a bit, I'll put it online and post the URL here.

      1 · November 22, 2013

    • Jeroen J.

      The D3 demo can be found towards the bottom of this blog post: http://jeroenjanssens...

      2 · November 27, 2013

  • carlos r.

    Impressive work, Jeroen! I am working on needle-in-a-haystack problems. We should talk more.
    Carlos

    November 25, 2013

  • A former member

    Superb talk today Jeroen, many thanks for inviting me to it. And the algorithm's results look fantastic. Sorry I left without saying goodbye but you were pretty tied up with questions!

    I particularly enjoyed:
    a) the colour coding at the start and going between the domain diagrams and the confusion matrix
    b) the six-point example and five matrices all on one slide with shading coded -> made the method crystal clear
    c) the updating of the senators' probabilities in real time

    Recommend you get invited back soon to talk about chapters 5 & 6!

    Cheers,

    November 23, 2013

  • Jeroen J.

    Recently, Laurens van der Maaten gave a Google TechTalk where he explained t-SNE, a non-linear dimensionality reduction technique he developed with Geoffrey Hinton. The design of SOS is very much inspired by t-SNE, as they both use the concept of affinity to quantify the relationship between data points. Of course I'll give a detailed explanation of the concept of affinity, but I thought I'd share this video (http://bit.ly/1hYvvDE) with you because t-SNE is a very interesting and useful technique on its own.

    3 · November 10, 2013

    • Jeroen J.

      And if you happen to have domain knowledge which allows you to quantify the dissimilarity between categories, you could even use this for your own dissimilarity measure.

      November 22, 2013

    • Andreas R.

      OK, cool, thanks again, also for making the code available on github.

      November 23, 2013

  • Matthew L.

    Great talk! I was wondering how you set the parameters for each algorithm (for example, perplexity for SOS) when you were comparing the different algorithms.

    November 22, 2013

    • Jeroen J.

      Thanks. With one-class classifiers the optimal parameter is found using 10-fold cross-validation, just as you would do with regular classifiers. However, since the presented outlier-selection algorithms are unsupervised, I simply applied them to all datasets with a whole range of parameter values, and report the highest obtained AUC. The optimal parameter values for each outlier-selection algorithm and for each one-class dataset can be found in the appendix of the technical report.

      November 22, 2013
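The evaluation protocol described in the reply above (sweep an unsupervised algorithm over a range of parameter values, score each run by AUC against held-out labels, and report the best) can be sketched as follows. Both the density-based scorer and the parameter grid here are illustrative stand-ins, not the algorithms or settings from the talk.

```python
import numpy as np

def auc(scores, labels):
    """AUC as P(random outlier scores higher than random inlier), ties half."""
    out, inl = scores[labels == 1], scores[labels == 0]
    gt = (out[:, None] > inl[None, :]).mean()
    eq = (out[:, None] == inl[None, :]).mean()
    return gt + 0.5 * eq

def density_outlier_scores(X, sigma):
    """Toy unsupervised scorer: low kernel density -> high outlier score."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    density = np.exp(-d2 / (2.0 * sigma ** 2)).sum(axis=1)
    return -density

# Synthetic one-class dataset: 50 inliers, 5 outliers.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(6, 1, (5, 2))])
y = np.array([0] * 50 + [1] * 5)  # labels used for evaluation only

# Sweep the free parameter and keep the highest AUC.
best_auc, best_sigma = max(
    (auc(density_outlier_scores(X, s), y), s)
    for s in [0.1, 0.5, 1.0, 2.0, 4.0]
)
```

The key point is that the labels never influence the scorer itself; they are used only after the fact to pick which parameter value to report, which is why this is a best-case (optimistic) comparison rather than a tuned model.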

  • Nick G.

    Awesome presentation :)

    November 22, 2013

  • Ian W.

    Enjoyed the talk! As Ronan said, I am interested in seeing the code demo about the Congressional voting; while I'm not a domain expert, it would be fun to see how the outliers correlate with public perception, how long they've been in office, and other non-voting-based evaluations.

    November 22, 2013

  • Paul T.

    Great meetup. Presentation was well-structured, informative, and entertaining.

    November 22, 2013

  • Evgeny B.

    I got useful information about identifying anomalies.

    November 22, 2013

  • David S.

    I showed up late but got the meat of it: a really good speaker, showing yet another of the endless variations of classifier algorithms, in this case focused on low-density data sets with subtle outlier-classification attributes. Well done, Jeroen.

    November 22, 2013

  • Simon B.

    Awesome speaker

    November 22, 2013

  • Ilya M.

    I was really impressed with the new idea, as well as with how it was presented. What interested me most was the "technical trick" of introducing an asymmetric relation from a symmetric base. I'm sure the idea will be extended to many different branches of ML.

    November 21, 2013

  • Shira

    Unfortunately can't make it, freeing up my spot as well. Looking forward to the video.

    November 21, 2013

  • Steve R.

    Something came up at the last minute and I will not be able to attend... hope someone can take my place!

    November 21, 2013

  • John T.

    Can't make it, freeing up my spot for someone else.

    November 21, 2013

  • Dinesh K.

    I am a SAS/OSU-certified business data mining professional. I graduated from OSU with an M.S. in MIS, am now living in NYC, and am trying to get back to my data mining roots.

    November 21, 2013

  • Michael O. C.

    Is a video of this going to be posted? I doubt I can make it but I'd love to see the talk.

    November 13, 2013

  • Etzer

    I would like to attend

    1 · November 19, 2013

  • Srikanth M.

    I am interested and would like to join

    November 17, 2013

  • Jeroen J.

    As promised, I have added a Python implementation of the SOS algorithm to the Github repository. This implementation can also be used from the command-line. See https://github.com/jeroenjanssens/sos for an example.

    5 · November 13, 2013
