Online Active Learning with Imbalanced Classes

Speaker: Zahra Ferdowsi

Abstract:

Real-world machine learning systems are often embedded in large interactive systems with experts in the loop. Once a classifier is trained on a pool of known examples, such systems classify a large number of new examples and present the experts with a ranked list of examples to review and verify. The experts often have limited time, are expensive, and are concerned primarily with finding positive instances. The interactive nature of such systems, together with limited labeling resources, high labeling costs, and the large number of unlabeled examples, makes this problem well suited to active learning techniques. Selecting the most informative examples to query from the experts can be very challenging, since no instance selection strategy consistently works better than the others. In this talk, I will discuss the challenges of using these techniques to select the best examples and present a new online algorithm that switches between different candidate instance selection strategies for classification in imbalanced data sets.
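
To make the idea of switching between instance selection strategies concrete, here is a minimal Python sketch. It is not the algorithm presented in the talk, which the abstract does not specify. It assumes an epsilon-greedy rule over three hypothetical strategies (uncertainty, most_likely_positive, and a random baseline), a reward of 1 whenever the queried example turns out to be positive, and classifier scores that stay fixed during the run. A real system would retrain the classifier as new labels arrive.

```python
import random


def uncertainty(scores):
    """Return the position whose score is closest to 0.5 (least certain)."""
    return min(range(len(scores)), key=lambda i: abs(scores[i] - 0.5))


def most_likely_positive(scores):
    """Return the position with the highest predicted positive score."""
    return max(range(len(scores)), key=lambda i: scores[i])


def make_random_strategy(rng):
    """Build a baseline strategy that picks a position uniformly at random."""
    def pick(scores):
        return rng.randrange(len(scores))
    return pick


class StrategySwitcher:
    """Epsilon-greedy choice among strategies (an assumed rule, for illustration)."""

    def __init__(self, strategies, epsilon=0.1, seed=0):
        self.strategies = dict(strategies)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.pulls = {name: 0 for name in self.strategies}
        self.reward = {name: 0.0 for name in self.strategies}

    def choose(self):
        # Try every strategy once before comparing average rewards.
        untried = [name for name, count in self.pulls.items() if count == 0]
        if untried:
            return untried[0]
        # Explore with probability epsilon, otherwise exploit the best average.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(list(self.strategies))
        return max(self.strategies, key=lambda n: self.reward[n] / self.pulls[n])

    def update(self, name, reward):
        self.pulls[name] += 1
        self.reward[name] += reward


def run(pool_scores, pool_labels, budget, switcher):
    """Query up to `budget` examples; return a list of (strategy, index, label)."""
    remaining = list(range(len(pool_scores)))
    history = []
    for _ in range(min(budget, len(remaining))):
        name = switcher.choose()
        scores = [pool_scores[i] for i in remaining]
        position = switcher.strategies[name](scores)
        index = remaining.pop(position)   # each example is queried at most once
        label = pool_labels[index]        # stands in for the expert's answer
        switcher.update(name, float(label))
        history.append((name, index, label))
    return history
```

With a pool of scores and labels, run(scores, labels, budget, switcher) returns the query history, and switcher.pulls shows how often each strategy was used. Rewarding positives reflects the abstract's point that experts care mainly about finding positive instances; other reward definitions, such as the improvement in classifier accuracy, are equally plausible.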

About the Speaker:

Zahra is a PhD student at DePaul University working on practical machine learning and active learning applications. She has experience in analytics across the healthcare, real estate, and financial services industries. As a data scientist at Groupon, she has been working on merchant risk assessment and demand forecasting for local businesses.

  • Rob L.

    The slides for the talk have been posted. Thanks Zahra!

    October 22, 2013

  • Stephan W.

    This is a great topic. We often run into the "imbalanced classes" situation in the medical field. Building a reasonable classifier can be challenging; and then measuring performance without introducing too much bias can be equally challenging. It'll be interesting to hear how this active learning approach ameliorates issues encountered in this situation.

    September 24, 2013

    • Simon H.

      A simple approach I have read about is to duplicate the positive examples in the dataset (or the less frequent class, as that may be the negative examples) such that you have equal numbers of both classes. However, this did not work when I tried it. I am hoping Zahra's technique bears more fruit for my problem domain (essay grading).

      October 15, 2013

  • Simon H.

    Very interesting talk, Zahra. I love hearing talks that deviate from the traditional ML / data science approaches, as those are the most interesting and informative, and this talk definitely met that criterion.

    We talked after the meetup, and I wanted to send you a link to this company, who are based in Chicago and apply machine learning to the healthcare field:

    http://methodcare.com/

    Please can you post the slides and any links to papers you have published, as I would like to study the approach in more depth as I may find some uses for it in my research.

    October 15, 2013
