Large Scale Image Classification and Apache Spark for applied machine learning

Name: Large Scale Image Classification and Apache Spark for applied machine learning
Start: 2014-04-23T18:00:00+02:00
End: 2014-04-23T22:00:00+02:00
Location: Marktplaats/eBay Office Amsterdam

Hosted By

Friso van V. and Gijs M.

Details

Ready for the third meetup in our series? This time we are gathering in Club Dauphine, which is kindly offered to us by eBay Classifieds Group / Marktplaats (https://www.marktplaats.nl/i/help/over-marktplaats/werk-bij.dot), who will also be sponsoring our food and beverages for the evening.

We will have two talks again. We are still in the process of confirming the first talk, but right now it looks like it could be an introduction to using Apache Spark (http://spark.apache.org/) for applied machine learning. For the second talk, we have confirmed Thomas Mensink, who works on machine learning and computer vision at the University of Amsterdam.

Agenda

• 18.00: Arrive, socialise, have a drink and eat

• 18.50: Short introduction by your humble organisers

• 19.00: Talk 1, by yours truly (i.e. Friso), organiser @ The Amsterdam Applied Machine Learning meetup group

Using Apache Spark for applied machine learning and other data tasks

The open source Apache Hadoop stack, including its MapReduce batch processing framework, has over the past few years more or less become the de facto standard for large scale data processing in commercial organisations. One of the drawbacks of this solution is that it is most efficient for single-pass batch processing, because a MapReduce program synchronises to disk one or more times while it runs. For many machine learning and other data-driven applications this is a major performance bottleneck, as many algorithms in applied machine learning require multiple passes over the same data. The Apache Spark project aims to address this problem by keeping datasets in aggregate cluster memory, so that repeated passes over the same data can avoid going back to disk. This makes it better suited to iterative algorithms and exploratory analysis.

In this talk we'll take a look at Apache Spark for exploratory analysis using Python and the IPython notebook integration, as well as implementing an iterative machine learning algorithm using the native Scala API for Apache Spark. (Note: I went beyond the built-in MLlib and implemented something from scratch.)
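
To make the multiple-pass point concrete, here is a minimal sketch in plain Python (deliberately not Spark, and not the algorithm from the talk) of batch gradient descent for a straight-line fit. The function name, the toy data and the parameter values are all made up for illustration. The only thing it is meant to show is the access pattern: every iteration reads the complete dataset again, which is the case where keeping the data in memory pays off.

```python
def fit_line(points, iterations=2000, learning_rate=0.05):
    """Fit y = w * x + b by batch gradient descent on mean squared error.

    `points` is a list of (x, y) pairs. Each iteration makes one full
    pass over `points`; the number of passes is returned for illustration.
    """
    w, b = 0.0, 0.0
    n = len(points)
    passes = 0
    for _ in range(iterations):
        grad_w = 0.0
        grad_b = 0.0
        # One complete pass over the dataset per iteration.
        for x, y in points:
            error = (w * x + b) - y
            grad_w += 2.0 * error * x / n
            grad_b += 2.0 * error / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
        passes += 1
    return w, b, passes


# Hypothetical toy data lying exactly on y = 2x + 1.
points = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
w, b, passes = fit_line(points)
print(w, b, passes)
```

With a dataset that has to be re-read from disk on every pass, those 2000 passes would dominate the running time, while the arithmetic itself is trivial.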

• 19.45: short break

• 20.00: Talk 2, by Thomas Mensink, postdoctoral researcher at the University of Amsterdam

Large Scale Image Classification and Generalizing to New Classes

In this talk I'll present recent research on large scale image classification and how to learn classifiers for new classes at negligible cost.

First, I'll give a brief overview of the Fisher Vector (FV) image representation. The FV framework can be seen as a generalization of the popular bag-of-visual-words approach that takes into account more statistics about the distribution of the local descriptors in the image. This representation has many advantages: it is efficient to compute, it leads to excellent results even with efficient linear classifiers, and it can be compressed with a minimal loss of accuracy using product quantization.
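
As a rough illustration of the compression step mentioned above, the following is a minimal sketch of product quantization in plain Python. It assumes the codebooks already exist (in practice they would typically be learned, for example with k-means, which is left out here), and the function names, codebooks and vector are hypothetical. The vector is split into equal-sized sub-vectors, and each sub-vector is stored as the index of its nearest codeword.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def pq_encode(vector, codebooks):
    """Return one codeword index per sub-vector.

    Assumes len(vector) is divisible by len(codebooks) and that every
    codeword in codebook i has the sub-vector dimension.
    """
    sub_dim = len(vector) // len(codebooks)
    codes = []
    for i, codebook in enumerate(codebooks):
        sub = vector[i * sub_dim:(i + 1) * sub_dim]
        nearest = min(range(len(codebook)),
                      key=lambda k: squared_distance(sub, codebook[k]))
        codes.append(nearest)
    return codes


def pq_decode(codes, codebooks):
    """Rebuild an approximate vector by concatenating the chosen codewords."""
    approx = []
    for code, codebook in zip(codes, codebooks):
        approx.extend(codebook[code])
    return approx


# Hypothetical codebooks: two sub-spaces, two codewords each.
codebooks = [
    [[0.0, 0.0], [1.0, 1.0]],
    [[0.0, 1.0], [1.0, 0.0]],
]
vector = [0.9, 1.1, 0.2, 0.8]
codes = pq_encode(vector, codebooks)
approx = pq_decode(codes, codebooks)
print(codes, approx)
```

The four floating point values are stored as two small integers, and decoding gives back an approximation of the original vector rather than the vector itself.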

Second, I'll discuss distance-based classifiers, such as k-Nearest Neighbours (k-NN) and Nearest Class Mean (NCM), since these methods can incorporate new classes and training images continuously over time at negligible cost. This is not possible with the popular one-vs-rest SVM approach, but is essential when dealing with real-life open-ended datasets. For the NCM classifier, which assigns an image to the class with the closest mean, we introduce a new metric learning approach based on multi-class logistic discrimination. During training we enforce that an image from a class is closer to its class mean than to any other class mean in the projected space. Experiments on the ImageNet 2010 challenge dataset, which contains over 1 million training images of a thousand classes, show that, surprisingly, the NCM classifier compares favorably to the non-linear k-NN classifier. Moreover, the NCM performance is comparable to that of linear SVMs, which obtain current state-of-the-art performance. Experimentally we also study the generalization performance to classes that were not used to learn the metrics and obtain surprisingly good results.
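
To show why adding a class is cheap for NCM, here is a minimal sketch in plain Python. It uses ordinary Euclidean distance in the original feature space, so it is not the learned-metric method from the talk, and the class name, labels and feature values are invented for illustration. Each class is summarised by a running sum and a count, so a new class or a new training image only touches that one class's mean.

```python
class NearestClassMean:
    """Nearest Class Mean classifier with plain Euclidean distance."""

    def __init__(self):
        self.sums = {}    # label -> element-wise sum of feature vectors
        self.counts = {}  # label -> number of training examples seen

    def add(self, label, features):
        """Add one training example; unseen labels create a new class."""
        if label not in self.sums:
            self.sums[label] = [0.0] * len(features)
            self.counts[label] = 0
        self.sums[label] = [s + f for s, f in zip(self.sums[label], features)]
        self.counts[label] += 1

    def mean(self, label):
        """Return the current mean feature vector of a class."""
        n = self.counts[label]
        return [s / n for s in self.sums[label]]

    def predict(self, features):
        """Return the label whose class mean is closest to `features`."""
        def squared_distance_to(label):
            return sum((f - m) ** 2
                       for f, m in zip(features, self.mean(label)))
        return min(self.sums, key=squared_distance_to)


ncm = NearestClassMean()
ncm.add("cat", [0.0, 0.0])
ncm.add("cat", [0.0, 2.0])
ncm.add("dog", [4.0, 0.0])
ncm.add("dog", [4.0, 2.0])

query = [9.0, 1.0]
before = ncm.predict(query)      # only "cat" and "dog" exist so far

ncm.add("bird", [10.0, 1.0])     # new class, no retraining of the others
after = ncm.predict(query)
print(before, after)
```

A one-vs-rest SVM would need a new classifier trained against all existing data at this point, whereas here the existing class means are left untouched.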

• 20.45: more drinks and social talks

• 21.30 or whenever the bar closes: everybody out! (out of the room, that is; the bar in Dauphine itself will be happy to serve you)

Amsterdam Applied Machine Learning

Wednesday, April 23, 2014
6:00 PM to 10:00 PM CEST

Marktplaats/eBay Office Amsterdam

Wibautstraat 224 · Amsterdam
