
Details

For our November meetup, we will learn about semi-supervised learning, a fascinating side of machine learning. The general idea is to train your algorithm on a small amount of labeled data together with a larger unlabeled dataset, so that the unlabeled examples also shape what the algorithm learns.

We will also discuss data spelunking, the practice of digging into your datasets until you really understand them, with an applied example on the baby names dataset.

Agenda

6:00 - Doors open. Free pizza and beer, provided by Radialpoint.

6:25 - Introduction

6:30 - Semi-supervised learning: Introduction and tutorial with Lukas Tencer, machine learning researcher. Knowledge about a problem is contained in the whole dataset, not just in its labeled part, so we can use the unlabeled examples to discover the existing structure in the data and increase the precision of our classifier. For this reason, a growing number of recent pattern recognition and machine learning approaches explore how unlabeled examples can help the learning process. We can imagine all the data forming a smooth surface in N-dimensional space, where only some of the examples are labeled. The unlabeled examples help us discover the structure of this surface and better determine similarities between data points. The category of machine learning techniques that addresses this problem is called semi-supervised learning. It has gained significant popularity in recent years, especially as the amount of available data keeps growing (the Big Data problem) and acquiring labels for a whole dataset can be too expensive.
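To make the "smooth surface" intuition concrete, here is a minimal sketch of one classic semi-supervised technique, graph-based label propagation. It is not necessarily the method the talk will present. The function name `label_propagation`, the Gaussian similarity with width `sigma`, the fixed iteration count, and the toy 2-D points are all illustrative assumptions. Each unlabeled point repeatedly takes the similarity-weighted average of its neighbours' label distributions, while labeled points stay clamped to their known class, so labels spread along dense regions of the data.

```python
import math


def label_propagation(points, labels, n_classes, sigma=1.0, n_iter=200):
    """Hypothetical sketch: spread a few known labels over a similarity graph.

    points:    list of (x, y) coordinates, labeled and unlabeled together
    labels:    dict mapping point index -> class id (labeled points only)
    n_classes: number of classes
    Returns a list with one predicted class id per point.
    """
    n = len(points)

    # Gaussian (RBF) similarity between two points; an assumed choice of kernel.
    def weight(i, j):
        (xi, yi), (xj, yj) = points[i], points[j]
        d2 = (xi - xj) ** 2 + (yi - yj) ** 2
        return math.exp(-d2 / (2 * sigma ** 2))

    # Dense similarity matrix with no self-loops.
    w = [[weight(i, j) if i != j else 0.0 for j in range(n)] for i in range(n)]

    # Label distributions: one-hot for labeled points, uniform for the rest.
    f = []
    for i in range(n):
        if i in labels:
            f.append([1.0 if c == labels[i] else 0.0 for c in range(n_classes)])
        else:
            f.append([1.0 / n_classes] * n_classes)

    for _ in range(n_iter):
        new_f = []
        for i in range(n):
            total = sum(w[i])
            if i in labels or total == 0.0:
                # Labeled points keep their known label; isolated points stay put.
                new_f.append(f[i])
                continue
            # Weighted average of every other point's current distribution.
            new_f.append([
                sum(w[i][j] * f[j][c] for j in range(n)) / total
                for c in range(n_classes)
            ])
        f = new_f

    # Predict the most likely class for each point.
    return [max(range(n_classes), key=lambda c: row[c]) for row in f]


# Two well-separated clusters, with only one labeled point in each.
points = [(0.0, 0.0), (0.5, 0.2), (0.3, 0.6), (0.8, 0.5),
          (5.0, 5.0), (5.4, 5.2), (4.7, 5.5), (5.2, 4.6)]
labels = {0: 0, 4: 1}
print(label_propagation(points, labels, n_classes=2))
# → [0, 0, 0, 0, 1, 1, 1, 1]
```

With only two of the eight points labeled, every point inherits the label of the cluster it sits in, because similarity between clusters is vanishingly small compared with similarity within them. A production version would typically use a sparse k-nearest-neighbour graph and matrix operations instead of this quadratic pure-Python loop.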

7:00 - Intermission

7:30 - How I discovered the baby names industry's dark secret: It's a secret so dark, I couldn't find anyone in the baby names industry who knew it. David Taylor is a data scientist, data visualizer, writer and blogger at www.prooffreader.com. ('prooffreader' is misspelled: that's the joke)

8:00 - Continue to talk data and hang out with your MTL Data friends =)
