PyData March Meetup @ Delivery Hero


Details
Dear all,
I’m delighted to announce that our first meetup in 2016 will be an NLP one.
The agenda is awesome:
• Matti Lyra: Improving classifier performance using topical ensembles
• Matthew Honnibal (spaCy.io): Sense2vec: Distributional similarity over tagged text
• Details about the upcoming PyData Berlin Conference 2016
• Socialising/Group Discussion
Many thanks to Delivery Hero for hosting us.
We're looking forward to meeting you all!
Kind regards,
Anne
-----
Matti Lyra: Improving classifier performance using topical ensembles
The traditional wisdom regarding ensemble models such as bagging is that the training data samples should be randomised as much as possible. I will present a new ensemble model that uses topic modelling to guide the sampling, which leads to improved performance on a classification task.
Matti is a PhD candidate in Computational Linguistics. He has a Bachelor's in Computing and Artificial Intelligence from the University of Sussex and works as a machine learning consultant alongside his studies.
Matthew Honnibal: Sense2vec: Distributional similarity over tagged text
The word2vec family of models has become one of the standard ways of exploring large text samples. The algorithms map each word to a real-valued vector that represents its usage contexts, a useful approximation of its meaning. To analyse a text, you split it into tokens, retrieve each token's vector from the lookup table, and compose them somehow; averaging is common.
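The split, look up, and average recipe described above can be sketched in a few lines. This is only a toy illustration: the table `VECTORS`, its three-dimensional values, and the helper `average_vector` are made up for the example and do not come from any trained model.

```python
# Toy lookup table: the words and 3-dimensional vectors are invented for illustration.
VECTORS = {
    "ducks": [0.9, 0.1, 0.0],
    "swim": [0.1, 0.8, 0.1],
    "fast": [0.0, 0.3, 0.7],
}


def average_vector(text, table):
    """Split text on whitespace, look up each token, and average the vectors found."""
    tokens = text.lower().split()
    # Tokens missing from the table are skipped rather than guessed at.
    found = [table[token] for token in tokens if token in table]
    if not found:
        return None
    dims = len(found[0])
    return [sum(vec[i] for vec in found) / len(found) for i in range(dims)]


doc_vector = average_vector("Ducks swim fast", VECTORS)
```

Note that the lookup key here is the bare word, so every use of a word such as "duck" would share one vector regardless of its sense.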
We present a simple and practical extension to this family of models that lets you learn more precise vectors, and retrieve them reliably at run-time. Instead of splitting the string into single, unannotated words, we split the string into richer tokens, recognised by the tagger, entity recogniser and parsers implemented in the spaCy NLP library. This allows us to learn separate vectors for distinct noun and verb senses, e.g. we model (and reliably retrieve) two senses of "duck": the type of bird, and the action of crouching. We also learn vectors for non-compositional phrases, such as "open season" and "fair game".
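One way to picture the richer tokens is a lookup table keyed by the token text together with its tag. The sketch below is an assumption made for illustration, not the talk's implementation: the `text|TAG` key format, the table `SENSE_VECTORS`, its values, and the helpers `make_key` and `lookup` are all hypothetical, and the tags are hard-coded where a real pipeline would take them from a tagger or parser.

```python
# Hypothetical table keyed by "text|TAG"; the vectors are invented for illustration.
SENSE_VECTORS = {
    "duck|NOUN": [0.9, 0.1],
    "duck|VERB": [0.1, 0.9],
    "open_season|NOUN": [0.5, 0.5],
}


def make_key(text, tag):
    """Join a (possibly multi-word) token and its tag into a lookup key."""
    return text.lower().replace(" ", "_") + "|" + tag


def lookup(tagged_tokens, table):
    """Return the vector for each (text, tag) pair, or None when the key is missing."""
    return [table.get(make_key(text, tag)) for text, tag in tagged_tokens]


# The tags are hard-coded here; in practice they would come from a tagger or parser.
bird = lookup([("duck", "NOUN")], SENSE_VECTORS)[0]
crouch = lookup([("duck", "VERB")], SENSE_VECTORS)[0]
phrase = lookup([("open season", "NOUN")], SENSE_VECTORS)[0]
```

Because the tag is part of the key, the noun and verb senses of "duck" retrieve different vectors, and a multi-word phrase is retrieved as a single unit.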
We demonstrate the efficacy and scalability of the approach by training a model on all comments posted to Reddit in 2015. The demo is available at https://sense2vec.spacy.io . Associated code and further explanation are available at https://github.com/spacy-io/sense2vec and https://spacy.io/blog/sense2vec-with-spacy respectively.
Matthew Honnibal is the lead developer of the spaCy software. He studied linguistics as an undergrad, and never thought he'd be a programmer. By 2009 he had a PhD in computer science, and in 2014 he left academia to found a start-up devoted to making NLP more practical. He's from Sydney and lives in Berlin.
