PyData #22 - Sponsored by SimilarWeb
Details
We would like to thank SimilarWeb for hosting our April meetup.
Please note that space at this meetup is limited, and we will be enforcing the attendance list.
PEOPLE NOT ON THE LIST WILL NOT BE ALLOWED TO PARTICIPATE.
Agenda for the meetup:
18:00 - 18:30 Gathering
18:30 - 18:40 A word from our host, SimilarWeb
18:45 - 19:15 Deep Learning for Named Entity Recognition (Kfir Bar / Basis Technology)
19:15 - 19:45 From Spark to Elasticsearch and Back - Learning Large Scale Models for Content Recommendation (Sonya Liberman / Outbrain)
19:45 - 20:15 Shaky Ground (truth): Learning with Label Noise (Yaniv Katz / SimilarWeb)
ABSTRACTS:
########################
Deep Learning for Named Entity Recognition (Kfir Bar)
Named Entity Recognition is one of the key tasks in commercial Natural Language Processing applications. Its objective is to identify named entity mentions, such as people, organizations, and locations, in running text. State-of-the-art approaches are purely data-driven, leveraging deep neural networks. In this talk, I will present a few of those works, followed by a description of our own deep NER implementation, based on TensorFlow. We'll look at accuracy, speed, and memory footprint, while comparing some of the best known deep architectures with a basic statistical approach.
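As background to the abstract above: NER systems, deep or statistical, typically frame the task as per-token sequence labeling using a BIO scheme. A minimal sketch of that labeling convention (the sentence, tokens, and span format here are invented for illustration, not taken from the talk):

```python
# Convert token-level entity spans into BIO tags, the per-token target
# representation that NER taggers are usually trained to predict.

def to_bio(tokens, entities):
    """entities: list of (start, end_exclusive, type) spans over tokens."""
    tags = ["O"] * len(tokens)  # "O" = outside any entity
    for start, end, etype in entities:
        tags[start] = f"B-{etype}"          # B- marks the entity's first token
        for i in range(start + 1, end):
            tags[i] = f"I-{etype}"          # I- marks continuation tokens
    return tags

tokens = ["Kfir", "Bar", "works", "at", "Basis", "Technology"]
entities = [(0, 2, "PER"), (4, 6, "ORG")]
print(to_bio(tokens, entities))
# ['B-PER', 'I-PER', 'O', 'O', 'B-ORG', 'I-ORG']
```

A deep model (e.g. a BiLSTM or Transformer encoder) would predict one such tag per token; the spans are then recovered by grouping B-/I- runs.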
########################
From Spark to Elasticsearch and Back - Learning Large Scale Models for Content Recommendation (Sonya Liberman)
Serving tens of billions of personalized recommendations a day at a latency under 30 milliseconds is a challenge. In this talk I'll share our algorithmic architecture, including its Spark-based offline layer and its Elasticsearch-based serving layer, which enables running complex models under difficult scale constraints and shortens the cycle between research and production.
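The offline/online split described above is a common pattern: a batch job (e.g. in Spark) precomputes model artifacts, and the low-latency serving layer only performs cheap lookups and arithmetic. A toy sketch of the idea (the item names, vectors, and dot-product scoring are illustrative assumptions, not Outbrain's actual system):

```python
# --- offline layer (would be a Spark batch job): precompute item vectors ---
item_vectors = {
    "article-1": [0.9, 0.1],
    "article-2": [0.2, 0.8],
    "article-3": [0.5, 0.5],
}

# --- serving layer (would be backed by Elasticsearch): rank candidates ---
def recommend(user_vector, candidates, k=2):
    """Score each candidate by dot product with the user vector; return top-k."""
    def score(item):
        return sum(u * v for u, v in zip(user_vector, item_vectors[item]))
    return sorted(candidates, key=score, reverse=True)[:k]

print(recommend([1.0, 0.0], ["article-1", "article-2", "article-3"]))
# ['article-1', 'article-3']
```

Because the expensive model evaluation happens offline, the serving path stays simple enough to meet tight latency budgets, and the offline artifacts can be refreshed without touching serving code.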
########################
Shaky Ground (truth): Learning with Label Noise (Yaniv Katz)
Labeled data containing incorrect labels, termed label noise, has gained much attention in machine learning research due to its adverse impact on supervised models. This research effort has intensified in recent years as larger data sets, which are more prone to label noise, have become prevalent. To tackle this problem, studies have explored the sensitivity of the learning process to label noise and devised robust methodologies to overcome it. This talk covers basic concepts in label noise research and explores suggested approaches for overcoming its negative effects. It also showcases two practical examples of easy-to-use methods that were tested on training sets contaminated by label noise and by target value noise.
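One simple, easy-to-use family of label-cleaning methods relabels a point when its neighbours unanimously disagree with it. A minimal sketch on toy data (the 1-D data set, the injected flips, and the unanimity rule are illustrative assumptions, not the specific methods from the talk):

```python
# Toy ground truth: points below 0.5 belong to class 0, the rest to class 1.
xs = [i / 20 for i in range(20)]
clean = [0 if x < 0.5 else 1 for x in xs]

# Inject label noise by flipping a few labels.
noisy = list(clean)
for i in (2, 7, 15):
    noisy[i] = 1 - noisy[i]

def relabel_by_neighbours(xs, ys, k=3):
    """Relabel a point only when its k nearest neighbours all agree."""
    cleaned = list(ys)
    for i, x in enumerate(xs):
        neighbours = sorted((j for j in range(len(xs)) if j != i),
                            key=lambda j: abs(xs[j] - x))[:k]
        votes = {ys[j] for j in neighbours}
        if len(votes) == 1:            # unanimous neighbourhood
            cleaned[i] = votes.pop()
    return cleaned

cleaned = relabel_by_neighbours(xs, noisy)
print(sum(a != b for a, b in zip(clean, noisy)))    # 3 label errors before
print(sum(a != b for a, b in zip(clean, cleaned)))  # 0 after relabeling
```

Requiring unanimity rather than a simple majority keeps the filter conservative near class boundaries, where neighbours legitimately disagree; this trades some noise left uncorrected for fewer clean labels being overwritten.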
