PyData TLV meetup #10
Details
This will be our 10th amazing PyData event in Israel! As usual, it will include great lectures by industry experts, mingling and sharing :) All lectures will be held in English.
Many thanks to Kenshoo for sponsoring this event!
Schedule
• 18:00 - 18:30 - Gathering, snacks, mingling
• 18:30 - 18:40 - Opening words
• 18:40 - 20:20 - Three Lectures
The talks:
Speaker: Irena Grabovitch-Zuyev
Title: Analyzing Machine Generated Mail
Abstract: Web Mail has grown significantly in the last decade, with 95% of its traffic being generated by automated scripts or “machines”. In this reality of an ever growing mailbox, you, the email user, rightfully expect your email provider to do a good job in organizing your electronic life.
In this talk we will discuss this challenge and present a classification algorithm for message clusters, developed by my team and me at Yahoo Research. The message clusters are generated by a new clustering method that leverages the HTML structure.
The classification algorithm was evaluated in a large-scale experiment carried out on real Yahoo Mail traffic and achieved precision and recall rates between 85% and 90%. It has been deployed in production in the Yahoo Mail backend and, as we speak, is sorting the messages in your Yahoo inbox.
About Irena: Irena Grabovitch-Zuyev is a Senior Research Engineer at Yahoo Research (Oath). She received her bachelor's and master's degrees from the Technion. She is a past Google Anita Borg scholar who strongly believes that increasing women’s representation in tech is essential. After having several papers accepted to top conferences and filing a few patents, she is confident that her 3 most successful projects are her kids.
Irena's graduate studies focused on Information Retrieval in Social Networks. At Yahoo, she mainly works on Automatic Extraction and Classification using machine learning algorithms.
Speaker: Doron Kukliansky
Title: Data Driven Video Creation
Abstract: In this talk we will discuss a hackathon project in which we attempted to generate new episodes of The Simpsons using data science tools. We will see the general approach, the data we had and, more importantly, the data we didn't have and how we compensated for it. We will also dive deep into two technical problems that we encountered during the project and that are of general interest:
- The first is speaker recognition, for which we'll discuss the MFCC features and how they can be used for classification.
- The second is semantic sentence similarity, for which we'll discuss the Word Mover's Distance, its origin and usage.
Prior familiarity with The Simpsons isn't necessary but is an advantage.
Speaker: Zach Moshe
Title: Simpsons Detector
Abstract: This talk will discuss how to train a Convolutional Neural Network to detect the four main characters of The Simpsons. We'll go over the main issues in such a project:
- Lack of tagged data, but plenty of images in Google searches, which led me to generate the training set by myself.
- Transfer Learning: using other pre-trained networks as a jump-start in training our model.
The talk will be followed by some Keras code snippets and highlights from videos that were tagged by the model.
