PyData @ AppsFlyer
Details
We would like to thank AppsFlyer for hosting us in person!
Agenda
18:00-18:30 Gathering and snacks
18:30-18:45 Welcome words from our host
18:45-19:15 How to Data Science with Minimal Data? | Jeffrey Said, Data Scientist at AppsFlyer
19:15-19:45 Label Shift Adaptation – a Common Problem with a Surprising Solution | Oz Livneh, PhD Student (BIU) & Consultant
19:45-20:00 A short break
20:00-20:30 Correlating at scale - building a time-series clustering and correlation service for big data | Alexander Shereshevsky, ML Architect at Anodot
RSVP now to secure your spot!
14 Maskit St., ground floor, AppsFlyer Space
===========================================
How to Data Science with Minimal Data? | Jeffrey Said, Data Scientist at AppsFlyer
Ideally, as a data scientist, you have access to vast amounts of data and a wide variety of features.
But what if you have only limited data, and even fewer labels?
Which evaluation metrics should you use, and how should you interpret them?
How can you incorporate domain knowledge?
How can you apply statistical estimation to your data?
In this talk, we’ll show how we at AppsFlyer faced these issues in the pursuit of accurate mobile attribution, and discuss what you can learn from our experience.
===========================================
Label Shift Adaptation – a Common Problem with a Surprising Solution | Oz Livneh, PhD Student (BIU) & Consultant
Label shift is very common: it occurs when the label distribution changes between training and test (or production), which damages performance. For example, [p(cat), p(dog)] = [40%, 60%] in training but [80%, 20%] at test time.
Common practices for label shift adaptation are problematic, but there exists a surprisingly effective & simple solution! I’ll present my PhD research from last year.
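The cat/dog example can be made concrete with Bayes' rule. Under label shift, the class-conditional distribution p(x|y) is assumed unchanged, so a classifier's posterior p(y|x), learned under the training prior, can be rescaled by p_test(y)/p_train(y) and renormalized. The minimal NumPy sketch below applies this classical prior-correction step, assuming the test prior is already known (in practice it usually has to be estimated). It only illustrates how strongly the shift can move predictions and is not necessarily the solution presented in the talk. The function `adjust_posteriors` and the 50/50 prediction are hypothetical.

```python
import numpy as np


def adjust_posteriors(p_train_posterior, train_prior, test_prior):
    """Hypothetical helper: rescale p_train(y|x) to a new label prior.

    Assumes pure label shift (p(x|y) unchanged) and a known test prior.
    Works on a single posterior vector or a batch (last axis = classes).
    """
    p = np.asarray(p_train_posterior, dtype=float)
    # Per-class weight p_test(y) / p_train(y)
    w = np.asarray(test_prior, dtype=float) / np.asarray(train_prior, dtype=float)
    unnormalized = p * w
    # Renormalize so each posterior sums to 1 again
    return unnormalized / unnormalized.sum(axis=-1, keepdims=True)


# Priors from the example, ordered [cat, dog]
train_prior = [0.4, 0.6]
test_prior = [0.8, 0.2]

# A hypothetical prediction that was undecided under the training prior
p_train = [0.5, 0.5]
print(adjust_posteriors(p_train, train_prior, test_prior))  # ≈ [0.857 0.143]
```

The same input that was a coin flip under the training prior becomes a cat with probability 6/7 under the test prior, which is why a model left uncorrected can make systematically wrong decisions after the shift.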
===========================================
Correlating at scale - building a time-series clustering and correlation service for big data | Alexander Shereshevsky, ML Architect at Anodot
Measuring similarity in real time can be challenging at large scale. This problem is usually solved with approximation models computed in advance, which are then used to find suitable candidates during the serving phase. We will present how Anodot uses LSH (locality-sensitive hashing) similarity approximation for metric clustering and correlation, how we use Spark in our data pipelines, and the technical challenges of the migration process.
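The precompute-then-serve pattern can be illustrated with random-hyperplane LSH. The abstract does not say which LSH family, parameters, or Spark code Anodot uses, so this is only a single-machine NumPy sketch under stated assumptions: series are z-normalized (so cosine similarity equals Pearson correlation), an offline step hashes each series into banded sign signatures, and the serving step computes exact correlation only for pairs that share a bucket in at least one band. The functions `znorm`, `lsh_candidates`, and `correlated_pairs` and the parameters `n_bands`, `bits_per_band`, and `threshold` are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

import numpy as np


def znorm(x):
    """Z-normalize each row so that cosine similarity equals Pearson correlation."""
    x = np.asarray(x, dtype=float)
    mu = x.mean(axis=1, keepdims=True)
    sd = x.std(axis=1, keepdims=True)
    sd[sd == 0] = 1.0  # avoid division by zero for constant series
    return (x - mu) / sd


def lsh_candidates(series, n_bands=8, bits_per_band=12, seed=0):
    """Hypothetical offline step: random-hyperplane (sign) hashing in bands.

    Two series become candidates if all sign bits agree in at least one band.
    """
    rng = np.random.default_rng(seed)
    z = znorm(series)
    planes = rng.standard_normal((z.shape[1], n_bands * bits_per_band))
    bits = (z @ planes) > 0  # shape: (n_series, n_bands * bits_per_band)
    pairs = set()
    for b in range(n_bands):
        buckets = defaultdict(list)
        band = bits[:, b * bits_per_band:(b + 1) * bits_per_band]
        for i, row in enumerate(band):
            buckets[row.tobytes()].append(i)
        for members in buckets.values():
            pairs.update(combinations(members, 2))  # indices ascend, so i < j
    return pairs


def correlated_pairs(series, threshold=0.9, **lsh_kwargs):
    """Hypothetical serving step: exact Pearson r, but only for LSH candidates."""
    z = znorm(series)
    n = z.shape[1]
    result = []
    for i, j in sorted(lsh_candidates(series, **lsh_kwargs)):
        r = float(z[i] @ z[j]) / n  # Pearson r for rows z-normalized with ddof=0
        if r >= threshold:
            result.append((i, j, r))
    return result


# Usage: three synthetic metrics
rng = np.random.default_rng(1)
t = np.linspace(0, 10, 200)
base = np.sin(t)
series = np.vstack([
    base + 0.05 * rng.standard_normal(200),          # metric 0
    2 * base + 3 + 0.05 * rng.standard_normal(200),  # metric 1: scaled, shifted copy of 0
    rng.standard_normal(200),                        # metric 2: unrelated noise
])
# Expected to report (0, 1, r) with r close to 1; LSH is probabilistic
print(correlated_pairs(series, threshold=0.9))
```

More bits per band make buckets stricter (fewer false candidates), while more bands raise the chance that truly correlated series collide at least once. Spark ML ships an LSH implementation, `BucketedRandomProjectionLSH`, but it targets Euclidean distance rather than sign hashing; for z-normalized series of length n, the squared Euclidean distance is 2n(1 - r), so it can serve the same purpose.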
***
