DataTalks #24: Extracting Insights Based on Features Relationships

Details

Data Science Work Flow - Challenges and Best Practices.
Registration using the link below is free but mandatory!

Registration: http://bit.ly/DataTalks_24

Agenda:
🕐 18:00 - 18:30 - Gathering, registration, snacks & mingling
🔶 18:30 - 19:15 - Bayesian Networks for Fraud Detection - Erez Timant
🔴 19:15 - 19:35 - Real-time Session Conversion - Eitan Lifshits
🔷 19:35 - 20:00 - Analyze Google Traffic at Scale - Tom Cohen

Bayesian Networks for Fraud Detection - Erez Timant, Data Science Tech Lead at AppsFlyer
AppsFlyer receives tens of millions of app installs daily.

Unfortunately, many of these installs are fraudulent, and detecting fraud is an ever-growing need in the mobile industry.

This is not merely a classification problem: we are often required to explain the reasoning behind a classification as well, which introduces challenges of its own.

We use Bayesian networks to model the dependencies between install features and then calculate the probability that a specific install is fraudulent.
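To make the idea concrete, here is a minimal Python sketch of a tiny Bayesian network. It is not AppsFlyer's actual model: the three binary variables (fraud, short_delay, new_device), the edges between them, and every probability are hypothetical, chosen only to show how the joint distribution factorizes along the network and how the fraud probability then follows from Bayes' rule.

```python
# Hypothetical network: fraud -> short_delay, and (fraud, short_delay) -> new_device.
# All probabilities below are made up for illustration.
P_FRAUD = {True: 0.1, False: 0.9}
# P(short_delay = True | fraud)
P_SHORT_GIVEN_FRAUD = {True: 0.7, False: 0.2}
# P(new_device = True | fraud, short_delay)
P_NEW_GIVEN = {
    (True, True): 0.9,
    (True, False): 0.6,
    (False, True): 0.3,
    (False, False): 0.1,
}


def bernoulli(p_true, value):
    """Probability of a binary value given the probability that it is True."""
    return p_true if value else 1.0 - p_true


def joint(fraud, short_delay, new_device):
    """Joint probability from the chain-rule factorization of the network."""
    return (
        P_FRAUD[fraud]
        * bernoulli(P_SHORT_GIVEN_FRAUD[fraud], short_delay)
        * bernoulli(P_NEW_GIVEN[(fraud, short_delay)], new_device)
    )


def posterior_fraud(short_delay, new_device):
    """P(fraud = True | observed features), by enumerating both fraud values."""
    fraud_mass = joint(True, short_delay, new_device)
    clean_mass = joint(False, short_delay, new_device)
    return fraud_mass / (fraud_mass + clean_mass)


print(posterior_fraud(short_delay=True, new_device=True))
```

Because each factor is a small conditional table, the terms that drive a high fraud probability can be read off directly, which is one reason such models are attractive when a classification has to be explained.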

We propose and implement a variant of the chi-square test for testing conditional dependence between variables.
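The variant itself is not described in this abstract. For orientation only, the standard conditional chi-square test can be sketched as follows: split the data by the conditioning variable, compute Pearson's statistic within each stratum, and sum the statistics and the degrees of freedom. The function names and the toy data are assumptions made for illustration.

```python
from collections import Counter, defaultdict


def chi_square_stat(pairs):
    """Pearson chi-square statistic and degrees of freedom for (x, y) pairs."""
    n = len(pairs)
    x_counts = Counter(x for x, _ in pairs)
    y_counts = Counter(y for _, y in pairs)
    observed = Counter(pairs)
    stat = 0.0
    for x, nx in x_counts.items():
        for y, ny in y_counts.items():
            expected = nx * ny / n  # count expected if x and y were independent
            stat += (observed[(x, y)] - expected) ** 2 / expected
    dof = (len(x_counts) - 1) * (len(y_counts) - 1)
    return stat, dof


def conditional_chi_square(rows):
    """Test x against y given z: sum the per-stratum statistics over values of z."""
    strata = defaultdict(list)
    for x, y, z in rows:
        strata[z].append((x, y))
    total_stat, total_dof = 0.0, 0
    for pairs in strata.values():
        stat, dof = chi_square_stat(pairs)
        total_stat += stat
        total_dof += dof
    return total_stat, total_dof


# Toy (x, y, z) rows: x and y are associated overall, but not within each z stratum.
rows = (
    [(0, 0, 0)] * 4 + [(0, 1, 0)] * 2 + [(1, 0, 0)] * 2 + [(1, 1, 0)] * 1
    + [(0, 0, 1)] * 1 + [(0, 1, 1)] * 2 + [(1, 0, 1)] * 2 + [(1, 1, 1)] * 4
)
marginal = chi_square_stat([(x, y) for x, y, _ in rows])
conditional = conditional_chi_square(rows)
print("marginal:", marginal, "conditional on z:", conditional)
```

The summed statistic can then be compared against a chi-square distribution with the summed degrees of freedom, for example with scipy.stats.chi2.sf.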

In this talk, I will explain what Bayesian networks are, how they work in theory, and what we had to do to make them work in practice at large scale.

Real time Session Conversion - Eitan Lifshits, Senior Data Scientist at Fiverr
Fiverr is a marketplace platform that handles millions of visits per day.
While the main goal is for visits to end in a purchase, more than 90% of them leave the marketplace without one (aka user leakage).
As we monitor user events on Fiverr's website, we developed an RNN (LSTM) model based on the sequence of user events to estimate the probability of a purchase in real time.
We'll go over the dataset used, the model architecture, and the deployment process using AWS SageMaker.

Analyze Google Traffic at Scale - Tom Cohen, Data Scientist at Fiverr
At Fiverr we target millions of Google keywords in a continuously changing marketplace. Because bounce rates are high, sending each user to the best landing page is key to increasing the conversion rate. The goal of the Top of the Funnel Personalization project is to provide the best landing experience by redirecting users to the page most relevant to them.

This real-time model, hosted on AWS SageMaker, is a contextual multi-armed bandit that implements a Neural-Linear model, combining a fully connected neural network with Thompson sampling over a Bayesian linear regression. It iteratively learns user behavior and context using explore-exploit methods (with conversions as the reward) to recommend the best landing page for each user.
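The Thompson sampling step can be illustrated with a small Python sketch. It is not Fiverr's implementation: it leaves out the neural network entirely (the two-dimensional context vector stands in for the features a network's last layer would produce), assumes a unit noise variance and a zero-mean Gaussian prior, and uses made-up conversion rates for two hypothetical landing pages.

```python
import math
import random


class BayesianLinearArm:
    """Bayesian linear regression over 2-d features, assuming unit noise variance."""

    def __init__(self, prior_precision=1.0):
        # Posterior precision matrix [[a, b], [b, c]] and vector f = sum(reward * x).
        self.a, self.b, self.c = prior_precision, 0.0, prior_precision
        self.f = [0.0, 0.0]

    def _mean_and_cov(self):
        det = self.a * self.c - self.b * self.b
        cov = (self.c / det, -self.b / det, self.a / det)  # (s00, s01, s11)
        mean = (cov[0] * self.f[0] + cov[1] * self.f[1],
                cov[1] * self.f[0] + cov[2] * self.f[1])
        return mean, cov

    def sample_weights(self, rng):
        """Draw one weight vector from the Gaussian posterior."""
        mean, (s00, s01, s11) = self._mean_and_cov()
        # Cholesky factor of the 2x2 posterior covariance.
        l00 = math.sqrt(s00)
        l10 = s01 / l00
        l11 = math.sqrt(max(s11 - l10 * l10, 0.0))
        z0, z1 = rng.gauss(0, 1), rng.gauss(0, 1)
        return (mean[0] + l00 * z0, mean[1] + l10 * z0 + l11 * z1)

    def update(self, x, reward):
        """Fold one (context, reward) observation into the posterior."""
        self.a += x[0] * x[0]
        self.b += x[0] * x[1]
        self.c += x[1] * x[1]
        self.f[0] += reward * x[0]
        self.f[1] += reward * x[1]


def choose_arm(arms, x, rng):
    """Thompson sampling: sample weights per arm, pick the highest sampled reward."""
    scores = []
    for arm in arms:
        w = arm.sample_weights(rng)
        scores.append(w[0] * x[0] + w[1] * x[1])
    return max(range(len(arms)), key=scores.__getitem__)


rng = random.Random(0)
arms = [BayesianLinearArm(), BayesianLinearArm()]  # two hypothetical landing pages
# Made-up conversion rates, keyed by (page, context flag).
true_rate = {(0, 0): 0.8, (0, 1): 0.2, (1, 0): 0.2, (1, 1): 0.8}
for _ in range(500):
    flag = rng.randint(0, 1)
    x = (1.0, float(flag))  # bias term plus one context feature
    page = choose_arm(arms, x, rng)
    converted = 1.0 if rng.random() < true_rate[(page, flag)] else 0.0
    arms[page].update(x, converted)
```

Sampling weights from the posterior, rather than always using the posterior mean, is what provides exploration: a page with few observations has a wide posterior and is still chosen occasionally.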
