Exposing the deep secrets of Internet traffic


Details
Agenda
18:00 - 18:30 Welcome
18:30 - 19:10 Using Deep Learning for Anomaly Detection / David Gruzman
19:10 - 19:25 Beer & Food
19:25 - 20:00 The Challenge of Classifying Ad-Servers / Natalie Abel
Using Deep Learning for Anomaly Detection / David Gruzman
Our focus will be on using deep learning for anomaly detection. We assume some minimal familiarity with the ideas of deep learning. We will cover the following topics:
- Types of autoencoders
- Autoencoders vs. PCA
- How autoencoders can be applied to anomaly detection
- We will do our best to apply autoencoders to SimilarWeb data (by their generous permission) and share what works for us and what does not.
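For readers new to the idea, here is a minimal sketch of reconstruction-error anomaly detection. It is not the speaker's method. It uses the linear special case: a linear autoencoder trained with squared-error loss learns the same subspace as PCA, so the "autoencoder" here is simply the top principal component. Encoding projects a point onto that component and decoding maps it back. Points the model reconstructs poorly are flagged as anomalies. The toy data, the function names, and the threshold rule (the worst reconstruction error seen in training) are all hypothetical. A deep autoencoder would replace the linear encode/decode steps with nonlinear networks.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def fit_linear_autoencoder(train: Sequence[Sequence[float]], iters: int = 200) -> Tuple[Vector, Vector]:
    """Fit a 1-D linear 'autoencoder' as the top principal component.

    Returns (mu, v): the training mean and a unit-length direction v.
    Encoding is z = v . (x - mu); decoding is mu + z * v.
    """
    n, dim = len(train), len(train[0])
    mu = [sum(row[j] for row in train) / n for j in range(dim)]
    centred = [[row[j] - mu[j] for j in range(dim)] for row in train]
    # Covariance matrix of the centred data.
    cov = [[sum(r[i] * r[j] for r in centred) / n for j in range(dim)] for i in range(dim)]
    # Power iteration converges to the eigenvector with the largest eigenvalue.
    v = [1.0] * dim
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(dim)) for i in range(dim)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return mu, v


def reconstruction_error(x: Sequence[float], mu: Vector, v: Vector) -> float:
    """Squared error between x and its encode-then-decode reconstruction."""
    c = [xi - mi for xi, mi in zip(x, mu)]
    z = sum(ci * vi for ci, vi in zip(c, v))        # encode to a 1-D code
    recon = [mi + z * vi for mi, vi in zip(mu, v)]  # decode back to input space
    return sum((xi - ri) ** 2 for xi, ri in zip(x, recon))


# Toy "normal" data lying close to the line y = 2x.
train = [(0, 0.1), (1, 2.0), (2, 3.9), (3, 6.1), (4, 8.0), (-1, -2.1), (-2, -3.9)]
mu, v = fit_linear_autoencoder(train)

# Hypothetical rule: anything reconstructed worse than every training point is anomalous.
threshold = max(reconstruction_error(x, mu, v) for x in train)

for point in [(5, 10.0), (2, -4.0)]:
    err = reconstruction_error(point, mu, v)
    print(point, round(err, 4), "anomaly" if err > threshold else "normal")
```

The point (5, 10) follows the learned structure, so it reconstructs almost perfectly. The point (2, -4) lies far off the line and gets a large error. In practice the threshold is usually a high percentile of training errors rather than the maximum.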
David Gruzman is a Hadoop and big data architect with more than 20 years of hands-on experience, specializing in the design and implementation of scalable, high-performance distributed systems. He currently leads www.nestlogic.com (http://www.nestlogic.com/), which works on finding anomalies in big data sets.
The Challenge of Classifying Ad-Servers / Natalie Abel
Every day, Internet users around the world visit over 10 billion websites on their personal computers and mobile devices. A significant portion of these visits originates from ad network companies, which use rapidly changing ad-servers to redirect traffic from publisher websites to the advertisers themselves. One of the main challenges at SimilarWeb, where we measure and analyze the traffic of each website, is to classify traffic sources by type and, in particular, to identify traffic that originates from such ad networks.
In this talk, I will present our algorithm for classifying traffic that originates from ad servers and explain why existing tools are not up to the task. We will begin by describing the meaningful features we generate from our data, which draw on a variety of data sources, including our panel of millions of web users. We will then discuss the models on which we base our classifier, such as logistic regression, kernel SVM and random forest.
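To make the modelling step concrete, here is a minimal sketch of the first model named above, logistic regression, trained by batch gradient descent. Everything in it is hypothetical: the two features (inspired only by the description of ad networks redirecting traffic through rapidly changing ad-servers), the toy values, the learning rate and the epoch count. None of it reflects SimilarWeb's actual features or classifier.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(t: float) -> float:
    # Numerically stable logistic function.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def train_logistic_regression(
    X: Sequence[Sequence[float]], y: Sequence[int], lr: float = 0.5, epochs: int = 2000
) -> Tuple[List[float], float]:
    """Batch gradient descent on the mean log-loss."""
    n, dim = len(X), len(X[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            err = p - yi  # derivative of the log-loss w.r.t. the logit
            for j in range(dim):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(x: Sequence[float], w: List[float], b: float) -> float:
    """Estimated probability that a traffic source is an ad server."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Hypothetical features per traffic source, both scaled to [0, 1]:
#   x[0] = share of visits that are immediately redirected elsewhere
#   x[1] = churn of the serving domain (how often it changes)
# Label 1 = ad-server traffic, 0 = ordinary referral traffic. Toy values only.
X = [(0.9, 0.8), (0.85, 0.9), (0.95, 0.7), (0.8, 0.85),
     (0.1, 0.2), (0.2, 0.1), (0.15, 0.3), (0.3, 0.2)]
y = [1, 1, 1, 1, 0, 0, 0, 0]

w, b = train_logistic_regression(X, y)
print(predict_proba((0.9, 0.9), w, b))  # high probability of ad-server traffic
print(predict_proba((0.1, 0.1), w, b))  # low probability
```

Logistic regression is a natural baseline because its weights are directly interpretable per feature. Kernel SVMs and random forests, also mentioned in the talk, can capture nonlinear interactions between features at the cost of that interpretability.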
Natalie Abel is a Mathematics and Statistics graduate from Tel Aviv University. As a Data Scientist at SimilarWeb, she continuously uses the company's data to understand how people around the world use the Internet.
