
Details

This will be our sixth amazing PyData event in Israel! It will include three interesting lectures by industry experts, plus time for mingling and sharing :)

• 18:00 - 18:30 - Gathering, snacks, mingling

• 18:30 - 18:40 - Opening words

• 18:40 - 20:40 - Three interesting talks about data science in Python

Visualizing High Dimensional Data (t-SNE), Gal Yona, Weizmann Institute

Many of us are used to thinking about data visualization as creating simple 2-variable plots to accompany our slides. In practice, this is hardly the case. I claim that in today's ML landscape, rich with complicated (and often uninterpretable) models and usually very high dimensional data, good knowledge of data visualization techniques is a crucial tool for any serious practitioner. As with many other concepts in ML, applying these techniques is usually easy (no more than 2-3 lines in Python), while truly understanding what's going on in the background, and coming up with useful applications, is a much more challenging task.

In the first part of the talk, I'll give a short introduction to t-SNE: a beautiful algorithm and, more importantly, the state-of-the-art approach to visualizing high dimensional data. We'll talk about how it differs from simple dimensionality reduction techniques (like PCA) and give some intuition into why it usually performs very well.

In the second part I'll present two example applications that will hopefully spark your interest and creativity. The first application is arranging a huge collection of images by their visual similarity in an unsupervised way. I will focus on some practical challenges I ran into while trying to implement this solution efficiently. The second application revolves around visualizing the representations learnt by a neural network in order to better understand what it bases its predictions on. It shows how visualization can play a role in the task of "peeking" into the Deep Learning black box.
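To make the "2-3 lines in Python" point concrete, here is a minimal sketch that is not taken from the talk itself. It assumes scikit-learn is installed and uses its bundled digits dataset as a stand-in for high dimensional data; the subset size, perplexity and random seed are arbitrary illustrative choices.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

# 8x8 digit images flattened to 64 dimensions; keep a subset so this runs quickly.
digits = load_digits()
X = digits.data[:500]
y = digits.target[:500]

# Linear baseline: project onto the two directions of highest variance.
X_pca = PCA(n_components=2).fit_transform(X)

# t-SNE: a non-linear embedding that tries to keep neighbouring points close.
# perplexity and random_state are assumptions made for this sketch.
X_tsne = TSNE(n_components=2, perplexity=30, init="pca",
              random_state=0).fit_transform(X)

print(X_pca.shape, X_tsne.shape)
```

Scatter-plotting each embedding coloured by `y` (for example with matplotlib) is one way to compare how well the two methods separate the digit classes.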

5 simple steps to create meaningful features from clickstream data (with Pandas), Shir Meir Lador, Bluevine

Predicting client financial behaviour usually relies on financial data sources such as client financial history, bank and credit statements, etc. In this talk we will review a different type of data source for risk assessment: features engineered from clickstream data gathered by Mixpanel. We will discuss what kinds of features can be generated from clickstream data and how to generate them using Pandas, what the benefits of these unconventional features are, how the features correlate with user financial behaviour, and how you can apply clickstream data analysis to your own problems. In addition, we will see how to guard against data leakage and how to better understand our model's decisions using model explanation tools like LIME.
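As a rough illustration of the kind of per-user features that can be derived from clickstream events with Pandas, here is a small sketch. It does not reproduce the speaker's five steps: the event names, columns, toy data and `cutoff` timestamp are all hypothetical. Filtering to events before the cutoff is one simple way to reduce the risk of data leakage.

```python
import pandas as pd

# Hypothetical clickstream export: one row per event (all values are made up).
events = pd.DataFrame({
    "user_id": [1, 1, 1, 2, 2, 2, 3],
    "event": ["login", "view_pricing", "submit_form",
              "login", "view_pricing", "submit_form", "login"],
    "timestamp": pd.to_datetime([
        "2017-01-01 10:00:00", "2017-01-01 10:00:30", "2017-01-01 10:02:00",
        "2017-01-01 11:00:00", "2017-01-01 11:05:00", "2017-01-03 12:00:00",
        "2017-01-01 09:00:00",
    ]),
})

# Only use events observed before the (hypothetical) decision time,
# so that the features cannot leak information from the future.
cutoff = pd.Timestamp("2017-01-02")
events = events[events["timestamp"] < cutoff].sort_values(["user_id", "timestamp"])

# Seconds elapsed since the same user's previous event.
events["gap_s"] = events.groupby("user_id")["timestamp"].diff().dt.total_seconds()

# Per-user aggregates.
features = events.groupby("user_id").agg(
    n_events=("event", "size"),
    n_distinct_events=("event", "nunique"),
    mean_gap_s=("gap_s", "mean"),
    first_ts=("timestamp", "min"),
    last_ts=("timestamp", "max"),
)
features["active_span_s"] = (features["last_ts"] - features["first_ts"]).dt.total_seconds()
features = features.drop(columns=["first_ts", "last_ts"])

# One count column per event type.
event_counts = pd.crosstab(events["user_id"], events["event"]).add_prefix("count_")
features = features.join(event_counts)

print(features)
```

The resulting table has one row per user and could be joined to a label column before training a model.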

Probabilistic cross-device matching, Netta Shachar, Oracle (Crosswise)

Bridging the cross-device gap is probably the biggest challenge the ad-tech industry is facing today. Users spend more than 50% of their time on mobile devices, and it has become crucial for advertisers to deliver a consistent message across all of a user's devices.

Crosswise, the leading provider of big data consumer identification solutions, has developed a probabilistic model that matches users' devices. By collecting activity data for billions of devices and applying state-of-the-art machine-learning technology, Crosswise is building a user identification graph spanning millions of users across the US.

Crosswise has developed a big-data analysis system capable of handling billions of activity points for billions of devices. The system matches users' activity signals across their devices and is able to generate high-confidence probabilistic matches for millions of users.

The talks will be held in English.

Many thanks to Oracle for sponsoring this meetup!
