PyData @ Lusha

Details
We would like to thank Lusha for hosting us in person!
Agenda
18:00-18:30 Gathering and snacks
18:30-18:45 Welcome words from our host
18:45-19:15 Billion Scale Deduplication using ANN (Approximate Nearest Neighbours) | Idan Richman Goshen, Senior Data Scientist at Lusha
19:15-19:45 Network Anomaly Detection Using Transfer Learning Based on Auto-Encoders Loss Normalization | Dr. Aviv Yehezkel, Cynamics Co-Founder & CTO
19:45-20:00 A short break
20:00-20:30 Faster Pandas: Make your code run faster and consume less memory | Miki Tebeka, CEO of 353solutions
20:30-21:00 TBD
============================================
Billion Scale Deduplication using ANN (Approximate Nearest Neighbours) | Idan Richman Goshen, Senior Data Scientist at Lusha
At Lusha we deal with contact profiles, lots of contact profiles. This kind of data is messy by nature, and a single entity can have several representations. Beyond the time and money spent moving messy data through the various pipelines, it is difficult to search, and valuable information is lost along the way. Ideally we would merge all records of the same entity, even when they differ slightly ("Alagra Jones", "Alagra Smith-Jones"). Comparing all possible pairs is feasible at a small scale, but impossible with billions of records.
A family of algorithms known as approximate nearest neighbours (ANN) is becoming popular for solving such challenges, enabling the use of text embeddings and clustering at large scale.
This talk will offer a brief overview of ANN algorithms and demonstrate how we can apply them to get a reasonably sized subset of candidates, which we can then pass into a classifier for a match/no-match outcome. I'll demonstrate how we handle such a task at scale, how we evaluate the two steps, and the tools we use.
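As a rough illustration of the two-step pipeline the abstract describes, here is a minimal sketch. The names, the character n-gram vectorizer, and the similarity threshold are my own assumptions; scikit-learn's exact `NearestNeighbors` stands in for a real billion-scale ANN index (e.g. Faiss, Annoy, or ScaNN), and a simple distance threshold stands in for the match/no-match classifier:

```python
# Step 1: embed records and retrieve a small candidate set per record.
# Step 2: classify each candidate pair as match / no-match.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import NearestNeighbors

names = [
    "Alagra Jones",
    "Alagra Smith-Jones",
    "John Doe",
    "Jane Roe",
]

# Character n-gram TF-IDF tolerates small spelling/formatting differences.
vec = TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 3))
X = vec.fit_transform(names)

# Candidate generation: each record's nearest neighbours (itself + 1 here).
nn = NearestNeighbors(n_neighbors=2, metric="cosine").fit(X)
dist, idx = nn.kneighbors(X)

def candidate_pairs(threshold=0.5):
    """Keep only candidate pairs the 'classifier' (a threshold) accepts."""
    pairs = set()
    for i, (d_row, j_row) in enumerate(zip(dist, idx)):
        for d, j in zip(d_row, j_row):
            if i != j and d < threshold:
                pairs.add(tuple(sorted((i, int(j)))))
    return pairs

print(candidate_pairs())
```

In a real deployment the ANN index replaces the all-pairs comparison (reducing O(n²) to roughly O(n log n) candidate lookups), and a trained classifier replaces the threshold.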
============================================
Network Anomaly Detection Using Transfer Learning Based on Auto-Encoders Loss Normalization | Dr. Aviv Yehezkel, Cynamics Co-Founder & CTO
This talk presents the concept of "auto-encoder loss transfer learning". The approach normalizes auto-encoder losses across different model deployments, making it possible to detect and classify network anomalies in a generalized way that is agnostic to the specific client. The talk is based on a paper recently presented at AISec '21, co-located with ACM CCS.
============================================
Faster Pandas: Make your code run faster and consume less memory | Miki Tebeka, CEO of 353solutions
We'll start by reviewing the rules of the optimization club and why you shouldn't optimize.
After that, we'll see how to measure speed and memory consumption and how to find the bottlenecks in your code. Finally, we'll review some code samples and make them faster.
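A small taste of these themes (the examples are mine, not the speaker's): measure before optimizing, using `timeit` for speed and `memory_usage(deep=True)` for memory, then compare a row-wise `apply` with a vectorized operation and an `object` column with a `category` one:

```python
import timeit
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "price": np.random.default_rng(0).uniform(1, 100, size=100_000),
    "city": np.random.default_rng(1).choice(
        ["Tel Aviv", "Haifa", "Jerusalem"], size=100_000),
})

# Speed: row-wise apply vs. vectorized arithmetic (same result, huge gap).
t_apply = timeit.timeit(lambda: df["price"].apply(lambda p: p * 1.17), number=3)
t_vector = timeit.timeit(lambda: df["price"] * 1.17, number=3)
print(f"apply: {t_apply:.3f}s, vectorized: {t_vector:.3f}s")

# Memory: low-cardinality strings shrink dramatically as 'category'.
as_object = df["city"].memory_usage(deep=True)
as_category = df["city"].astype("category").memory_usage(deep=True)
print(f"object: {as_object:,} bytes, category: {as_category:,} bytes")
```

The vectorized version avoids a Python-level function call per row, and the `category` dtype stores each distinct string once plus a small integer code per row.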