ML at Scale: Fraud Prevention and Time-Series Clustering in Production


Details
Our 42nd DataTalks meetup will be hosted by BioCatch, and we'll discuss cutting-edge strategies for building solid ML infrastructure and the power of large-scale time-series correlation.
Location: BioCatch offices, Azrieli Towers, square building, 31st floor, 132 Begin Road TLV.
Agenda:
🟣 17:30-18:00 – Gathering, mingling, etc.
🟣 18:00-18:45 – Realtime ML in Production: From Chaos to Confidence
Daniel Gordon, Data Science Team Lead at BioCatch
Managing ML-based scores in complex production environments requires robust infrastructure to ensure reliable, high-quality results while meeting strict SLAs. Challenges arise from collecting data from unreliable sources, varied data structures, and human interactions through our SDK. BioCatch, a leader in fraud prevention, must deliver accurate scores quickly, with a strong focus on optimizing alert and true positive rates.
We will present our multi-layered approach: clean feature engineering, tailored models for each sub-problem, rigorous testing, real-time monitoring, and model debugging tools. We also use rule-based interventions for immediate adjustments in urgent scenarios, such as fraud attacks.
🟣 18:45-19:00 – Short break
🟣 19:00-19:45 – Correlating at scale: building time-series clustering and correlation service for big data
Alexander Shereshevsky, Machine Learning Architect at Anodot
Measuring similarity in real time is challenging at large scale. The problem is usually solved with approximation models calculated in advance (LSH-based, i.e. built on locality-sensitive hashing) that find suitable candidates during the serving phase. I will present how Anodot uses LSH similarity approximation for large-scale time-series clustering and correlation, and how Spark improves the calculation's performance.
