DSPT#65 - The Journal Club: Automatic Model Monitoring for Data Streams (Porto)

Details

For this meetup we are going to shake things up a bit! Join us at Portucalense University for the next DSPT Porto Meetup, Journal Club style! The focus will be the recent preprint by Fábio Pinto, Marco Sampaio and Pedro Bizarro about model monitoring for data streams (https://arxiv.org/abs/1908.04240). In the first part, Marco Sampaio will present the preprint; in the second part, a panel of experts in the field will debate it.
If you want to be part of the discussion make sure you read the preprint ahead of the event!

=== SCHEDULE ===
The preliminary agenda for the meetup is the following:
• 18:30-19:00: Welcome and get-together
• 19:00-19:30: Paper presentation: "Automatic Model Monitoring for Data Streams", by Marco Sampaio (Feedzai)
• 19:40-19:45: Group photo
• 19:45-20:00: Networking / Coffee Break
• 20:00-20:45: Debate with Marco Sampaio (Feedzai), Fábio Pinto (Automaise), Bruno Miguel Veloso (UPT), Pedro Rodrigues (CINTESIS)
• 20:50: Closing, hanging out and some beers
• 21:00: Dinner is optional but it might be an excellent opportunity for networking (register here: http://bit.ly/dspt66_dinner)

This meetup is sponsored by Losch (https://www.losch.lu/fr/accueil) and Novo Banco (https://www.novobanco.pt/site/). The coffee break is sponsored by Portucalense University. Thank you for your support!
Would you like to sponsor a future meetup? Please contact us at
[masked]
See you there!
-------------------------------------

Title: Automatic Model Monitoring for Data Streams

Source: https://arxiv.org/abs/1908.04240

Blog post: https://medium.com/feedzaitech/ml-powered-automatic-model-monitoring-d1841efa0ba8

Speaker: Marco Sampaio

Abstract: Detecting concept drift is a well known problem that affects production systems. However, two important issues that are frequently not addressed in the literature are 1) the detection of drift when the labels are not immediately available; and 2) the automatic generation of explanations to identify possible causes for the drift. For example, a fraud detection model in online payments could show a drift due to a hot sale item (with an increase in false positives) or due to a true fraud attack (with an increase in false negatives) before labels are available. In this paper we propose SAMM, an automatic model monitoring system for data streams. SAMM detects concept drift using a time and space efficient unsupervised streaming algorithm and it generates alarm reports with a summary of the events and features that are important to explain it. SAMM was evaluated in five real world fraud detection datasets, each spanning periods up to eight months and totaling more than 22 million online transactions. We evaluated SAMM using human feedback from domain experts, by sending them 100 reports generated by the system. Our results show that SAMM is able to detect anomalous events in a model life cycle that are considered useful by the domain experts. Given these results, SAMM will be rolled out in a next version of Feedzai's Fraud Detection solution.
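To make the idea of label-free drift detection concrete ahead of the discussion, here is a minimal Python sketch. It is not the SAMM algorithm from the preprint; it only illustrates the general approach of comparing recent model scores against a reference period without using labels. The Jensen-Shannon divergence, the window size, the threshold, the function names, and the synthetic Beta-distributed scores are all assumptions chosen for illustration. The sketch also recomputes the window histogram at every step, so it is not the time and space efficient streaming variant the abstract refers to.

```python
import math
import random
from collections import deque


def histogram(scores, bins=10):
    """Normalized histogram of scores in [0, 1], lightly smoothed to avoid zeros."""
    counts = [1e-6] * bins
    for s in scores:
        idx = min(int(s * bins), bins - 1)
        counts[idx] += 1
    total = sum(counts)
    return [c / total for c in counts]


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits, bounded in [0, 1]."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def monitor(stream, reference, window=500, threshold=0.1):
    """Yield (index, divergence) each time the sliding window drifts from the reference.

    `window` and `threshold` are illustrative values, not taken from the preprint.
    """
    ref_hist = histogram(reference)
    recent = deque(maxlen=window)
    for i, score in enumerate(stream):
        recent.append(score)
        if len(recent) == window:
            d = js_divergence(ref_hist, histogram(recent))
            if d > threshold:
                yield i, d


# Synthetic model scores: a stable period followed by a shifted one (hypothetical data).
random.seed(0)
reference = [random.betavariate(2, 8) for _ in range(5000)]
stable = [random.betavariate(2, 8) for _ in range(2000)]
shifted = [random.betavariate(5, 5) for _ in range(2000)]

alarms = list(monitor(stable + shifted, reference))
first_alarm = alarms[0][0] if alarms else None
print("first alarm at event index:", first_alarm)
```

In this toy setup the score distribution changes at event index 2000, and the first alarm is raised once enough shifted scores have entered the sliding window. Explaining which events and features caused the alarm, the second issue raised in the abstract, is outside the scope of this sketch.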

Short-bio: Marco Sampaio is a Research Data Scientist at Feedzai. Originally trained as a theoretical physicist, he earned his bachelor’s degree at the University of Porto and his Master’s and PhD degrees at the University of Cambridge. Before embracing the world of Data Science in industry, he worked on topics as diverse as cosmological models, black hole physics and theoretical particle physics, with some time also spent at CERN. Currently he’s working on machine learning algorithmic solutions in data streaming scenarios to fight fraud. Marco is thoroughly addicted to research, problem solving and equations, so his favourite workplace joke is to throw PDF files with equations into Slack rooms to confuse people. He also enjoys surfing (mostly attempting), running, and playing with his nephews (often both running and playing with them).