Join us for a meetup full of data exploration and monitoring! Learn how to better know your data and extract insights from it :)

Agenda:
18:00 Mingling and snacks
18:30 Opening words
18:40 Amir Pupko - Better and easier KPIs monitoring with new a/b testing platform
19:10 Short break
19:20 Meytal Avgil Tsadok - Seenopsis
19:50 Hagit Grushka-Cohen - Sampling policy effect on Anomaly Detection for Data Base Monitoring

Talks are in English and will be filmed and uploaded to the PyData YouTube channel. Thanks to Soluto for hosting this event!
___________________________________________________________________________
Amir Pupko - Better and easier KPIs monitoring with new a/b testing platform

A lot of different changes are deployed to production on a daily basis. How can we be sure we do not harm our main goals? Recently we’ve created a new monitoring platform that enables us to easily monitor every change to our product. In this talk we’ll share our process of delivering this new platform, choosing statistical methods for different use cases, and integrating it with our product analytics DB.

Amir started as a software developer at Soluto 7 years ago and shifted to data science 3 years ago. He is passionate about data-driven methodologies, integrating statistics into our daily decision-making process, and leveraging data to bring insights and value to customers.
___________________________________________________________________________
Meytal Avgil Tsadok - Seenopsis

Link to Git: https://github.com/meytala/seenopsis

The Why - There is always a need to “feel” the data by exploring the variables. Though essential, this task is sometimes repetitive, time-consuming and, mainly, boring. What if there were a tool that centralized the most important features of all variables in a dataset, helping to explore the dataset in a structured, visual way?
The What - Seenopsis is designed to help with the everyday work of a data scientist by centralizing the most important features of the different variables in a structured, visual way.

The How - The only required argument in seenopsis is the name of the dataset. Other arguments are optional.

Meytal is a senior epidemiologist and data researcher at the Clalit Research Institute and an Adjunct Professor at McGill University.
___________________________________________________________________________
Hagit Grushka-Cohen - Sampling policy effect on Anomaly Detection for Data Base Monitoring

Monitoring database activity is useful for identifying and preventing data breaches. Database activity monitoring (DAM) systems use anomaly detection algorithms to alert security officers to possible infractions. However, the sheer number of transactions makes it impossible to track all of them, so such solutions use manually crafted policies to decide which transactions to monitor and log. In this paper we describe a novel simulation method for user activity. We introduce events of change in the user transaction profile and assess the impact of sampling on the anomaly detection algorithm. We found that looking for anomalies in a fixed subset of the data using a static policy misses most of these events, since low-risk users are ignored. A Bayesian sampling policy identified 67% of the anomalies while sampling only 10% of the data, compared to a baseline of using all of the data.

Hagit is a PhD student in the department of software and information systems at Ben-Gurion University, under the supervision of Prof. Lior Rokach and Prof. Bracha Shapira. Her PhD topic is applied machine learning in the domain of cyber security. Hagit twice won the prestigious IBM fellowship award for her work on risk assessment and on automatic policy calibration. During her PhD, Hagit collaborated with IBM Guardium and the IBM Cyber Center of Excellence, which led to several ML papers (including at CIKM).
Prior to starting her PhD, Hagit was a project manager at Adama and a BI project leader at Stanley Works.