*** Please note the unusual day of the week for this event ***
This event is not to be missed! Ted Dunning, Chief Application Architect at MapR, will be giving a talk on anomaly detection. Ted is an amazing speaker. If there is one talk you don't want to miss this year, this would be it. (And we have some seriously good speakers and topics--that's how awesome Ted is!)
Registration is open to all DAML members. Due to limited space, we are disabling guests for this event. Friends are welcome; they'll just need to register for the group first.
This event will be hosted on the Microsoft main campus, Building 27. See map at the bottom of the page. Food is generously sponsored by MapR.
Arrive early for food. The talk will start around 6:30pm. There will be plenty of time for questions after.
Opening Talk: AUC - at what cost(s)?
Alex Korbonits - Data Scientist, Remitly
AUC is and has been an extremely powerful lens through which machine learning practitioners have been able to evaluate and compare model performance. Is the phrase “my curve is better than your curve” the right threshold for publishing a new paper or pushing a new model into production? In this talk, I will demonstrate the ways in which we at Remitly are thinking outside the box (and the area under the curve) to challenge whether or not AUC is the right metric for a range of applications. Price and cost are fundamental components of economic modeling, and are quintessential aspects of an economist’s education and economic way of thinking. These are foreign concepts for many machine learning practitioners. Remitly’s Data Science team manages and thinks deeply about a number of classification tasks such as risk management and fraud detection. For a number of these tasks, misclassification is extremely costly compared to the gains of a correct classification. We are willing to sacrifice AUC in order to incorporate costs of classification and misclassification into our loss functions. By incorporating the notion of “indifference curves” (i.e., level sets), we show that by choosing models whose ROC curves cross our indifference curve thresholds, we can aim for models that give us the best bang for our buck.
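As a rough illustration of the cost-based view described above (a minimal sketch, not Remitly's actual method), the snippet below picks the ROC operating point with the lowest expected misclassification cost and computes the slope of the equal-cost ("indifference") lines in ROC space. The function names, the costs, and the toy labels and scores are all hypothetical.

```python
# Illustrative sketch only: names, costs, and data are hypothetical assumptions.

def roc_points(labels, scores):
    """Return (threshold, fpr, tpr) tuples, starting with the 'flag nothing' point."""
    pos = sum(labels)
    neg = len(labels) - pos
    points = [(float("inf"), 0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        # A case is flagged positive when its score is at or above the threshold.
        tp = sum(1 for y, s in zip(labels, scores) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= t and y == 0)
        points.append((t, fp / neg, tp / pos))
    return points


def expected_cost(fpr, tpr, prevalence, cost_fp, cost_fn):
    """Expected misclassification cost per case at one ROC operating point."""
    return prevalence * (1.0 - tpr) * cost_fn + (1.0 - prevalence) * fpr * cost_fp


def iso_cost_slope(prevalence, cost_fp, cost_fn):
    """Slope of the lines in ROC space along which expected cost is constant."""
    return ((1.0 - prevalence) * cost_fp) / (prevalence * cost_fn)


def cheapest_operating_point(labels, scores, cost_fp, cost_fn):
    """Pick the (threshold, fpr, tpr) point with the lowest expected cost."""
    prevalence = sum(labels) / len(labels)
    return min(
        roc_points(labels, scores),
        key=lambda p: expected_cost(p[1], p[2], prevalence, cost_fp, cost_fn),
    )


# Toy data: 1 = fraud, 0 = legitimate. A missed fraud is assumed to cost
# ten times as much as a false alarm.
labels = [0, 0, 1, 0, 1, 0, 1, 1]
scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
threshold, fpr, tpr = cheapest_operating_point(labels, scores, cost_fp=1.0, cost_fn=10.0)
print(threshold, fpr, tpr)
```

With misses this expensive, the cheapest point on the toy curve accepts a high false positive rate in exchange for catching every positive, which is the kind of trade-off a single AUC number does not show.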
Alex Korbonits is a Data Scientist at Remitly, Inc., where he works extensively on feature extraction and putting machine learning models into production. Outside of work, he loves Kaggle competitions, is diving deep into topological data analysis, and is exploring machine learning on GPUs. Alex is a graduate of the University of Chicago with degrees in Mathematics and Economics.
Main Talk: How to Find What You Didn't Know to Look For, Practical Anomaly Detection
Ted Dunning - Chief Application Architect, MapR
Anomaly detection is the art of automating surprise. To do this, we have to be able to define what we mean by normal and recognize what it means to be different from that. The basic ideas of anomaly detection are simple. You build a model and you look for data points that don’t match that model. The mathematical underpinnings of this can be quite daunting, but modern approaches provide ways to solve the problem in many common situations.
We will describe these modern approaches with particular emphasis on several real use-cases including:
rate shifts to determine when events such as web traffic, purchases or process progress beacons shift rate.
time series generated by machines or biomedical measurements.
topic spotting to determine when new topics appear in a content stream such as Twitter.
network flow anomalies to determine when systems with defined inputs and outputs act strangely.
In building a practical anomaly detection system, you have to deal with practical details including algorithm selection, data flow architecture, anomaly alerting, user interfaces and visualizations.
We will show how to deal with each of these aspects of the problem with an emphasis on realistic system design.
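To make the "build a model, then look for data points that don't match it" idea concrete, here is a minimal sketch for the rate-shift case. It is not taken from the talk: it assumes event counts per interval follow a Poisson distribution, estimates the normal rate from a hypothetical history, and flags new counts that would be very unlikely at that rate. The function names, the `alpha` cutoff, and the data are illustrative assumptions, and the check only looks for upward shifts.

```python
import math


def poisson_upper_tail(k, rate):
    """Return P(X >= k) for X ~ Poisson(rate)."""
    if k <= 0:
        return 1.0
    term = math.exp(-rate)  # P(X = 0)
    cdf = term
    for i in range(1, k):
        term *= rate / i    # P(X = i) from P(X = i - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)


def flag_rate_anomalies(history, new_counts, alpha=0.001):
    """Flag counts that are surprisingly high under the rate seen in history."""
    rate = sum(history) / len(history)  # the model of "normal"
    return [poisson_upper_tail(c, rate) < alpha for c in new_counts]


# Hypothetical events-per-minute counts.
history = [10, 12, 9, 11, 8, 10, 10, 11, 9, 10]
new_counts = [11, 9, 25, 10]
flags = flag_rate_anomalies(history, new_counts)
print(flags)
```

A real system would also need to handle downward shifts, seasonality, and a baseline that drifts over time.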
Ted Dunning is Chief Application Architect at MapR Technologies and a committer and PMC member of the Apache Mahout, Apache ZooKeeper, and Apache Drill projects. Ted has been very active in mentoring new Apache projects and is currently serving as vice president of incubation for the Apache Software Foundation. Ted was the chief architect behind the MusicMatch (now Yahoo Music) and Veoh recommendation systems. He built fraud detection systems for ID Analytics (LifeLock), and he has 24 patents issued to date and a dozen pending. Ted has a PhD in computing science from the University of Sheffield. When he’s not doing data science, he plays guitar and mandolin. He also bought the beer at the first Hadoop user group meeting.
Microsoft, a company that needs no introduction, is sponsoring space this month. DAML is supported in part by Microsoft's Data Science User Group Program.
MapR Technologies (https://www.mapr.com/) enables organizations to create disruptive advantage and long-term value from their data with the industry’s only Converged Data Platform, which delivers distributed processing, real-time analytics, and enterprise-grade requirements across cloud and on-premise environments, while leveraging the significant ongoing development in open source technologies including Spark and Hadoop. MapR ensures customer success through world-class professional services and with free on-demand training that over 50,000 developers, data analysts and administrators have used to close the big data skills gap. Connect with MapR on LinkedIn (https://www.linkedin.com/company/mapr-technologies) and Twitter (https://twitter.com/mapr).
The address is Room 1810 (Olympic), Microsoft Bldg 27, [masked]th Pl NE, Redmond, WA 98052. The building is circled in red in the picture below.
Most people will probably be coming from west of the campus. To get there:
Take 520 East to NE 40th St in Redmond
Turn right onto NE 40th St
Turn right onto 156th Ave NE
Use the second lane from the left to turn left onto NE 36th St.
Follow signs to Bldg 27. (See map below)
Registration is open on meetup.com. This event is free and space will be available on a first-come, first-served basis.