The basic ideas of anomaly detection are simple. You build a model and you look for data points that don’t match that model. Building a practical anomaly detection system, however, requires dealing with practical details, including algorithm selection, data flow architecture, anomaly alerting, user interfaces, and visualizations. We will describe the major classes of anomaly detection systems and show how to build anomaly detection systems for:
a) rate shifts, to determine when the rate of events such as web traffic, purchases, or process progress beacons changes
b) topic spotting, to determine when new topics appear in a content stream such as Twitter
c) network flow anomalies, to determine when systems with defined inputs and outputs act strangely.
While explaining how to solve these problems, we will describe how clustering, dimensionality reduction, and density estimation can be used in systems that adapt to and learn about their environment, and how these systems can tell you when something has changed.
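As a rough illustration of the "build a model, then look for points that don't match it" idea applied to rate shifts, here is a minimal sketch. It is not the method presented in the talk. It assumes event counts per fixed time window follow a Poisson distribution, and the function names, sample counts, and threshold are all hypothetical.

```python
import math


def poisson_pmf(k, lam):
    # P(X = k) for a Poisson distribution with mean lam (lam must be > 0).
    # Computed in log space to avoid overflow in the factorial.
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))


def tail_probability(k, lam):
    # Smaller of P(X <= k) and P(X >= k): how surprising a count of k is
    # in either direction under the baseline model.
    lower = sum(poisson_pmf(i, lam) for i in range(k + 1))
    upper = max(1.0 - lower + poisson_pmf(k, lam), 0.0)
    return min(lower, upper)


def find_rate_anomalies(baseline_counts, new_counts, threshold=1e-3):
    # The "model" is just the mean event rate observed in the baseline windows.
    lam = sum(baseline_counts) / len(baseline_counts)
    # A window is anomalous when its count is very unlikely under that rate.
    return [
        i for i, k in enumerate(new_counts)
        if tail_probability(k, lam) < threshold
    ]


# Hypothetical data: events per minute during normal operation, then new windows.
baseline = [10, 12, 9, 11, 10, 8, 10, 10]
new = [11, 9, 30, 10, 0]
anomalies = find_rate_anomalies(baseline, new)
print(anomalies)
```

With these made-up counts, the burst of 30 events and the window with no events at all are the ones flagged. A real system would also need to let the baseline adapt over time, which this sketch does not attempt.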
This talk will reprise the content of my Strata presentation, but will include extra material that shows how compression equals truth and how anomaly detection can make databases faster, among other sundry philosophical truths.
Ted Dunning - Chief Application Architect, MapR
Ted Dunning has been involved with a number of startups, the latest being MapR Technologies, where he is Chief Application Architect working on advanced Hadoop-related technologies. He is also a PMC member for the Apache ZooKeeper and Mahout projects. Opinionated about software and data mining and passionate about open source, he is an active participant in the Hadoop and related communities and loves helping projects get going with new technologies.