6.30pm Free beer + pizza, socialising
7pm "Finding the needle in the haystack, or looking for unusual patterns in big data" by Adi Andrei, Data Scientist
A classic Data Science story about identifying and learning unusual patterns in mountains of aircraft sensor data, which took place at NASA in the early 2000s. The project was successful and received many patents and awards, despite being developed on machines that had less memory than the mobile phone in your pocket. The methodology developed then is still considered state of the art in Data Science today. It can be (and is) applied in many other fields that require unusual pattern discovery in large amounts of data.
Adi Andrei is a senior data scientist who worked as a contractor for NASA, Unilever, Philips, British Gas and others.
"Anomaly Detection in IT Systems at Scale" by Tom Veasey & Stephen Dodson @Prelert
In this talk we'll describe some of the data characteristics which make anomaly detection for real-world problems challenging, and some of the techniques we use at Prelert for anomaly detection. As the complexity of IT systems and the quantity of data people gather increase, proactively managing the health and security of these systems requires increasingly sophisticated monitoring tools. Rule-based approaches are either becoming unmanageable or in need of augmentation. The complexity and scale of the data pose significant challenges. Recent techniques from the field of Data Mining, such as sketch data structures, probabilistic suffix trees, random forests, robust estimation, fitting "fat-tailed" distributions, proper handling of heterogeneous data types, sequential Monte-Carlo and so on, are all useful for improving the quality and/or scalability of anomaly detection.
Tom is Research Director at Prelert. Prior to Prelert, Tom worked as a consultant in mathematical modelling and, for a period, on FX derivative pricing and risk management tools at Bloomberg LP. Tom holds a masters in physics from the University of Cambridge.
Stephen is founder and CTO at Prelert. Stephen has 15+ years of experience in enterprise systems. He holds a masters in mechanical engineering and a Ph.D. in computational methods from Imperial College, London, alongside a CES from École Centrale de Lyon. His academic research focused on computation of large scattering problems using integral equation time domain methods.
Beer Break + Community Update
"Mini-Workshop: A Gentle Introduction to Apache Spark and Clustering for Anomaly Detection" by Sean Owen, Director of Data Science @ Cloudera
There has been an explosion of interest in Apache Spark as a new, alternative computing paradigm for Hadoop. It offers something to interest data scientists of all stripes, from interactive REPL to distributed functional programming to implementations of standard machine learning techniques.
In fact, it promises big scalability improvements over MapReduce for iterative algorithms, like k-means clustering, which can be used to detect anomalous data in a huge data set, for example.
This session will walk through a complete example of anomaly detection using Apache Spark and its MLlib subproject, as applied to the well-known network intrusion detection data set from KDD Cup '99. It will impart a taste of Scala (Spark's native language), Spark's core concepts like RDDs, and usage of MLlib for k-means clustering, in real time on a Hadoop cluster. It will also introduce the concept of k-means clustering and how a data scientist would iteratively improve clustering in a session with Spark.
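For anyone who wants a feel for the idea before the session, here is a minimal sketch of k-means used for anomaly detection. It is not the workshop's code (the session uses Scala and Spark's MLlib on a cluster); it is plain single-machine Python, and the function names, the toy data and the choice of seeding centroids from the first k points are all assumptions made for illustration. The idea: cluster the points, then treat the points farthest from their nearest centroid as the most anomalous.

```python
import math


def nearest(point, centroids):
    """Return (index, distance) of the centroid closest to point."""
    dists = [math.dist(point, c) for c in centroids]
    idx = min(range(len(centroids)), key=dists.__getitem__)
    return idx, dists[idx]


def kmeans(points, k, iterations=20):
    """Plain Lloyd's k-means; returns the final list of centroids."""
    # Assumption for illustration: seed centroids with the first k points.
    centroids = [tuple(p) for p in points[:k]]
    for _ in range(iterations):
        # Assignment step: group every point with its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            idx, _ = nearest(p, centroids)
            clusters[idx].append(p)
        # Update step: move each centroid to the mean of its cluster.
        # An empty cluster keeps its previous centroid.
        centroids = [
            tuple(sum(col) / len(cluster) for col in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids


def most_anomalous(points, centroids, top=1):
    """Rank points by distance to their nearest centroid, farthest first."""
    ranked = sorted(points, key=lambda p: nearest(p, centroids)[1], reverse=True)
    return ranked[:top]


# Hypothetical toy data: two tight groups plus one point far from both.
toy_points = [(0, 0), (10, 10), (0, 1), (1, 0), (10, 11), (11, 10), (5, 30)]
toy_centroids = kmeans(toy_points, k=2)
print(most_anomalous(toy_points, toy_centroids, top=1))
```

On real data the choice of k, the feature scaling and the distance threshold all matter a great deal, which is the kind of iterative tuning the session abstract refers to.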
Sean is Director of Data Science at Cloudera, based in London. Before Cloudera, he founded Myrrix Ltd, a company commercializing large-scale real-time recommender systems on Apache Hadoop. He has been a primary committer and VP for Apache Mahout, and is co-author of Mahout in Action. Previously, Sean was a senior engineer at Google. He holds an MBA from the London Business School and a BA in Computer Science from Harvard.
9pm-ish Wrap up, beer at the pub around the corner
Thanks to O'Reilly Strata for their community support
Register for Strata Barcelona http://oreil.ly/UGSTEU14
Get 20% off with our code UGDSL20