
Data Science & Anomaly Detection at Scale

  • Jul 17, 2014 · 6:30 PM

6.30pm Free beer + pizza, socialising

7pm "Finding the needle in the haystack, or looking for unusual patterns in big data" by Adi Andrei, Data Scientist

A classic Data Science story about identifying and learning unusual patterns in mountains of aircraft sensor data, which took place at NASA in the early 2000s. The project was successful and received many patents and awards, despite being developed on machines that had less memory than the mobile phone in your pocket. The methodology developed then is still considered state of the art in Data Science today. It can be (and is) applied in many other fields that require unusual pattern discovery in large amounts of data.

Adi Andrei is a senior data scientist who worked as a contractor for NASA, Unilever, Philips, British Gas and others.

"Anomaly Detection in IT Systems at Scale" by Tom Veasey & Stephen Dodson @Prelert

In this talk we'll describe some of the data characteristics that make anomaly detection for real-world problems challenging, and some of the techniques we use at Prelert for anomaly detection. As the complexity of IT systems and the quantity of data people gather increase, proactively managing the health and security of these systems requires increasingly sophisticated monitoring tools. Rule-based approaches are either becoming unmanageable or in need of augmentation. The complexity and scale of the data pose significant challenges. Recent techniques from the field of Data Mining, such as sketch data structures, probabilistic suffix trees, random forests, robust estimation, fitting "fat-tailed" distributions, proper handling of heterogeneous data types, sequential Monte Carlo and so on, are all useful for improving the quality and/or scalability of anomaly detection.
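To make one of the listed techniques concrete, here is a minimal sketch of robust estimation applied to anomaly detection. It is an illustration only, not Prelert's method: it scores values by their distance from the median, scaled by the median absolute deviation (MAD), so a single extreme value cannot inflate the spread estimate the way it would inflate a standard deviation. The function names, the 3.5 threshold and the latency figures are all assumptions made for this example.

```python
from statistics import median

def robust_z_scores(values):
    """Score each value by its distance from the median, scaled by the MAD."""
    centre = median(values)
    # Median absolute deviation: a spread estimate that outliers barely move.
    mad = median(abs(v - centre) for v in values)
    if mad == 0:
        # Degenerate case: more than half of the values equal the median.
        return [0.0 if v == centre else float("inf") for v in values]
    # 1.4826 makes the MAD comparable to a standard deviation for normal data.
    scale = 1.4826 * mad
    return [abs(v - centre) / scale for v in values]

def find_anomalies(values, threshold=3.5):
    """Return the indices whose robust z-score exceeds the threshold."""
    scores = robust_z_scores(values)
    return [i for i, s in enumerate(scores) if s > threshold]

# Hypothetical response times in milliseconds, with one obvious spike.
latencies = [102, 98, 101, 97, 103, 99, 100, 950, 101, 98]
print(find_anomalies(latencies))  # prints [7]
```

The threshold is a tuning knob rather than a fixed rule; heavier-tailed data would typically need a larger value or a different model altogether.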

Tom is Research Director at Prelert. Before joining Prelert, Tom worked as a consultant in mathematical modelling and, for a period, on FX derivative pricing and risk management tools at Bloomberg LP. Tom holds a master's in physics from the University of Cambridge.

Stephen is founder and CTO at Prelert. Stephen has 15+ years of experience in enterprise systems. He holds a master's in mechanical engineering and a Ph.D. in computational methods from Imperial College London, alongside a CES from École Centrale de Lyon. His academic research focused on the computation of large scattering problems using integral equation time domain methods.

Beer Break + Community Update

"Mini-Workshop: A Gentle Introduction to Apache Spark and Clustering for Anomaly Detection" by Sean Owen, Director of Data Science @ Cloudera

There has been an explosion of interest in Apache Spark as a new, alternative computing paradigm for Hadoop. It offers something to interest data scientists of all stripes, from an interactive REPL to distributed functional programming to implementations of standard machine learning techniques.

In fact, it promises big scalability improvements over MapReduce for iterative algorithms, like k-means clustering, which can be used to detect anomalous data in a huge data set, for example.

This session will walk through a complete example of anomaly detection using Apache Spark and its MLlib subproject, as applied to the well-known network intrusion detection data set from KDD Cup '99. It will impart a taste of Scala (Spark's native language), Spark's core concepts like RDDs, and usage of MLlib for k-means clustering, in real time on a Hadoop cluster. It will also introduce the concept of k-means clustering and how a data scientist would iteratively improve clustering in a session with Spark.
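The idea behind clustering for anomaly detection can be sketched in a few lines: fit k-means, then treat the points that lie farthest from their nearest centroid as the most anomalous. The sketch below is plain single-machine Python, not the workshop's Spark/MLlib code, and the toy 2-D points stand in for the KDD Cup data. The function names and the evenly spaced starting centroids are assumptions chosen to keep the example deterministic.

```python
import math

def nearest(point, centroids):
    """Return (index, distance) of the centroid closest to point."""
    distances = [math.dist(point, c) for c in centroids]
    best = min(range(len(centroids)), key=distances.__getitem__)
    return best, distances[best]

def k_means(points, k, iterations=20):
    """Plain Lloyd's algorithm with a deterministic, evenly spaced start."""
    centroids = [points[i * len(points) // k] for i in range(k)]
    for _ in range(iterations):
        # Assignment step: group every point with its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids)[0]].append(p)
        # Update step: move each centroid to the mean of its members.
        updated = [
            tuple(sum(axis) / len(members) for axis in zip(*members))
            if members else centroids[i]  # keep an empty cluster's centroid
            for i, members in enumerate(clusters)
        ]
        if updated == centroids:
            break  # converged: no centroid moved
        centroids = updated
    return centroids

def most_anomalous(points, centroids, top=1):
    """Rank points by distance to their nearest centroid, farthest first."""
    ranked = sorted(points, key=lambda p: nearest(p, centroids)[1], reverse=True)
    return ranked[:top]

# Hypothetical data: two tight groups plus one point far from both.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11), (5, 25)]
centroids = k_means(points, k=2)
print(most_anomalous(points, centroids))  # prints [(5, 25)]
```

Note that the outlier still pulls its cluster's centroid towards itself. In practice one would experiment with the value of k and with feature scaling, which is the kind of iterative refinement the session describes.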

Sean is Director of Data Science at Cloudera, based in London. Before Cloudera, he founded Myrrix Ltd, a company commercializing large-scale real-time recommender systems on Apache Hadoop. He has been a primary committer and VP for Apache Mahout, and is a co-author of Mahout in Action. Previously, Sean was a senior engineer at Google. He holds an MBA from the London Business School and a BA in Computer Science from Harvard.

9pm-ish Wrap up, beer at the pub around the corner

Thanks to O'Reilly Strata for their community support

Register for Strata Barcelona

Get 20% off with our code UGDSL20


  • Seraphina A.

    Just to say - thanks for a great evening! Interesting talks. Wish there had been more time to mingle - arrived late from Brighton though. Looking forward to the next one...

    July 23, 2014

  • Manoj N.

    1 · July 20, 2014

  • George A.

    I've uploaded some videos below, the quality is not great but hopefully they are usable.

    Finding Atypical Patterns in Large Datasets - Adi Andrei
    (includes Carlos intro)

    Anomaly Detection in IT systems - Tom Veasey

    Anomaly Detection with Apache Spark - Sean Owen

    1 · July 19, 2014

  • Sally Z.

    Great talks at the meetup yesterday. Any chance we could find the slides online?

    2 · July 18, 2014

  • Joseph S.

    Great Talk. Nice location

    July 18, 2014

  • Seref A.

    Great talks, very nice to see people actually naming probability distributions to look at. I think we could have joined the event just to enjoy the view :)

    July 18, 2014

  • Ben W.

    Thanks for the meetup! Great talks - much more technical than those at other Hadoop meetups.

    1 · July 18, 2014

  • Narmada G.

    Very good meetup with a tightly focused agenda. Would definitely recommend questions, especially basic ones, being taken offline. Shame the security issue prevented some people from joining us. There should be some way of automatically sending the security email to people coming off the waitlist?

    July 18, 2014

  • André B.

    It seems that a lot of people in the main list of the Meetup event missed their spot because of the second registration on Eventbrite (as it was full yesterday).

    1 · July 17, 2014

    • Peter M.

      Yeah, got in anyway.

      July 18, 2014

  • Peter M.

    Nice location, fine talks.

    1 · July 18, 2014

  • BobTang


    July 17, 2014

  • Ann W.

    Disappointed to finally be properly reconnected to the world and discover I've missed the eventbrite signup and the event itself. Hoping this talk is recorded!

    1 · July 17, 2014

  • Carlos

    If you've registered your name/last name for the security desk AND RECEIVED A CONFIRMATION EMAIL FROM EVENTBRITE you are in the list

    If you haven't received a confirmation email from eventbrite please don't waste your time going to the venue; reception desk won't let you in due to building's strict EH&S and security restrictions.

    We are very sorry about all this but we can't change the building access restrictions.

    July 17, 2014

    • Emil V.

      So just to confirm, anyone who got a place via the wait list today or yesterday can't actually go?

      July 17, 2014

  • Alex D.

    Strange, it shows that 2 spots are available but you cannot do anything...

    July 17, 2014

  • Emil V.

    I got a place from the wait list. But eventbrite says the event is 'sold out'.

    I'd like some clarification on whether I can come or not?

    July 17, 2014

  • Suba

    Apologies need to cancel due to last minute commitments. By any chance will the session be recorded?

    July 17, 2014

  • Hugh L.

    Hi, Carlos. I signed up to this as soon as I could but I also can't register my name on the Eventbrite page as it says it is sold out. Hugh Lawson-Tancred

    July 17, 2014

  • Tony G.

    similar here -- I got a link just yesterday but it says sold out

    July 17, 2014

  • Andy H.

    Hi, Carlos. I've been signed up to this for weeks but I can't register my name on the Eventbrite page as it says it is sold out. Any advice? Cheers! Andy Hamflett

    July 17, 2014

  • Carlos

    FYI- The meetup is fully booked, and all RSVPs are closed.

    If you've received a link to register your name/last name for the security desk, please don't share it with others as you have to respect the order in the list!!! Be fair to your peers!

    If you haven't received a link to register please don't waste your time going to the venue; reception desk won't let you in due to building's strict EH&S and security restrictions.

    We are very sorry about all this but we can't change the building access restrictions.

    July 17, 2014

  • Antonio B.

    Thanks Carlos!

    July 16, 2014

  • Antonio B.

    Hi, I got a place now after having been on the waiting list. Is there a registration link for the event? Thanks!

    July 16, 2014

  • Richard B.

    Apparently there's extra security, so I just tried to register by hitting the link in Carlos' email he just sent out...this gave me a page saying I needed an invite to register...anyone else having difficulties? Anyone succeeded in registering?

    3 · July 14, 2014

    • Margaret

      Same as Meg, I'm now off waitlist but am afraid I may not get into building. Update would be much appreciated. Thanks

      1 · July 16, 2014

    • Rob

      Same situation as Meg and Margaret. Do we need to do additional registration?

      July 16, 2014

  • Dane W.

    Finally got off the waitlist but really sorry that I now can't make it. Have released my place.

    July 16, 2014

  • max Bitcoin solution for African remittance. Interested in big data visualization techniques to map mobile remittance data

    July 16, 2014

  • Fayimora F.

    Is this session going to be recorded? I want to attend but I'm graduating that day + I'm still on the waiting list :(

    2 · July 14, 2014

  • Martin G.

    I'm not going to be able to make it as I've got to babysit that evening.

    July 10, 2014

  • Iain M.

    Hi Folks, as usual this meeting looks to be fantastic and is already fully booked. Can I ask those of you who have secured a place but know you aren't going to be able to make it to free it up, so those on the wait list can be allocated places?


    2 · July 4, 2014

  • Hugo M.

    Unfortunately, I won't be able to attend. Is a video of the presentations going to be available?

    July 3, 2014

  • Clifford M.

    I am a new startup with a huge amount of data which needs interrogating. Keen to see and hear how other people are handling big data queries.

    July 3, 2014

  • George

    Hi I am looking forward to the next meeting, as a BI Analyst I am often surrounded with many Legacy systems, therefore anything that can use slow machines etc is great for creating prototypes and completing feasibility studies where budgets are not enormous. :o)

    April 14, 2014

  • Carlos

    Adi- thx for adding this. We have Jonny and Stephen lined up to talk about 'anomaly detection' in our meetup in May... Maybe you can join as a speaker and share your experience at NASA; as discussed that is still state-of-the-art indeed!... There's lots of things going on sensors, IoT and anomalies too (I should share some on that)... OK let's plan this...

    March 28, 2014

    • Adi A.

      I would be happy to. I also know a couple of people who could talk on the same or similar subject (interesting patterns in large sensor data).

      April 5, 2014

  • Heather S.

    Hm, I've signed up but whether I can actually attend depends on the date and location. Let's see.

    March 31, 2014
