Past Meetup

October Hadoop Meetup: Streaming analytics and approximation

180 people went


Dear HUG UK members,

I am pleased to announce our October meetup on 'Streaming analytics and approximation'.

This event, sponsored by Strata, will be held at TechHub @ Campus.

Details below.



Tuesday, October 22nd 2013. Doors open 6:30pm.

Presentations 7:00pm – 8:30pm.


TechHub @ Campus

5 Bonhill St, London, EC2A 4BX


Session 1: Apache Samza: Distributed Stream Processing with Kafka and YARN.

Speaker: Jakob Homan, Senior Software Engineer at LinkedIn.

Abstract: Samza is a new distributed stream processing framework developed at LinkedIn and recently accepted into incubation at the Apache Software Foundation. Built atop YARN, it provides fault tolerance, durability, scalability and even local state with a simple, MapReduce-like interface.

Short bio: Jakob Homan is a Senior Software Engineer at LinkedIn and an Apache Hadoop committer and PMC member. He works on Samza full time.

Session 2: Storm at - Cleaning up fraudulent traffic on the internet

Speaker: Ashley Brown, Chief Architect at

Abstract: This talk will be charting's journey from being a Storm early adopter, to their freeze of Storm releases and switch to batch processing only, to coming full circle and implementing new fraudulent traffic algorithms with Trident.

Short bio: Ashley Brown is the Chief Architect at He has previously worked on quantum chemical modelling, pipeline inspection robots and a control system for newspaper presses. He has published papers on the use of speculative hardware optimisations to accelerate key kernels for scientific computations.

Session 3: Scaling by Cheating: Approximation, Sampling and Fault-friendliness for Scalable Big Learning

Speaker: Sean Owen, Director of Data Science at Cloudera.

Abstract: To keep analyzing more data, and faster, we need a secret weapon: cheating. In this brief survey, learn how you may be doing too much work in your analytics and learning processes, and how giving up a little accuracy can gain a lot of performance. With examples from Apache Hadoop, Mahout, and ML tools from Cloudera.

Short bio: Sean Owen is the Director of Data Science at Cloudera. Previously he founded Myrrix, a complete, real-time, scalable clustering and recommender system that evolved from Apache Mahout. Myrrix was acquired by Cloudera in July this year.