October Hadoop Meetup: Streaming analytics and approximation


Dear HUG UK members,
I am pleased to announce our October meetup on 'Streaming analytics and approximation'.
This event, sponsored by Strata, will be held at TechHub @ Campus.
Details below.
Sebastian
TIME:
Tuesday, October 22nd, 2013. Doors open 6:30pm.
Presentations 7:00pm – 8:30pm.
LOCATION:
TechHub @ Campus
5 Bonhill St, London, EC2A 4BX
AGENDA:
Session 1: Apache Samza: Distributed Stream Processing with Kafka and YARN
Speaker: Jakob Homan, Senior Software Engineer at LinkedIn.
Abstract: Samza is a new distributed stream processing framework developed at LinkedIn and recently incubated into the Apache Software Foundation. Built atop YARN, it provides fault tolerance, durability, scalability and even local state through a simple, MapReduce-like interface.
Short bio: Jakob Homan is a Senior Software Engineer at LinkedIn and an Apache Hadoop committer and PMC member. He works on Samza full time.
Session 2: Storm at spider.io: Cleaning Up Fraudulent Traffic on the Internet
Speaker: Ashley Brown, Chief Architect at spider.io.
Abstract: This talk charts spider.io's journey from early Storm adopter, through freezing its Storm version and switching to batch processing only, and back full circle to implementing new fraudulent-traffic detection algorithms with Trident.
Short bio: Ashley Brown is the Chief Architect at spider.io. He has previously worked on quantum chemical modelling, pipeline inspection robots and a control system for newspaper presses. He has published papers on the use of speculative hardware optimisations to accelerate key kernels for scientific computations.
Session 3: Scaling by Cheating: Approximation, Sampling and Fault-friendliness for Scalable Big Learning
Speaker: Sean Owen, Director of Data Science at Cloudera.
Abstract: To keep analyzing more data, and faster, we need a secret weapon: cheating. In this brief survey, learn how you may be doing too much work in your analytics and learning processes, and how giving up a little accuracy can gain a lot of performance. With examples from Apache Hadoop, Mahout, and ML tools from Cloudera.
Short bio: Sean Owen is the Director of Data Science at Cloudera. Previously, he founded Myrrix, a complete, real-time, scalable clustering and recommender system that evolved from Apache Mahout. Myrrix was acquired by Cloudera in July 2013.
