October Hadoop Meetup: Streaming analytics and approximation

Hosted By
Sebastian S.

Details

Dear HUG UK members,

I am pleased to announce our October meetup on 'Streaming analytics and approximation'.

This event, sponsored by Strata, will be held at TechHub @ Campus.

Details below.

Sebastian

TIME:

Tuesday, October 22nd 2013. Doors open at 6:30pm.

Presentations 7:00pm – 8:30pm.

LOCATION:

TechHub @ Campus

5 Bonhill St, London, EC2A 4BX

AGENDA:

Session 1: Apache Samza: Distributed Stream Processing with Kafka and YARN.

Speaker: Jakob Homan, Senior Software Engineer at LinkedIn.

Abstract: Samza is a new distributed stream processing framework developed at LinkedIn and recently accepted into the Apache Incubator. Built atop YARN, it provides fault tolerance, durability, scalability and even local state with a simple, MapReduce-like interface.

Short bio: Jakob Homan is a Senior Software Engineer at LinkedIn and an Apache Hadoop committer and PMC member. He works on Samza full time.

Session 2: Storm at spider.io - Cleaning up fraudulent traffic on the internet

Speaker: Ashley Brown, Chief Architect at spider.io.

Abstract: This talk charts spider.io's journey from being an early Storm adopter, to freezing Storm releases and switching to batch processing only, to coming full circle and implementing new fraudulent-traffic algorithms with Trident.

Short bio: Ashley Brown is the Chief Architect at spider.io. He has previously worked on quantum chemical modelling, pipeline inspection robots and a control system for newspaper presses. He has published papers on the use of speculative hardware optimisations to accelerate key kernels for scientific computations.

Session 3: Scaling by Cheating: Approximation, Sampling and Fault-friendliness for Scalable Big Learning

Speaker: Sean Owen, Director of Data Science at Cloudera.

Abstract: To keep analyzing more data, and faster, we need a secret weapon: cheating. In this brief survey, learn how you may be doing too much work in your analytics and learning processes, and how giving up a little accuracy can gain a lot of performance. Examples are drawn from Apache Hadoop, Mahout, and ML tools from Cloudera.

Short bio: Sean Owen is the Director of Data Science at Cloudera. Previously he founded Myrrix, a complete, real-time, scalable clustering and recommender system evolved from Apache Mahout. Myrrix was acquired by Cloudera in July this year.

AI Users Group UK