September 6, 2012 · 6:30 PM
In this session we will discuss some of the issues, ideas, and challenges around Machine Learning and Hadoop from a data scientist's perspective.
6.30pm Welcome and networking, with pizza and beer
7pm talks start
"From square to round wheels... moving from batch to real-time machine learning" by Michael Cutler, CTO at Tumra.
Big Data technologies like MapReduce and the tools that utilise it are inherently batch in nature - they start, process, and end in jobs that last anywhere from minutes to hours at a time. By the time a batch job has finished there is already a queue of ‘stationary data’ waiting to be processed in the next batch run. This approach has its limitations: if you rely on a batch process to train a machine learning model, it could be ‘too late’ to respond to rapid changes. Recently there has been a clear trend towards processing streams of ‘moving data’, such that the data is never at rest (until it is archived). In this presentation Michael will walk you through some of the challenges and techniques involved in implementing real-time online machine learning algorithms. Rather than pontificate about the merits of these approaches, Michael will give you access to a live demo to interact with! Michael is CTO at Tumra. Prior to joining Tumra, he was a senior researcher in the R&D labs for British Sky Broadcasting.
5 min break
"Machine Learning on Hadoop: Present and Future" by Josh Wills, Director of Data Science at Cloudera.
In this talk Josh will cover industrial machine learning, machine learning and Hadoop, and the things industry needs from academia, as well as some current challenges and new developments. Josh is the director of data science at Cloudera, and one of the main contributors to Cloudera’s most recent open source project, Crunch, a Java library that aims to make writing, testing, and running MapReduce pipelines easy, efficient, and even fun.
Prior to joining Cloudera, he was a software engineer at Google.
More beer and networking
9.30pm-ish Session ends