October HUG UK MeetUp


Details
We are pleased to announce our next HUG UK MeetUp on October 15th, at the Expedia (http://www.expedia.co.uk/) offices in Angel.
We have three great presentations lined up for the evening:
Today’s reality, Hadoop with Spark: How to select the best Data Science approach when using Big Data Platforms and Technologies?
Martin Oberhuber and Eliano Marques, Senior Data Scientists @Think Big International
In this talk, Think Big International Lead Data Scientists will discuss the options that exist today for engineering and data science teams aiming to use big data patterns to solve new business problems. With enterprise adoption of the Hadoop ecosystem and the growing momentum of open source projects like Spark, it is becoming mandatory to have an approach that delivers business results but remains flexible enough to adapt and change with the open source market.
Oryx 2: Lambda architecture on Spark, Kafka for real-time large scale ML
Sean Owen – Director of Data Science @Cloudera
Building machine learning models is all well and good, but how do they get productionized into a service? It's a long way from a Python script on a laptop to a fault-tolerant system that learns continuously, serves thousands of queries per second, and scales to terabytes. The confederation of open source technologies we know as Hadoop now offers data scientists the raw materials from which to assemble an answer: the means to build models but also ingest data and serve queries, at scale.
This short talk will introduce Oryx 2, a blueprint for building this type of service on Hadoop technologies. It will survey the problem and the standard technologies and ideas that Oryx 2 combines: Apache Spark, Kafka, HDFS, the lambda architecture, PMML, REST APIs. The talk will touch on a key use case for this architecture -- recommendation engines.
Streaming Dataflow with Apache Flink
Ufuk Celebi - PMC member at Apache Flink and co-founder and software engineer at data Artisans
In this talk about Apache Flink we will touch on three main things: an introductory look at Flink, a look under the hood, and a demo.
- In the introduction we will briefly look at the history of Flink and then go on to the API and different use cases. Here we will also see how it can be deployed in practice and what some of the pitfalls in a cluster setting can be.
- In the second section we will look at the streaming execution engine that lies at the heart of Flink. Here we will see what makes it tick and also what distinguishes it from other approaches, such as the mini-batch execution model.
- In the final section we will see a live demo of a fault-tolerant streaming job that performs analysis of the Wikipedia edit stream.
