Large scale data processing with Spark and Hazelcast


Details
First talk: Data Processing at Scale with Apache Spark
Abstract: An introduction to some of the concepts that allow Spark to process very large amounts of data in a very short amount of time. We’ll be looking at some of the basic design behind Spark, and how to make it fly.
Bio: Andrew Newnham / Software Development Manager at Introspectus. Andrew has been working with data for more than 15 years. In that time he has seen data move from small, simple systems to the large, complex ones that are typical of modern-day organisations. A regular attendee at Spark Summit, he has been working with Spark for 3 years, 2 of those in production. Andrew leads the Introspectus Software Development Team, with his primary focus on improving the performance of Introspectus’ Spark cluster.
Second talk: Fast Data Analytics in 2 Simple Steps
Abstract: Big data is good, but the economics of storing large volumes push it towards slow storage. That's a bad place to analyse data because of disk speed. Instead, we'll take an input stream of fast data, pass it through pre-coded analytics running in the memory of the clustered nodes of a third-generation distributed stream processing engine, and update an in-memory cache of data. With nothing touching the disk, performance is lightning fast. What comes in as a stream of data is processed in-flight, stored in memory containers, and saved down to (write-only) disk for posterity. You will also learn about directed acyclic graphs (DAGs) and why they are so powerful for Big Data processing. I'll walk you through the evolution of Big Data computing from sequential processing to DAGs, as well as other techniques, such as SP/SC (Single Producer/Single Consumer), cooperative multithreading, data locality, and in-memory sources and sinks, that power the third generation of Big Data processing. In this talk, I’ll show how easy it is to add an extremely fast, highly efficient stream processing engine to a distributed in-memory cache. We will see how to analyse data already in memory, pull data in from outside, and push results elsewhere. No-fuss coding but blazing-fast execution! All Java! The demo will be available to download from GitHub afterwards.
Bio: Rahul Gupta / Senior Solutions Architect at Hazelcast. (more details soon)
5:30 - Welcome, food, drinks, networking
6:00 - First talk
6:45 - Second talk
Food, drinks, and giveaways will be provided!
