Deenar Toraskar: Applying the Lambda Architecture with Spark/Spark Streaming


Details
IMPORTANT: Please register at SkillsMatter:
https://skillsmatter.com/meetups/7114-maximize-your-spark-applying-the-lambda-architecture-with-spark-spark-streaming
Applying the Lambda Architecture with Spark/Spark Streaming
with
Deenar Toraskar
Apache Spark (http://spark.apache.org/) now comes with all the tools, libraries and connectors required to build a complete end-to-end data platform. There is plenty of documentation, along with blogs, examples, books and websites, covering Spark core and its component libraries, but resources on how to integrate these components into a complete end-to-end solution are scant.
The Lambda Architecture (http://lambda-architecture.net) (LA) enables developers to build large-scale, distributed data processing systems in a flexible and extensible manner, remaining fault-tolerant against both hardware failures and human mistakes. I will walk you through building a stream analytics engine using Spark and the Lambda Architecture.
The talk will cover building all three layers using Spark, each with its own set of requirements:
i. the batch layer, managing the master dataset (an immutable, append-only set of raw data) and pre-computing batch views,
ii. the serving layer, indexing batch views so that they can be queried in a low-latency, ad-hoc way, and
iii. the speed layer, dealing with recent data only and compensating for the high latency of the batch layer.
The talk will be accompanied by a real-world example with code and a live demo.
This tutorial will be valuable for developers, architects, or project leads who are already knowledgeable about Spark and are now looking for more insight into how it can be leveraged to implement real-world applications.
Deenar Toraskar is a co-founder of Think Reactive (http://www.thinkreactive.co.uk), which provides a responsive, resilient, elastic, ready-to-go data analytics solution based on Spark. The solution is built using state-of-the-art technology end-to-end: from ETL and data pipelines (both batch and streaming), through persistence adapters, to analysis and algorithms. All components are packaged using Docker and can run on bare metal or any cloud. Previously, Deenar worked at a Tier 1 investment bank, where he led a team developing risk analytics applications (with numerous passionate and satisfied users) on a Spark/Hadoop platform. Deenar is also an Apache Spark committer.
We will, as always, also be heading to the Slaughtered Lamb (http://www.theslaughteredlambpub.com/) pub afterwards.
**IMPORTANT: READ ME TO REGISTER**
Skills Matter are hosting this event and handling attendance, so it is essential that you confirm your place at this link:
https://skillsmatter.com/meetups/7114-maximize-your-spark-applying-the-lambda-architecture-with-spark-spark-streaming
Failure to do so may result in not obtaining a seat. Clicking "I'm going" on Meetup.com only lets the others in the group know you're going; it does not register you.
If this is your first time to SkillsMatter, directions are: http://skillsmatter.com/go/find-us
