Decomposing the SMACK Stack Part One: Spark and Mesos


Details
SMACK (Spark, Mesos, Akka, Cassandra, Kafka) is gaining popularity because it gives engineers more flexibility in designing data processing platform architectures, letting them pick only the parts they need to reach their goals. Each component of the SMACK stack is a large and interesting system in its own right, so in the first part of this series of talks we will take a deeper look at Spark and Mesos. Spark is a fast, general-purpose engine for large-scale distributed data processing, and Mesos is a cluster resource manager that provides resource isolation and sharing across distributed applications, including Spark.
The first part of the talk gives a general overview of the SMACK stack and the possible architecture layouts that can be implemented on top of it. We discuss Apache Spark internals: the concept of the RDD, the logical DAG view and dependency types, the execution workflow, the shuffle process, and the core Spark components. Several Spark applications will be demoed in a dockerized Hadoop environment, with examples of analytics jobs, data movement between different storage systems (e.g. between Mongo, Cassandra and Parquet), and Spark's different execution modes.
The second part is dedicated to the Mesos architecture and the concept of a framework, the different ways of running applications, and scheduling Spark jobs on top of Mesos. We'll take a look at popular frameworks such as Marathon and Chronos and see how Spark jobs and Docker containers are executed using them. Finally, a custom Mesos framework implementation will be presented and executed on a virtual cluster.
Links to the repositories with dockerized environments and demo code will be provided after the session.
Speaker Bio
Anton Kirillov started his career as a Java developer around 2007 while working on a Ph.D. thesis in the Semantic Search domain. After defending his thesis he switched to the Scala ecosystem and the development of distributed architectures. In recent years Anton has worked on data platform architectures and developed with Hadoop, Spark, Mesos, Akka and Cassandra. He is a big Scala fan and SMACK stack advocate, currently working on challenging projects as a Staff Engineer in the Ooyala Data Team.
Agenda
17:30 The doors are open, meet
18:00 - 19:00 -> SMACK intro + Spark Internals
Pizza break
19:15 - 20:00 -> Mesos architecture + demos
Food, beer and snacks will be provided starting at 17:45.
