Decomposing the SMACK Stack Part One: Spark and Mesos

Name: Decomposing the SMACK Stack Part One: Spark and Mesos
Start: 2016-03-31T18:00:00+02:00
End: 2016-03-31T21:00:00+02:00
Location: Ooyala (Entrance A)

Hosted By

Sebastian S.

Details

SMACK (Spark, Mesos, Akka, Cassandra, Kafka) is gaining popularity, giving engineers more flexibility in designing data processing platform architectures and letting them pick only the parts they need to reach their goals. Each component of the SMACK stack is a large and interesting system in its own right, so in the first part of this series of talks we'll take a deeper look at Spark and Mesos. Spark is a fast, general-purpose engine for large-scale distributed data processing, and Mesos is a cluster resource manager that provides resource isolation and sharing across distributed applications, including Spark.

The first part of the talk gives a general overview of the SMACK stack and the architecture layouts that can be built on top of it. We then discuss Apache Spark internals: the RDD concept, the logical DAG view and dependency types, the execution workflow, the shuffle process and the core Spark components. Several Spark applications will be demoed in a dockerized Hadoop environment, with examples of analytics jobs, data movement between storage systems (e.g. between Mongo, Cassandra and Parquet) and Spark's different execution modes.

The second part is dedicated to the Mesos architecture and the concept of a framework, along with the different ways of running applications and scheduling Spark jobs on top of Mesos. We'll take a look at popular frameworks such as Marathon and Chronos and see how Spark jobs and Docker containers are executed with them. Finally, a custom Mesos framework implementation will be presented and run on a virtual cluster.

Links to the repositories with dockerized environments and demo code will be provided after the session.

Speaker Bio

Anton Kirillov started his career as a Java developer around 2007 while working on a Ph.D. thesis in the Semantic Search domain. After defending his thesis he switched to the Scala ecosystem and the development of distributed architectures. In recent years Anton has worked on data platform architectures and developed with Hadoop, Spark, Mesos, Akka and Cassandra. He is a big Scala fan and SMACK stack advocate, currently working on challenging projects as a Staff Engineer in the Ooyala Data Team.

Agenda

17:30 The doors are open, meet

18:00 - 19:00 -> SMACK intro + Spark Internals

Pizza break

19:15 - 20:00 -> Mesos architecture + demos

Food, beer and snacks will be provided starting 17:45
