Spark Workshop


Details
This 4-hour workshop introduces Apache Spark, the open-source cluster computing framework whose in-memory processing can make analytics applications up to 100 times faster than the disk-based technologies in wide deployment today. Highly versatile across many environments, and with a strong foundation in functional programming, Spark is known for its ease of use: exploratory code written at the REPL scales up to production-grade quality relatively quickly (REPL-driven development).
TechHub has graciously agreed to host us.
The plan is to start with a few publicly available datasets and gradually work our way through them until we extract some useful insights, gaining a deep understanding of Spark's rich collections API in the process.
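As a small taste of that API, here is a minimal PySpark sketch in the spirit of what we'll write; the dataset, file name, and column layout are purely hypothetical, not the actual workshop data:

    from pyspark import SparkContext

    sc = SparkContext("local[*]", "workshop-sketch")

    # Hypothetical input: one record per line, in the form "city,trip_minutes"
    lines = sc.textFile("trips.csv")

    # Average trip duration per city using the core RDD collections API:
    # map to (key, (value, count)) pairs, reduce, then take the mean
    averages = (lines
        .map(lambda line: line.split(","))
        .map(lambda fields: (fields[0], (float(fields[1]), 1)))
        .reduceByKey(lambda a, b: (a[0] + b[0], a[1] + b[1]))
        .mapValues(lambda sum_count: sum_count[0] / sum_count[1]))

    print(averages.collect())

In the notebooks shipped with the workshop image, a SparkContext may already be available as sc, in which case the setup line can be skipped.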
We are also going to look at a very simple Spark Streaming example: a stream of integers with a rolling sum.
We'll first stream the data over a TCP socket (netcat), then through a Kafka topic (Apache Kafka).
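To set expectations, here is a minimal Scala sketch of the socket variant; treat it as an assumption of what the code might look like (host, port, and window sizes are illustrative), not the actual workshop code:

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    object RollingSum {
      def main(args: Array[String]): Unit = {
        // local[2]: one thread for the socket receiver, one for processing
        val conf = new SparkConf().setMaster("local[2]").setAppName("RollingSum")
        val ssc = new StreamingContext(conf, Seconds(1))

        // Newline-delimited integers arriving over TCP, e.g. from: nc -lk 9999
        val ints = ssc.socketTextStream("localhost", 9999).map(_.trim.toInt)

        // Rolling sum: total over the last 10 seconds, recomputed every 2 seconds
        ints.reduceByWindow(_ + _, Seconds(10), Seconds(2)).print()

        ssc.start()
        ssc.awaitTermination()
      }
    }

To try it, run nc -lk 9999 in one terminal, start the job, and type integers one per line. The Kafka variant keeps the same windowing logic and only swaps the socket input for a Kafka-backed stream.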
The workshop has a few requirements:
- Bring your own laptop.
- Have Docker already installed before the workshop.
- Have the Docker image already pulled and available locally.
Here are the necessary instructions:
- Install Docker
  Linux: curl -fsSL https://get.docker.com/ | sh
  Mac and Windows: https://www.docker.com/products/docker-toolbox
- Pull the workshop image: docker pull dserban/sparkworkshop
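To confirm the pull succeeded before you arrive, list your local images (the grep filter is just a convenience):

    docker images | grep sparkworkshop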
The batch processing code will be in Python, run in IPython Notebook (Jupyter).
The Spark Streaming code will be in Scala.
