
Spark Workshop

Hosted By
Dan S.
Details

This 4-hour workshop introduces Apache Spark, the open-source cluster computing framework whose in-memory processing can make analytics applications up to 100 times faster than widely deployed alternatives. Versatile across many environments and grounded in functional programming, Spark is known for REPL-driven development: ease of writing exploratory code that scales up to production-grade quality relatively quickly.

TechHub has graciously agreed to host us.

The plan is to start with a few publicly available datasets and gradually work our way through them until we extract some useful insights, gaining a deep understanding of Spark's rich collections API in the process.
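To preview the collections-API style, here is the classic word-count pattern sketched with plain Python builtins; in Spark the same shape maps onto RDD operations (`flatMap`, `map`, `reduceByKey`). The sample lines are made up for illustration.

```python
lines = ["spark makes analytics fast",
         "spark scales from laptop to cluster"]

# flatMap: split each line into words
words = [w for line in lines for w in line.split()]

# map: pair each word with an initial count of 1
pairs = [(w, 1) for w in words]

# reduceByKey: sum the counts per word
counts = {}
for w, n in pairs:
    counts[w] = counts.get(w, 0) + n

print(counts["spark"])  # → 2
```

In Spark the same pipeline runs distributed across a cluster, but the mental model of chaining collection transformations is identical.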

We are also going to look at a very simple Spark Streaming example (stream of integers / rolling sum).
We'll first stream the data via a TCP socket (netcat), then via a Kafka topic (Apache Kafka).
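The core of the rolling-sum example can be sketched independently of Spark: integers arrive in micro-batches, and a piece of state (the running total) is carried from one batch to the next, as a stateful Spark Streaming job would do. The batch values below are made up for illustration.

```python
# Simulated micro-batches of integers arriving on the stream
batches = [[1, 2, 3], [4, 5], [6]]

running_total = 0   # state carried across micro-batches
totals = []         # rolling sum emitted after each batch

for batch in batches:
    running_total += sum(batch)
    totals.append(running_total)

print(totals)  # → [6, 15, 21]
```

The workshop's actual streaming code will be in Scala; this sketch only shows the state-update logic that the streaming job maintains per batch.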

The workshop has some requirements.

  1. Bring your own laptop.
  2. Have Docker already installed before the workshop.
  3. Have the Docker image already pulled and available locally.

Here are the necessary instructions:

  1. Install Docker
    Linux:
    curl -fsSL https://get.docker.com/ | sh
    Mac and Windows:
    https://www.docker.com/products/docker-toolbox

  2. docker pull dserban/sparkworkshop
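To confirm your setup is ready before the workshop, the standard Docker CLI checks below should all succeed (these are stock Docker commands, not workshop-specific):

```shell
# Confirm the Docker client is installed and the daemon is reachable
docker --version
docker run --rm hello-world

# Pull the workshop image ahead of time
docker pull dserban/sparkworkshop

# Verify the image is available locally
docker images | grep dserban/sparkworkshop
```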

The batch processing code will be in Python, in an IPython Notebook (Jupyter).

The Spark Streaming code will be in Scala.

The Bucharest Agile Software Meetup Group
Tech Hub
39-41 Nicolae Filipescu · Bucharest