Past Meetup

Let's Break Apache Spark - Workshop

96 people went

Details

NOTE: Barka Alrina is a cosy place with around 30 table seats, so tables are first come, first served :)

Dear DataKRKers, we're hosting a workshop this time. Bring your laptop to benefit most, but it's also ok to come bare-handed and watch.

Abstract:

Have you ever been annoyed at a new toy being broken by your 3-year-old? I have. But guess what - every time I pick up some new technology I start behaving like a 3-year-old myself. I want to see what's going to happen if I press here or bend it there. Or indeed - how much the thing can be squeezed before it cracks or worse! Call it vandalism if you will, but it can be a very instructive process. By succumbing to this urge you start 'feeling' the different components of the technology: what it can do and what its limits are. There's a term coined for this sort of technology 'feeling' - mechanical sympathy. Look it up; it makes for great reading.

Unlike the 3-year-old's toys that you need to throw away once broken, we're dealing with software here, so with tools like Docker under your belt the crash is, well, soft and easy to fix.

So let's inflict some pain on Apache Spark and see how and where it cracks.

After attending the workshop you'll be able to:
- deploy Apache Spark to run locally on your laptop using Docker
- run a simple distributed Python Spark app in Jupyter Notebook
- see how it's all connected using Spark UIs
- follow the cycle: break it, feel it broken, fix it
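
To give a flavour of the "simple distributed Python Spark app" item above, here is a minimal sketch. It is not the workshop's actual notebook: the function names, the `local[2]` master and the partition count are assumptions made up for illustration, and the Spark part needs the `pyspark` package (which a Jupyter + Spark image would normally provide).

```python
from operator import add


def square(x):
    """Pure per-element function that Spark ships to the executors."""
    return x * x


def sum_of_squares_local(numbers):
    """Single-process reference result, handy for checking the Spark answer."""
    return sum(square(n) for n in numbers)


def sum_of_squares_spark(numbers, master="local[2]", partitions=4):
    """The same computation as a Spark job (hypothetical helper, needs pyspark)."""
    # Imported here so the pure helpers above work without Spark installed.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .master(master)                 # assumption: run inside the local JVM
        .appName("break-spark-sketch")  # assumption: any name will do
        .getOrCreate()
    )
    try:
        # Split the data into partitions, square each element, then sum up.
        rdd = spark.sparkContext.parallelize(list(numbers), partitions)
        return rdd.map(square).reduce(add)
    finally:
        spark.stop()
```

In a notebook cell, comparing `sum_of_squares_spark(range(1, 101))` with `sum_of_squares_local(range(1, 101))` is a quick sanity check. From there the break-it cycle can start, for example by growing the input or shrinking the number of partitions and watching what happens.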

Requirements:

- NOTE: it's OK if you come bare-handed - you'll be able to just follow it all on screen
- have a laptop with:
  - Docker Community Edition installed (https://docs.docker.com/engine/installation/)
  - git installed (https://gist.github.com/derhuerst/1b15ff4652a867391f03)
- run the following commands beforehand to download the necessary software for offline use:
> docker pull dimajix/jupyter-spark
> git clone https://github.com/dimajix/docker-jupyter-spark.git

Bio:

Grzegorz is the Head of Data Science at VirtusLab where he's focusing on putting data to good use.

He's mostly interested in data, algorithms, software engineering, distributed systems, machine learning and quantitative finance.

Please note that the event takes place at Barka Alrina, next to Kładka Bernatka. It is being sponsored by our Big Data friends from VirtusLab. Thank you for your support!