PySpark Workshop - Customer Analytics & PyData


Details
Customer Analytics Dublin and PyData Dublin invite you to their first joint workshop on PySpark.
The goal of this workshop is to guide you through using PySpark for customer analytics. Our target group is Spark and analytics beginners who want to get familiar with PySpark and data science. If you are already an avid Spark enthusiast and have data for breakfast every day, you might find this workshop too basic for you!
During this workshop, we will do a brief exploratory data analysis and then solve two tasks:
- Task 1: Characterise personas with an ML clustering Spark pipeline.
- Task 2: Build our own machine learning pipeline to predict an output.
During the workshop (and after some mandatory pizza and drinks, since nobody can code on an empty stomach), we will first describe each exercise and then let you solve it. If you have any questions or get stuck, our colleagues will help you keep going and give you some invaluable advice (don't hesitate to approach them!). After some time, we will discuss the results together and explore different solutions.
We will have a leaderboard on the second task with some prizes for the best submission!
Since we have limited time to finish the workshop, please make sure that you:
- Read the instructions carefully and follow all the steps before attending the meetup. We won't have time to install Python or Spark during the session, so please make sure both are installed on your system and work correctly. We have added links and instructions on how to set up correctly for different operating systems below:
https://github.com/SergioGonzalezSanz/customer-analytics-pydata-workshop-march-2018
- Get familiar with the tasks we will solve. It will save some precious time during the workshop.
- Download the workshop dataset and explore it before the session.
- Don't forget to bring your computer! Try to charge it before the session, as we might not have enough plugs for everyone.
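To check the installation before the session, a quick script like the following may help. It is only a sketch and not part of the official workshop instructions: the function name check_environment is made up for illustration, and it assumes that Java can be found through PATH or JAVA_HOME, which Spark needs in order to start. It only checks that the pieces are present, not that a Spark job actually runs, so the repository instructions remain the reference for the full setup.

```python
import importlib.util
import os
import shutil
import sys


def check_environment():
    """Return a dict mapping each prerequisite to True (found) or False (missing).

    Hypothetical helper for illustration; the exact requirements are the
    ones listed in the workshop repository, not the ones guessed here.
    """
    return {
        # Assumption: the workshop uses Python 3.
        "python3": sys.version_info >= (3,),
        # Spark runs on the JVM, so a Java installation must be discoverable.
        "java_found": shutil.which("java") is not None
        or bool(os.environ.get("JAVA_HOME")),
        # find_spec returns None when the package is not installed.
        "pyspark_importable": importlib.util.find_spec("pyspark") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name}: {'OK' if ok else 'MISSING'}")
```

If any line prints MISSING, revisit the setup instructions for your operating system before the meetup.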
Important information:
This is a free workshop and the team has put a lot of work into preparing it. Spots are limited, so we will keep track of attendance. Please don't sign up if you won't come, and avoid cancelling your RSVP on the last day. Many people want to attend and we really appreciate your help.
And the most important of all: enjoy! We hope you have a good time and learn a little bit more about PySpark and Customer Analytics.
