Data-Intensive Applications with Hadoop and Spark


Details
This workshop is all about data: reading and collecting data from sensors, analysing log files, and running analytical reports. We will discuss our experience with data-related applications: what challenges and problems are there? Which technologies and tools can we use, and why?
The objective is to get you up and running with Hadoop and Spark, help you hack on some hands-on exercises, and let you experiment with your favourite data sets.
Hopefully, you will learn something new as well: what distributed file systems are, what MapReduce is, and how data streaming and data pipelines work.
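To give you a taste of what that looks like in practice, here is the classic MapReduce example, word count, written in the Spark shell. This is a minimal sketch: "input.txt" is just a placeholder for any plain-text file you have handy.

  val lines = sc.textFile("input.txt")      // read a text file as an RDD of lines
  val counts = lines
    .flatMap(line => line.split("\\s+"))    // map: split each line into words
    .map(word => (word, 1))                 // map: pair each word with a count of 1
    .reduceByKey(_ + _)                     // reduce: sum the counts for each word
  counts.take(10).foreach(println)          // print the first ten (word, count) pairs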
Please read our attendance policy (https://www.meetup.com/Women-Who-Code-Sydney/pages/Attendance_Policy/) before you RSVP.
Agenda:
6:00pm - Arrive and dinner
6:30pm - Introduction talk by Svetlana Filimonova
7:00pm - Installing & hacking
9:00pm - Home time
Knowledge prerequisites:
Basic knowledge of one of: Java, Scala, Python, or SQL
Software prerequisites:
Please bring your laptop along to the workshop with the following installed:
- You should have git (if on Windows: https://msysgit.github.io/ )
- For both Windows and Mac:
2.1. Download Spark (Pre-built for Hadoop 2.6): http://spark.apache.org/downloads.html
2.2. Try to run spark-1.4.1-bin-hadoop2.6/bin/spark-shell
(You may need the JDK installed, with java and javac available on your PATH.)
2.3. Once it has loaded, you will see the Scala console prompt: scala>
Type 'sc' and press Enter; you should see: res0: org.apache.spark.SparkContext = org.apache.spark.SparkContext@1f60824e
If everything works, you're done: you have a local Spark with Hadoop installed correctly. If not, don't worry, come to the workshop and we will help you =)
- Your favourite IDE for editing code
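Optionally, once the scala> prompt is up, you can paste the lines below to check that Spark can actually run a job. This is a small sketch, assuming you launched spark-shell from inside the unpacked spark-1.4.1-bin-hadoop2.6 directory, which ships with a README.md file:

  val readme = sc.textFile("README.md")                  // README.md comes with the Spark download
  println(readme.count())                                // count the lines; any number means Spark ran a job
  println(readme.filter(_.contains("Spark")).count())    // count the lines that mention "Spark"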
