Apache Spark Hands-on Workshop

Overview:
After a successful meetup in February, where one of the presentations was an introduction to Spark, Hadoop-DC received a lot of feedback from its members, and the consensus was clear: you wanted to learn more about Spark. So we listened! Hadoop-DC has formed a joint meetup with our friends at Apache Spark Maryland to bring you a hands-on workshop and deep dive into Spark. There will be a mix of brief lectures and demos followed by hands-on technical exercises in Scala and Spark. The goal of this workshop is to give you a basic hands-on introduction to Spark while you learn functional programming techniques. This workshop was developed by Tetra Concepts and is sponsored by BAE Systems and Booz Allen. We will not provide food for this meetup, but please feel free to bring your own!
Meetup Agenda:
5:00 - 5:30 - Networking
5:30 - 5:45 - Introductions
5:45 - 8:00 - Interactive Spark lecture and exercises
8:00 - 8:30 - Challenge exercise
Complimentary Parking:
Complimentary valet parking will be provided for all guests. Simply drive to the entrance of the hotel and let the attendants know that you're there for the Apache Spark meetup.
Audience and Pre-requisites:
This workshop is intended for software developers who have a background in Java, Python, or Scala and are familiar with the MapReduce paradigm. No experience with Apache Spark is required. The brief lectures will introduce enough Scala to learn and use the Spark Shell. The case studies and hands-on exercises will focus on using Spark to accelerate the traditional MapReduce design and build cycle.

Course Outline:
• Installing Spark locally
• Basic theory of the Resilient Distributed Dataset
• Data exploration with Spark at the Spark Shell
• Using Spark's core APIs in Scala
• Using Spark's PairRDD functions
• Deploying a job on a Spark cluster
• How to access logs and diagnose a running job
IMPORTANT: Prior to the meetup, developers should set up one of the following on their laptops to participate in the hands-on portion of the workshop:
- Developers can unpack a TAR file directly on their host system. The only prerequisite is to have Java 1.7 installed. Check back soon for a link to the TAR file.
- Alternatively, developers can download a pre-configured Vagrant VM here: https://github.com/tetra-concepts-llc/spark-training-vm Note that you must have the Vagrant software (https://www.vagrantup.com/) installed before using the virtual machine.
As this will be a hands-on workshop, we need to make sure that we don't overbook the event. Registration will be capped for this meetup, so please sign up only if you are certain you can attend!
To download the TAR file, paste the link below into your web browser. Next, click the download icon (a downward-pointing arrow) at the top center of the web page. Finally, extract the contents of the gzipped TAR file onto the host system and follow the directions in the README.md file.
https://drive.google.com/file/d/0B54qWs-0SiLNUXBCaENRX1JZUUk/view?usp=sharing
About Our Presenter:
The lectures and problem sets will be presented by Dr. JT Halbert, Chief Data Scientist at Tetra Concepts. JT has over a decade of experience solving hard problems in various fields: orbital mechanics and control, nonlinear dynamics and chaos theory, cloud computing, and computer network defense. JT is passionate about helping people infer patterns and extract insight from the records of the observable world, and communicate what they find.
