Spark Hands On Workshop


Details
Link To Follow-along
https://github.com/jt-halbert/spark-workshop/blob/master/followalong-20151206.scala
Overview
We have received many requests for another hands-on Spark workshop, and we listened! Please bring your laptop for an insightful evening of learning Spark development from Tetra Concepts’ Chief Data Scientist, Dr. JT Halbert. There will be a mix of brief lectures and demos followed by hands-on technical exercises in Scala and Spark. The goal of this workshop is to take a tour of Spark 1.5.1, with particular attention to DataFrames at the beginning. There will be group exercises (some very challenging), and we will cover a few advanced topics such as functional state machines in Scala, Markov models, and Gibbs sampling. We know you will find something interesting, and we hope you will be entertained! This workshop was developed and sponsored by Tetra Concepts (http://www.tetraconcepts.com/), a leading provider of data science and analytic development.
Meetup Agenda:
5:00 – 6:00 – Live DJ, Networking, Happy Hour, Pizza
6:00 – 8:00 – Interactive Spark lecture and exercises
8:00 – 8:30 – Challenge exercise
IMPORTANT: Prior to the meetup, developers should install either the Docker image or Vagrant VM on their laptop to participate in the hands-on portion of the workshop:
To use Docker, see here for more information: https://github.com/tetra-concepts-llc/spark-training-vm/tree/master
To use the Vagrant VM, see here: https://github.com/tetra-concepts-llc/spark-training-vm/tree/vagrant
Note: you must have Vagrant installed before using the virtual machine: https://www.vagrantup.com/
Audience and Pre-requisites:
This workshop is intended for software developers and data scientists who have a background developing in Java, Python, or Scala and are familiar with the MapReduce paradigm. No experience with Apache Spark is required. The brief lectures will introduce enough Scala to learn and use the Spark shell. The case studies and hands-on exercises will focus on using Spark to explore and model data.
Course Outline:
• Basic Functional Programming with Scala
• Basic theory of the Resilient Distributed Dataset
• Data exploration and data modeling with DataFrames in the Spark shell
• Using Spark's core APIs in Scala
• Build a Markov Model of a text corpus
• Build a Markov state machine simulation in Scala
About Our Presenter:
The lectures and problem sets will be presented by Dr. JT Halbert, Chief Data Scientist at Tetra Concepts (http://www.tetraconcepts.com). JT has over a decade of experience solving hard problems in various fields: orbital mechanics and control, nonlinear dynamics and chaos theory, cloud computing, and computer network defense. JT is passionate about helping people infer patterns, extract insight, and communicate these from the records of the observable world.
