Learn Spark for Big Data - 4 Week Night Class (PAID)
Details
*NOTE: Eventbrite check-in is required. Register here: https://www.eventbrite.com/e/learn-spark-for-big-data-san-francisco-712-tickets-26138237171
Take your big data engineering skills, and your salary (http://www.datanami.com/2015/11/04/skip-the-ph-d-and-learn-spark-data-science-salary-survey-says/), to the next level with Spark. In this class, you’ll learn how to batch process data, build data pipelines and process data in near real time.
Why Spark?
Originally created at the University of California, Berkeley, Spark is a powerful, open source processing engine for data distributed across large clusters. Spark is optimized for speed and ease of use; it caches data in memory, which lets some distributed algorithms run up to 100x faster than on MapReduce. Spark can be used for batch processing and for processing data in near real time.
What You’ll Learn:
In this four-week, hands-on Spark training, learn how to:
• Use Spark to solve real-world problems and use-cases
• Process terabytes of data using Spark
• Build real-time big data applications using Spark Streaming
• Optimize Spark applications
Audience & Prerequisites
This workshop series is for developers, data engineers, data scientists, data analysts, architects, IT/operations, technical managers and anyone else who wants to master Spark to analyze data at scale.
Programming:
Course examples and exercises are presented in Python and Scala, so knowledge of one of these programming languages is required.
Command Line & Version Control:
Basic knowledge of Unix commands (i.e., the command line) is required.
We will use GitHub for sharing and maintaining code. Before class, you should create a GitHub account and be familiar with: cloning and forking repositories, pull requests, branches and making commits.
Weekly Agenda
Tuesdays & Thursdays, 6pm – 9pm
Meet Your Instructor
Asim Jalis (https://www.linkedin.com/in/asimjalis), Galvanize Data Engineering Instructor
Asim is the Lead Instructor in the Data Engineering program at Galvanize. Before joining the Galvanize team, Asim worked as a Senior Technical Instructor at Cloudera where he taught Cloudera developer courses on Hadoop and Spark. He has also worked at Microsoft, Salesforce, and HP. Asim has an MS in Computer Science from the University of Virginia, and an MA in Mathematics from the University of Wisconsin–Madison.
Full Course Outline
Week 1: Intro to Spark (2 evenings)
Class 1: Transformations/Actions, Pair RDDs, ReduceByKey, GroupByKey, Joins, Partitions
Class 2: Narrow and wide transformations and stages, caching and persistence, checkpointing
Week 2: Spark SQL (2 evenings)
Class 1: Data Frames, Data Formats: JSON, CSV, Avro, Parquet, Compression
Class 2: Caching, Select and Filter, User Defined Functions, AWS and S3
Week 3: Spark Streaming and Real-time (2 evenings)
Class 1: Micro-Batches and DStreams, Transformations and Output Operations, Windowing operations
Class 2: State DStream, Checkpointing and Fault Tolerance, Deployment and Monitoring
Week 4: Spark Advanced, tuning for performance (2 evenings)
Class 1: Map-Side Joins, Closures, Broadcast Variables, Accumulators
Class 2: Optimizing Joins, Data Skew, Partitioning, Coalescing, Metrics Using Application UI
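For a taste of the Week 1 material, here is a rough sketch in plain Python (no Spark required) of the idea behind a ReduceByKey-style aggregation on key/value pairs: values are first combined per key inside each partition, and only the partial results are merged afterwards. The helper name reduce_by_key and the list-of-lists "partitions" are illustrative assumptions, not Spark's API or the course's exercise code.

```python
def reduce_by_key(partitions, func):
    """Hypothetical helper: combine values per key, one partition at a time."""
    # Step 1: combine values for each key within a single partition
    # (this mirrors the idea of a map-side combine).
    partials = []
    for part in partitions:
        local = {}
        for key, value in part:
            local[key] = func(local[key], value) if key in local else value
        partials.append(local)

    # Step 2: merge the per-partition partial results into one result per key.
    merged = {}
    for local in partials:
        for key, value in local.items():
            merged[key] = func(merged[key], value) if key in merged else value
    return merged


# Two made-up "partitions" of (word, 1) pairs, as in a word count.
partitions = [
    [("spark", 1), ("hadoop", 1), ("spark", 1)],
    [("spark", 1), ("hadoop", 1)],
]
counts = reduce_by_key(partitions, lambda a, b: a + b)
print(sorted(counts.items()))
```

Because each partition is reduced before anything is merged, far fewer records need to move between machines than if every value were grouped first; that difference is one reason ReduceByKey and GroupByKey are covered side by side.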
Setup
• Bring your laptop and power cable
• Install JDK 8 from Oracle: http://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html
• Install IntelliJ Community Edition: https://www.jetbrains.com/idea/download/
• Download Apache Spark 1.6.1 from http://spark.apache.org/downloads.html (choose the package type Pre-built for Hadoop 2.6 and later)
• We will assist you with installing Spark on the first day
*Course completion will empower you to use Spark on projects but does not guarantee a job in big data engineering.
*Registering via Eventbrite is required: https://www.eventbrite.com/e/learn-spark-for-big-data-san-francisco-712-tickets-26138237171

Canceled