This event was canceled

Learn Spark for Big Data - 4 Week Night Class (PAID)

Details

*NOTE: Eventbrite check-in is required. Register here: https://www.eventbrite.com/e/learn-spark-for-big-data-san-francisco-712-tickets-26138237171

Take your big data engineering skills, and your salary (http://www.datanami.com/2015/11/04/skip-the-ph-d-and-learn-spark-data-science-salary-survey-says/), to the next level with Spark. In this class, you’ll learn how to batch process data, build data pipelines and process data in near real time.

Why Spark?
Originally created at the University of California, Berkeley, Spark is a powerful, open source processing engine for data distributed across large clusters. Spark is optimized for speed and ease of use; it caches data in memory to run some distributed algorithms up to 100x faster than Hadoop MapReduce. Spark can be used for batch processing and for processing data in near real time.

What You’ll Learn:
In this four-week, hands-on Spark training, learn how to:

• Use Spark to solve real-world problems and use cases
• Process terabytes of data using Spark
• Build real-time big data applications using Spark Streaming
• Optimize Spark applications

Audience & Prerequisites

This workshop series is for developers, data engineers, data scientists, data analysts, architects, IT/operations, technical managers and anyone else who wants to master Spark to analyze data at scale.

Programming:
Course examples and exercises are presented in Python and Scala, so knowledge of one of these programming languages is required.

Command Line & Version Control:
Basic knowledge of Unix commands (i.e., the command line) is required.

We will use GitHub for sharing and maintaining code. Before class, you should create a GitHub account and be familiar with: cloning and forking repositories, pull requests, branches and making commits.

Weekly Agenda
Tuesdays & Thursdays, 6pm – 9pm

Meet Your Instructor
Asim Jalis (https://www.linkedin.com/in/asimjalis), Galvanize Data Engineering Instructor
Asim is the Lead Instructor in the Data Engineering program at Galvanize. Before joining the Galvanize team, Asim worked as a Senior Technical Instructor at Cloudera where he taught Cloudera developer courses on Hadoop and Spark. He has also worked at Microsoft, Salesforce, and HP. Asim has an MS in Computer Science from the University of Virginia, and an MA in Mathematics from the University of Wisconsin–Madison.

Full Course Outline

Week 1: Intro to Spark (2 evenings)
Class 1: Transformations/Actions, Pair RDDs, ReduceByKey, GroupByKey, Joins, Partitions
Class 2: Narrow and wide transformations and stages, caching and persistence, checkpointing

Week 2: Spark SQL (2 evenings)
Class 1: Data Frames, Data Formats: JSON, CSV, Avro, Parquet, Compression
Class 2: Caching, Select and Filter, User Defined Functions, AWS and S3

Week 3: Spark Streaming and Real-time (2 evenings)
Class 1: Micro-Batches and DStreams, Transformations and Output Operations, Windowing operations
Class 2: State DStream, Checkpointing and Fault Tolerance, Deployment and Monitoring

Week 4: Spark Advanced, Tuning for Performance (2 evenings)
Class 1: Map-Side Joins, Closures, Broadcast Variables, Accumulators
Class 2: Optimizing Joins, Data Skew, Partitioning, Coalescing, Metrics Using Application UI

Setup
• Bring your laptop and power cable
• Install JDK8 from Oracle: http://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html
• Install IntelliJ Community Edition: https://www.jetbrains.com/idea/download/
• Download Apache Spark 1.6.1 from http://spark.apache.org/downloads.html (choose the package type Pre-built for Hadoop 2.6 and later)
• We will assist you with installing Spark on the first day

*Course completion will empower you to use Spark on projects but does not guarantee a job in big data engineering.

*Registering via Eventbrite is required: https://www.eventbrite.com/e/learn-spark-for-big-data-san-francisco-712-tickets-26138237171

SF Data Science

Galvanize
44 Tehama Street · San Francisco, CA