Intro to Spark Training Hosted by Staples


Details
Big thanks to Staples for hosting and sponsoring the meetup!
Agenda
12:00pm – 12:45pm: Food / Networking
12:45pm – 1:00pm: Intro
1:00pm – 5:00pm: Training
Topics include
• Overview of Spark Fundamentals & Architecture
• What’s new in Spark 2.3
• Unified APIs: SparkSessions, SQL, DataFrames, Datasets
• Introduction to DataFrames, Datasets and Spark SQL
• Introduction to Structured Streaming Concepts
• Four Hands-On Labs
You will use Databricks Community Edition (https://databricks.com/try), which gives you unlimited free access to a ~6 GB Spark 2.x local-mode cluster. In the process, you will learn how to create a cluster, navigate Databricks, explore a couple of datasets, perform transformations and ETL, save your data as tables and Parquet files, read from these sources, and analyze datasets using the DataFrame/Dataset API and Spark SQL.
Level: Beginner to intermediate; not intended for advanced Spark users.
Prerequisite: You will need a laptop with at least 8 GB of RAM and the Chrome or Firefox browser installed. Basic knowledge of Scala or Python is required. The notebooks will be in Scala, and Python is optional. Please note that laptops will not be provided, so you must bring your own.
Trainer Bio:
Joseph Kambourakis is a data science instructor at Databricks. Joseph has more than 10 years of teaching experience, over five of them in data science and analytics. Previously, Joseph was an instructor at Cloudera and a technical sales engineer at IBM. He has taught in over a dozen countries around the world and has been featured on Japanese television and in Saudi newspapers. He is a rabid Arsenal FC supporter and a competitive Magic: The Gathering player. Joseph holds a BS in electrical and computer engineering from Worcester Polytechnic Institute and an MBA with a focus in analytics from Bentley University. He lives with his wife and daughter in Needham, MA.

