Technical workshop on Spark SQL


Details
Learn the basics of Spark SQL, the most popular component of Apache Spark. The workshop combines briefings with hands-on training on three topics: a brief overview of Spark; an introduction to the DataFrame API and extracting data with SQL; and additional DataFrame functions and reading data from different sources.
NOTE: Attendees need to bring their own laptops for the exercises.
Agenda
Coffee (15 min.)
Welcome – Women in Big Data, SAP Human Resources (15 min.)
Spark Overview (15 min.)
Spark SQL and the DataFrame API (20 min. lecture, 10 min. setup, and 20 min. exercise)
Break (10 min.)
Additional DataFrame functions (10 min. lecture and 10 min. exercise)
Spark SQL with different data sources (15 min. lecture and 15 min. exercise)
Questions (15 min.)
Lunch (30+ min.)
Prerequisites
Basic familiarity with Spark and SQL syntax
Basic familiarity with Scala or Java
Sign up for Databricks Community Edition, whose notebooks will be used for the hands-on training (https://accounts.cloud.databricks.com/registration.html#signup/community)
Instructor bios
Xinh Huynh is a senior software engineer with over ten years of experience developing analytics and data pipelines at scale. She most recently worked on the analytics team at Samsung SDS America, applying Spark and Scala to data munging, exploration, and data pipelines.
Gayathri is a software engineer at Intel with years of experience in application development, technical consulting, developer advocacy, and performance tuning. She is an active contributor to the open source Apache Spark project, especially the Machine Learning Library.
