
Apache Spark Saturday Hands-on Workshop and Lecture

Hosted By
Pitt F.

Details

Greetings everyone!

I am pleased to announce the following special event set for Saturday, November 11. Apache® Spark™ is one of the most popular big data and analytics platforms in use today, used for both data engineering and data science tasks. The instructor is flying in from California, so I hope to see everyone there for an interesting and enlightening day!

Everything for the day, including content, food, and drinks, will be provided free of charge by Databricks. Please do not miss out.

Cheers,

Pitt

PS - Special thanks to meetup member Cary Walker for securing the venue.

Jump Start with Apache® Spark™ 2.2 on Databricks

Apache Spark 2.0 and the subsequent 2.1 and 2.2 releases have laid the foundation for many new features and functionality. The three main themes of Spark 2.x (easier, faster, and smarter) are pervasive in its unified and simplified high-level APIs for structured data.

In this introductory session, part lecture and part hands-on workshop, you’ll learn how to apply some of these new APIs using Databricks Community Edition. In particular, we will cover the following areas:

Agenda:

• Overview of Spark Fundamentals & Architecture

• What’s new in Spark 2.x

• Unified APIs: SparkSessions, SQL, DataFrames, Datasets

• Introduction to DataFrames, Datasets and Spark SQL

• Introduction to Structured Streaming Concepts

• Four Hands-On Labs

You will use Databricks Community Edition (https://databricks.com/try), which gives you unlimited free access to a ~6 GB Spark 2.x local-mode cluster. In the process, you will learn how to create a cluster, navigate Databricks, explore a couple of datasets, perform transformations and ETL, save your data as tables and Parquet files, read from these sources, and analyze datasets using the DataFrame/Dataset APIs and Spark SQL.

Level: Beginner to intermediate; not intended for advanced Spark users.

Prerequisite: You will need a laptop with at least 8 GB of RAM and the Chrome or Firefox browser installed. Basic knowledge of Scala or Python is required. The notebooks will be in Scala; Python is optional. Please note that laptops will not be provided, so you must bring your own.

Bio:

Jules S. Damji is an Apache Spark Community Evangelist with Databricks. He is a hands-on developer with over 15 years of experience and has worked at leading companies such as Sun Microsystems, Netscape, LoudCloud/Opsware, VeriSign, Scalix, and ProQuest, building large-scale distributed systems. Before joining Databricks, he was a Developer Advocate at Hortonworks.

Pre-class Instructions:

Follow the instructions on this GitHub page (https://github.com/dmatrix/spark-saturday#instructions-to-register-for-free-databricks-community-edition) to register for Databricks Community Edition (https://databricks.com/try) and to import the labs that we will cover in the workshop.

This workshop, including food and drinks, is sponsored by Databricks.

Big Data Madison