Suite 301, 3rd Floor, Fremont, CA
Note: Pizza will be served for lunch.
Apache Spark 2.x has laid the foundation for many new features and functionality. Its three main themes (easier, faster, and smarter) are pervasive in its unified and simplified high-level APIs for structured data.
In this introductory workshop, part lecture and part hands-on, you'll learn how to apply some of these new APIs using Databricks Community Edition. In particular, we will cover the following areas:
What’s new in Spark 2.x
SparkSessions vs SparkContexts
Datasets/DataFrames and Spark SQL
Introduction to Structured Streaming concepts and APIs
You will use Databricks Community Edition, which gives you unlimited free access to a ~4 GB Spark 2.x local-mode cluster. In the process, you will learn how to create a cluster, navigate Databricks, explore a couple of datasets, perform transformations and ETL, save your data as tables, read from tables, and analyze datasets using the DataFrame/Dataset APIs and Spark SQL.
Level: Beginner to intermediate. This is *not* for advanced Spark users.
Prerequisites: You will need a laptop with at least 8 GB of memory and the Chrome or Firefox browser installed. Introductory or basic knowledge of Scala or Python is required. The notebooks will be in Scala; Python is optional.
Jules S. Damji is an Apache Spark Community Evangelist with Databricks. He is a hands-on developer with over 15 years of experience and has worked at leading companies, such as Sun Microsystems, Netscape, LoudCloud/Opsware, VeriSign, Scalix, and ProQuest, building large-scale distributed systems. Before joining Databricks, he was a Developer Advocate at Hortonworks.