Apache Spark Fundamentals


Details
This session is aimed at beginners and covers Apache Spark fundamentals such as the ecosystem, operation types (transformations and actions), Spark data structures, and persistence.
Eren will walk through sample transformation operations on Databricks Community Edition using the Scala RDD, DataFrame and Dataset APIs.
Eamon will then share his experience of Apache Spark on Databricks Community Edition and show how to apply Logistic Regression using Apache Spark with Python.
Pre-requisites: Basic programming experience - you do not need to be a Python/Scala expert.
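If the split between transformations and actions mentioned above is new to you, the short sketch below may help before the session. It is a plain-Python analogy, not Spark code: the `LazyDataset` class and the `square` helper are made up for illustration and are not part of the Spark API. It only assumes the general idea that transformations describe work, actions trigger it, and caching keeps computed records for reuse.

```python
from typing import Callable, Iterable, List, Optional


class LazyDataset:
    """Hypothetical stand-in for a Spark dataset; plain Python, not the Spark API."""

    def __init__(self, source: Callable[[], Iterable]):
        self._source = source                 # recipe for producing the records
        self._persist = False                 # set by cache()
        self._cached: Optional[List] = None   # filled by the first action after cache()

    def _iterate(self) -> Iterable:
        if not self._persist:
            return self._source()             # recompute from the recipe every time
        if self._cached is None:
            self._cached = list(self._source())
        return iter(self._cached)

    # Transformations: describe the work and return a new dataset; nothing runs yet.
    def map(self, fn: Callable) -> "LazyDataset":
        return LazyDataset(lambda: (fn(x) for x in self._iterate()))

    def filter(self, pred: Callable) -> "LazyDataset":
        return LazyDataset(lambda: (x for x in self._iterate() if pred(x)))

    def cache(self) -> "LazyDataset":
        self._persist = True                  # records are stored at the next action
        return self

    # Actions: trigger the computation and return a result to the caller.
    def collect(self) -> List:
        return list(self._iterate())

    def count(self) -> int:
        return sum(1 for _ in self._iterate())


calls = []  # records every time the mapped function actually runs


def square(x: int) -> int:
    calls.append(x)
    return x * x


numbers = LazyDataset(lambda: range(1, 6))
odd_squares = numbers.map(square).filter(lambda v: v % 2 == 1)
calls_before_action = len(calls)          # 0: transformations alone ran nothing

result = odd_squares.collect()            # action: the pipeline runs now
calls_after_collect = len(calls)          # 5: square ran once per input record

odd_squares.cache()
total = odd_squares.count()               # recomputes once and stores the records
calls_after_count = len(calls)            # 10
again = odd_squares.collect()             # served from the stored records
calls_after_cached_collect = len(calls)   # still 10
```

The `calls` list is there only to make the laziness visible: it stays empty until the first action, and it stops growing once the cached records are reused. The labs will use the real Scala and Python APIs on Databricks Community Edition.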
Agenda:
9:30 - 9:45 Installations, Q&A
9:45 - 10:45 Introduction to Apache Spark with examples of the RDD, DataFrame and Dataset APIs - Eren
10:45 - 11:00 Coffee break
11:00 - 12:00 Implementing Logistic Regression on Databricks Community Edition - Eamon
12:00 - 12:30 Apache Spark Open Forum
If you want to participate in the Labs, please sign up for the Databricks Community Edition using the link below. It is a free service.
https://accounts.cloud.databricks.com/registration.html#signup/community
Please bring along a fully charged laptop.
Looking forward to seeing you there!
Speakers:
Eren Avşaroğulları (https://www.meetup.com/Data-Science-and-Engineering-Club/members/195100595/) holds both B.Sc. and M.Sc. degrees in Electronics & Control Engineering. He currently works at Workday as a Sr. Software Engineer, where his focus is mostly on data transformation and cleaning and on distributed computing challenges.
Eamon Thornton (https://www.meetup.com/Data-Science-and-Engineering-Club/members/101258452/) is a qualified Electronic Engineer/Project Manager with a wealth of industry experience. He recently completed a Master's in Business Analytics and is currently lecturing in Project Management and Business Analytics. Lately he has been experimenting with Apache Spark on the Databricks Community Edition.
