Toronto Apache Spark #5


Details
Title
• Spark as a Service
• Real-life use-case examples for Apache Spark Libraries
• Group Discussion
Agenda
• 6:30PM to 6:45PM - Opening & Refreshments
• 7:00PM to 7:15PM - Spark as a service by Sansom Lee
• 7:15PM to 7:45PM - Talk #1 by Adastra Team
• 7:45PM to 8:15PM - Talk #2 by Adastra Team
• 8:15PM to 8:25PM - Break (Refreshments provided)
• 8:25PM to 9:20PM - Group Discussion (consulting & problem solving)
Speakers' short bios
Sansom Lee is a data enthusiast. Together with a dynamic team at LoyaltyOne, he is embarking on a mission to explore new Big Data technologies to revamp the current stack. Lately, he has become a big fan of the simple yet powerful paradigms of Spark, Akka and Kafka.
Frantisek Mantlik is a Big Data and Analytics consultant and the Emerging Big Data Technologies technical competency lead at Adastra. Frantisek was involved in creating the content and most of the hands-on examples for the Advanced Spark Adastra Academy course.
Gordon Gibson is a dedicated analytics consultant experienced in performing statistical analysis and developing optimization models to provide business solutions and insightful quantitative analysis.
Sepideh Seifzadeh is an Adastra Academy consultant. She builds training material for emerging technologies in Big Data and Analytics. Her recent work includes creating training material for the Scala programming language and the Apache Spark framework.
Tri Nguyen is a Data Engineer certified in both Hadoop and Data Science. Too busy with (and distracted by) many dissimilar tools for data processing and analysis, he ended up being attracted to Spark as a general distributed processing framework.
Target Audience
Data Scientists/Analysts, Data Engineers
This event is aimed at providing value for beginners, not advanced users.
Description
Spark as a Service by Sansom Lee:
Introducing Spark Jobserver, an open source component developed by Ooyala, which provides a RESTful interface for submitting and managing Apache Spark jobs, jars, and job contexts. Because it gives access to the job context, it can be used to build a RESTful API on top of any Spark application.
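For readers who want a feel for what that RESTful interface looks like, here is a minimal Python sketch that only builds the request URLs for the three resources mentioned above (jars, contexts, jobs). It makes no network calls. The endpoint paths and query parameters follow the examples in the Spark Jobserver README as we understand them; the host, port, and helper function names are assumptions made for illustration, so check them against the version you deploy.

```python
from urllib.parse import quote, urlencode

# Assumption: a Spark Jobserver instance on localhost, on the port used in
# the project's README examples. Adjust for your own deployment.
BASE_URL = "http://localhost:8090"


def jar_upload_url(app_name, base_url=BASE_URL):
    """URL to POST a jar's bytes to, registering it under app_name."""
    return f"{base_url}/jars/{quote(app_name, safe='')}"


def context_url(context_name, settings=None, base_url=BASE_URL):
    """URL to POST to in order to create a named, long-lived job context.

    settings is an optional dict of context options passed as query
    parameters, e.g. {"num-cpu-cores": 4}.
    """
    url = f"{base_url}/contexts/{quote(context_name, safe='')}"
    if settings:
        url += "?" + urlencode(settings)
    return url


def job_submit_url(app_name, class_path, context=None, sync=False,
                   base_url=BASE_URL):
    """URL to POST job configuration to in order to start a job."""
    params = {"appName": app_name, "classPath": class_path}
    if context is not None:
        params["context"] = context  # run inside an existing context
    if sync:
        params["sync"] = "true"      # ask the server to wait for the result
    return f"{base_url}/jobs?{urlencode(params)}"


def job_status_url(job_id, base_url=BASE_URL):
    """URL to GET the status or result of a previously submitted job."""
    return f"{base_url}/jobs/{quote(job_id, safe='')}"


if __name__ == "__main__":
    print(jar_upload_url("test"))
    print(context_url("test-context", {"num-cpu-cores": 4}))
    print(job_submit_url("test", "spark.jobserver.WordCountExample",
                         context="test-context", sync=True))
```

Any HTTP client can then send requests to these URLs. Keeping the URL construction separate from the transport makes the sketch easy to test without a running server.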
Spark Introductory talks by Adastra Team:
In the first part, we will start with a brief overview of the Apache Spark built-in libraries (Spark Streaming, Spark SQL, Spark MLlib and Spark GraphX), followed by a presentation of two real-life example applications. The Spark Streaming application demonstrates real-time streaming of Twitter firehose sample data and storing Twitter statuses in JSON format in a Hive metastore. The Spark SQL sample application creates a simple profile of structured or semi-structured data stored in a Hive table.
In the second part of the meetup, an open discussion group will be created for each of the four Spark libraries. The discussions will be moderated by the creators of the corresponding chapters of the Advanced Spark course at Adastra Academy.
