Are you curious about Apache Spark? Come learn about it in this introductory class. We will cover:
Apache Spark Basics:
* What is Apache Spark?
* Starting the Spark Shell
* Using the Spark Shell
* Getting Started with Datasets and DataFrames
* DataFrame Operations
Working with DataFrames and Schemas:
* Creating DataFrames from Data Sources
* Saving DataFrames to Data Sources
* DataFrame Schemas
* Eager and Lazy Execution
This training assumes some familiarity with HDFS and YARN. Although the discussion is not specific to Hadoop, the training assumes Hadoop usage. The instructor is happy to cover these topics if they are of general interest.
This training is structured as more of a show-and-tell, with an opportunity to try the tools afterward (bring your laptop if you'd like help with setup).
INSTRUCTOR: David Paschall-Zimbel is a Data Architect and Senior Systems Engineer at Collier IT. He has over 30 years of experience in the IT industry, starting as a Research Fellow at the University of Minnesota.
David’s current areas of focus include Operating Environments, Engineered Systems, Clustering Technologies (including Hadoop) and Cloud Administration. He is an instructor for both Cloudera University and Oracle University courses.
David develops, maintains, tests, and evaluates corporate big data solutions. He works with technologies like MapReduce, Hive, MongoDB, and Cassandra, and has been involved in proofs of concept using technologies such as Spark, HDFS, Hive, Sqoop, and Flume.
In his free time, he enjoys traveling the world, reading, and playing the MMO Final Fantasy XIV.
NOTE: Coffee and continental breakfast will be provided.