Abiodun Akogun is a senior data scientist on the Enterprise Data Management and Analytics team at Christus Health. Before joining Christus Health, he was a senior consultant and data scientist at Verizon, where he developed a space-time model that predicts utilization on telecom cell towers to within 5%. He leads Katy Data Analytics & Machine Learning, a meetup group focused on use cases for machine learning and artificial intelligence. In his spare time, he teaches math and gives seminars on Big Data and analytics. Some of his projects involve building classification models, running sentiment analysis on customer text to identify patterns in customer data, and building time series models to forecast trends in organizational data. Abiodun holds a Master’s degree in Analytics from Texas A&M University, a Master’s degree in Electrical Engineering from Tennessee Technological University, and a Bachelor’s degree (First Class Honors) in Electrical Engineering from Obafemi Awolowo University, Nigeria.
Apache Spark is a cluster computing engine designed for fast computation. It builds on the MapReduce model popularized by Hadoop and extends that model to efficiently support more types of computation, including interactive queries and stream processing. Its main feature is in-memory cluster computing, which increases the processing speed of an application.
Spark is designed to cover a wide range of workloads, such as batch applications, iterative algorithms, interactive queries, and streaming. Besides supporting all of these workloads in a single system, it reduces the management burden of maintaining separate tools.
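For attendees new to the MapReduce model mentioned above, here is a minimal sketch of its three phases in plain Python. It is an illustration only and does not use Spark or Hadoop; the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) and the sample sentences are assumptions made up for this example.

```python
from collections import defaultdict
from functools import reduce


def map_phase(lines):
    """Emit a (word, 1) pair for every word in every line."""
    return [(word.lower(), 1) for line in lines for word in line.split()]


def shuffle_phase(pairs):
    """Group emitted values by key, as a framework would between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Combine each key's values into a single count."""
    return {key: reduce(lambda a, b: a + b, values) for key, values in grouped.items()}


def word_count(lines):
    """Run the three phases in order on an in-memory list of lines."""
    return reduce_phase(shuffle_phase(map_phase(lines)))


# Hypothetical sample input for illustration.
counts = word_count(["spark extends mapreduce", "spark keeps data in memory"])
print(counts)
```

A real cluster framework distributes these phases across many machines; this sketch runs them in a single process to show the shape of the computation.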
In this talk, we will explore some analytics features of Apache Spark. We will cover the components of Spark and work through a demo that uses Spark to read in a dataset, explore the data, and try out some Spark SQL features.