Spark Streaming and the Internet of Things


Details
The Internet of Things (IoT) is connecting billions of devices. This talk shares how to use the Apache Spark ecosystem to harness IoT data and build practical solutions: Spark Streaming for near-real-time ingestion and complex event processing, and SparkSQL for analytics and feature extraction for machine learning. All of this is tied to results that save companies millions of dollars by preventing theft, identifying waste and using predictive analytics to improve efficiency.
We’ll share lessons learned from running Spark, Kafka, Cassandra, Hadoop and Parquet in 24x7, always-on environments for over two years to process data from billions of dollars of “things in motion”. Examples include managing ingestion of highly heterogeneous data flows, cleansing streaming data sets in real time, solving the “small files” problem of streaming analytics, and the real savings obtained by combining technologies like Apache Parquet with SparkSQL.
Bios:
Jim Haughwout
Chief Architect and VP of Software at Savi Technology
Jim’s passion is using data to enable people to discover new items of interest, do things they have never been able to do before and overcome unsolvable problems. He has several open and classified patents in real-time analytics and has briefed regulators on the use of real-time data to produce beneficial analytic outcomes. For the last decade, he has been leading technology and software at various startups in the Boston, London and US Southeast ecosystems. Before that he led Architecture & Core Systems at AOL and Enterprise Technology Programs at Amgen. He is a graduate of MIT and Harvard University.
Anderson Osagie
Sr. Big Data Engineer/Architect and MetiStream Spark Trainer
Anderson has a successful track record of implementing Big Data projects and has deployed many Spark solutions into production. He is a streaming expert who understands the end-to-end data pipeline and the nuances of batch and streaming integration. He has written and administered Spark jobs processing over 10 TB/day of streaming security data. His expertise includes Kafka, Spark and other Hadoop and NoSQL technologies. As a MetiStream Spark instructor, Anderson has trained hundreds of students on advanced Spark topics. He is a Databricks Certified Spark Developer and holds a Master's in Computer and Information Technology from the University of Pennsylvania, a B.S. in Information Systems from the University of Maryland and a B.S. in Aerospace Engineering from Pennsylvania State University. Anderson started coding at a young age and is fluent in many languages, including Scala, Java and Python.
