Analyzing Time Series Data with an ARIMA model

Future of Data: New Jersey - Princeton, Edison, Holmdel

Online event



6-6:15 pm - Introduction to CSA/Apache Flink and FLaNK Demo by Principal DataFlow Field Engineer Tim.

Building Edge-to-AI Applications for Hybrid Cloud with CDP

The Main Event - Data Science and Machine Learning on Time Series IoT Data - Analyzing Time Series Data with an ARIMA model

6:30 pm - Data Scientist Victor Dibia will talk about how to build a Time Series model for sensor data. Read one of his awesome articles. Then build, test, experiment with and deploy your model and a visual application with the Cloudera Machine Learning platform in AWS.

I will feed him sensor data with MiNiFi Agents from sensor devices, Edge Flow Manager, NiFi, Kafka and Flink.

Our Tri-State Meetup Data Team: Amol Thacker, Paul Vidal and John Kuchmek will be hosting and providing color commentary.

If you collect your data, then you will find it... Time After Time (a spin on the Cyndi Lauper song)
Collect data at the edge and analyze with an ARIMA model. (matter-of-fact title)

The Internet of Things (IoT) is growing in popularity, but it isn’t new. Connected devices have long existed in manufacturing and utilities through Supervisory Control and Data Acquisition (SCADA) systems. Time series data has been studied for some time in these industries, as well as in the stock market. Time series analysis can bring valuable insight to businesses and to individuals with smart homes. Many components are needed to collect data at the edge, store it in a central location for initial analysis, and then build, train and eventually deploy a model. Time series forecasting is one of the more challenging problems to solve in data science. Important factors in time series analysis and forecasting are seasonality, stationarity of the data and autocorrelation of the target variable.

We show you a platform, built on open source technology, that has this potential. Sensor data will be collected at the edge, off a Raspberry Pi, using Cloudera’s Edge Flow Manager (powered by MiNiFi). The data will then be pushed to a cluster containing Cloudera Flow Manager (powered by NiFi) so it can be manipulated, routed, and then stored in Kudu on Cloudera’s Data Platform. Initial inspection can be done in Hue using Impala. The time series data will then be analyzed, and potentially forecast, using an ARIMA model in CML (Cloudera Machine Learning).

Time series analysis and forecasting can be applied to, but is not limited to, stock market analysis, forecasting electricity loads, inventory studies, weather conditions, census analysis and sales forecasting.
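
To make the ideas of stationarity, autocorrelation and ARIMA forecasting a little more concrete, here is a minimal sketch in plain Python. It is not the model presented at the event. It assumes the simplest useful case, ARIMA(1,1,0): difference the readings once, fit a single autoregressive coefficient by least squares, and add the forecast differences back onto the last reading. The function names and the synthetic "sensor" readings are made up for illustration. In practice a library implementation (for example the ARIMA class in statsmodels) would likely be used, since it also handles moving-average terms and seasonality.

```python
import random


def difference(series):
    """First difference y[t] - y[t-1]; this is the 'I' (d=1) step of ARIMA."""
    return [b - a for a, b in zip(series, series[1:])]


def autocorrelation(series, lag):
    """Sample autocorrelation of a series at the given lag."""
    n = len(series)
    mean = sum(series) / n
    var = sum((x - mean) * (x - mean) for x in series)
    cov = sum((series[t] - mean) * (series[t - lag] - mean)
              for t in range(lag, n))
    return cov / var


def fit_ar1(series):
    """Least-squares fit of x[t] = c + phi * x[t-1]; returns (c, phi)."""
    prev, nxt = series[:-1], series[1:]
    n = len(prev)
    mean_prev = sum(prev) / n
    mean_next = sum(nxt) / n
    sxx = sum((p - mean_prev) ** 2 for p in prev)
    if sxx == 0:
        # Constant differences: no autoregressive signal, only a drift.
        return mean_next, 0.0
    sxy = sum((p - mean_prev) * (q - mean_next) for p, q in zip(prev, nxt))
    phi = sxy / sxx
    return mean_next - phi * mean_prev, phi


def forecast_arima_110(series, steps):
    """Forecast `steps` values ahead with an ARIMA(1,1,0)-style model."""
    diffs = difference(series)
    c, phi = fit_ar1(diffs)
    level, last_diff = series[-1], diffs[-1]
    forecasts = []
    for _ in range(steps):
        last_diff = c + phi * last_diff   # predict the next change
        level += last_diff                # integrate back to the original scale
        forecasts.append(level)
    return forecasts


# Hypothetical sensor readings: a drifting value whose changes are correlated.
random.seed(42)
readings, change = [20.0], 0.0
for _ in range(199):
    change = 0.05 + 0.6 * change + random.gauss(0, 0.1)
    readings.append(readings[-1] + change)

changes = difference(readings)
print("lag-1 autocorrelation, raw readings:", autocorrelation(readings, 1))
print("lag-1 autocorrelation, differenced: ", autocorrelation(changes, 1))
print("fitted (c, phi):", fit_ar1(changes))
print("next 5 forecasts:", forecast_arima_110(readings, 5))
```

The raw readings trend upward, so they are not stationary. Differencing removes the trend, and the autocorrelation that remains in the differenced series is what the autoregressive term tries to capture.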