NY Water Complaint Analysis using Apache Spark

Hosted By
Nancy B. and IBM Big D.

Details

In New York City, water complaints are among the most common 311 complaints during the summer.

Using water complaints in New York City as an example, this meetup will demonstrate how Spark, Zeppelin, R, Spark SQL, Spark MLlib, and IBM SystemML can be applied to publicly available data sources, such as 311 complaint data and PLUTO housing data, to build a seamless data science pipeline and models.

We'll show how individual data sources can be explored, curated and prepared, and merged, and how machine learning models can then be developed on the resulting data sets. We will use a Zeppelin notebook for step-by-step, interactive data exploration, data preparation, and data modeling, with Spark as the back-end cluster computing framework.
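
As a rough illustration of the prepare-and-merge step described above, here is a minimal sketch in plain Python (not Spark, so it runs without a cluster). The rows are toy data invented for this example, and the field names (`lot_id`, `complaint_type`, `year_built`, `units`) and the "Water System" label are assumptions, not the actual 311 or PLUTO schemas. In a Zeppelin notebook the same filter, aggregate, and join would more likely be written in Spark SQL.

```python
from collections import Counter

# Toy rows invented for illustration; these are NOT real 311 or PLUTO records,
# and the field names are hypothetical.
complaints = [
    {"complaint_id": 1, "lot_id": "A", "complaint_type": "Water System"},
    {"complaint_id": 2, "lot_id": "A", "complaint_type": "Water System"},
    {"complaint_id": 3, "lot_id": "B", "complaint_type": "Noise"},
    {"complaint_id": 4, "lot_id": "C", "complaint_type": "Water System"},
]
lots = [
    {"lot_id": "A", "year_built": 1931, "units": 40},
    {"lot_id": "B", "year_built": 1988, "units": 12},
    {"lot_id": "D", "year_built": 2005, "units": 80},
]


def water_complaints_per_lot(complaints, lots):
    """Filter to water complaints, count them per lot, and left-join onto lots."""
    # Prepare: keep only water complaints and count them by lot.
    counts = Counter(
        c["lot_id"] for c in complaints if c["complaint_type"] == "Water System"
    )
    # Merge: every lot is kept; lots with no water complaints get a count of 0.
    return [dict(lot, water_complaints=counts.get(lot["lot_id"], 0)) for lot in lots]


merged = water_complaints_per_lot(complaints, lots)
for row in merged:
    print(row)
```

Each merged row pairs building attributes with a complaint count, which is the kind of table a model could then be trained on. Note that the complaint for lot "C" is dropped because no matching lot record exists; how to handle such unmatched rows is a data preparation decision.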

We'll showcase how data sets can be created and shared across Spark SQL, Spark MLlib, SparkR, R, and IBM SystemML through Zeppelin. We will also show how R visualization libraries can be used from Zeppelin to visualize the source and predicted data interactively.

Light refreshments will be served.

Data, Cloud and AI in NYC