NY Water Complaint Analysis using Apache Spark


Details
In New York City, water complaints are among the top 311 complaints during the summer.
Using water complaints in New York City as an example, this meetup will demonstrate how Spark, Zeppelin, R, Spark SQL, Spark MLlib, and IBM SystemML can be applied to publicly available data sources, such as 311 complaint data and housing data from PLUTO, to build a seamless data science pipeline and models.
We'll show how individual data sources can be explored, curated and prepared, and merged, and how machine learning models can then be developed on the resulting data sets. We will use a Zeppelin notebook as the tool for step-by-step interactive data exploration, data preparation, and data modeling, with Spark as the back-end cluster computing framework.
We'll also showcase how data sets can be created and shared across Spark SQL, Spark MLlib, SparkR, R, and IBM SystemML through Zeppelin, and how visualization libraries in R can be used from Zeppelin to visualize the source and predicted data interactively.
Light refreshments will be served.
