
Spark working with an IDE: Notebook/Shiny, and Resource Managers: Which is Best

Hosted By
Denny L. and Felix C.
Details

(Please RSVP on meetup.com and also pre-register with Galvanize here: https://www.eventbrite.com/e/working-with-an-ide-pythonjupyter-notebooks-rstudio-spark-and-r-shiny-app-221-tickets-30469144030?aff=spark)

We are having two sessions with folks from IBM, which is also sponsoring food and drinks for this meetup!

Working with an IDE: Python/Jupyter Notebooks, RStudio, Spark and R Shiny App
Speaker: Thomas Liakos

We will demonstrate a Python-driven PySpark notebook and an R Shiny app, both created using DSX (Data Science Experience). The demo is based on findings from this whitepaper (http://www.informs-sim.org/wsc11papers/082.pdf), which covers modeling and simulation of building energy performance for portfolios of public buildings.

The Problem: Energy inefficiency within public/private buildings in the City of New York.

The Goal: Take meter (sensor) data and address the inefficiencies through better insights.

The Solution: Visualization and reporting through the Shiny app to understand past and present usage patterns, and to compare those patterns to derive insights and predictions about energy usage.

Spark's DataFrames and RDDs will be used in concert with the pandas library to clean, model, and prepare data for the R Shiny app. The aim of this meetup discussion is to show the capabilities of Spark when used with DSX and RStudio/Shiny to create visualizations and reports that give insights to the end user.

We will present several modeling and ML techniques in this notebook: linear regression, K-means clustering for identifying inefficient buildings, and (statistical) classification modeling, followed by a confusion matrix (error matrix).
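
For readers unfamiliar with the last item, here is a minimal sketch of how a confusion matrix is built from a classifier's output. It is plain Python rather than the notebook's actual Spark code, and the labels and sample predictions are made up for illustration; they are not data from the demo.

```python
from collections import Counter


def confusion_matrix(actual, predicted, labels):
    """Build a matrix where rows are actual labels and columns are predicted labels."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in labels] for a in labels]


# Hypothetical labels and predictions (assumed for illustration only).
labels = ["efficient", "inefficient"]
actual = ["efficient", "inefficient", "inefficient", "efficient", "inefficient"]
predicted = ["efficient", "inefficient", "efficient", "efficient", "inefficient"]

matrix = confusion_matrix(actual, predicted, labels)

# Correct predictions sit on the diagonal, so accuracy is the diagonal sum
# divided by the number of samples.
accuracy = sum(matrix[i][i] for i in range(len(labels))) / len(actual)

for label, row in zip(labels, matrix):
    print(label, row)
print("accuracy:", accuracy)
```

Off-diagonal cells show which kind of mistake the classifier makes, for example an inefficient building predicted as efficient.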

Thomas Liakos has been an Open Source Systems Engineer for 11 years and has 8 years of experience in cloud and hybrid environments. Prior to IBM, Thomas was a Sr. Systems Architect at Gem.co and a DevOps / Systems Engineer (Cloud Operations) at CrowdStrike. Thomas has expertise in Spark, Python, Systems and Configuration Management, Architecture, Data Warehousing, and Data Engineering.

Apache Spark Resource Managers - Which One is Best?
Speaker: Whit Smith

Standalone, YARN, and Mesos are the currently available resource managers for Spark, but what is a resource manager, and how do these three options differ?
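
In practice, the resource manager is chosen through the `--master` value passed to `spark-submit`. The sketch below shows the usual master URL form for each of the three options. It is not material from the session; the helper names, host names, and `app.py` are hypothetical, and the ports are the commonly used defaults.

```python
def master_url(manager, host=None, port=None):
    """Return the Spark master URL for a given resource manager."""
    if manager == "standalone":
        # Standalone masters listen on port 7077 by default.
        return f"spark://{host}:{port or 7077}"
    if manager == "mesos":
        # Mesos masters listen on port 5050 by default.
        return f"mesos://{host}:{port or 5050}"
    if manager == "yarn":
        # YARN takes no host: the cluster is located through the Hadoop
        # configuration (HADOOP_CONF_DIR or YARN_CONF_DIR).
        return "yarn"
    raise ValueError(f"unknown resource manager: {manager}")


def submit_command(manager, app, host=None, port=None):
    """Assemble a spark-submit argument list (built only, not executed)."""
    return ["spark-submit", "--master", master_url(manager, host, port), app]


# Hypothetical hosts and application file, for illustration only.
print(" ".join(submit_command("standalone", "app.py", host="spark-master.example.com")))
print(" ".join(submit_command("yarn", "app.py")))
print(" ".join(submit_command("mesos", "app.py", host="mesos-master.example.com")))
```

The application code is the same in all three cases; only the master URL and the cluster behind it change.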

What might factor into your decision to use one resource manager vs. another? Come to this session for an overview of resource management in a Spark context, and considerations for choosing one resource manager over another for your use case.

Whit Smith is an IBM Certified IT Specialist.

Agenda
18:30-18:45 Networking, Food & Drinks
18:45-18:50 Logistics
18:50-19:35 Working with an IDE: Python/Jupyter Notebooks, RStudio, Spark and R Shiny App
19:35-20:00 Resource Managers: Which One is Best?
20:00-20:15 Wrapping up

Thank you IBM and Galvanize for hosting this event!

Seattle Spark+AI Meetup
Galvanize.it
111 So. Jackson St. · Seattle, WA