
Building a data pipeline that benefits the entire company

Hosted By
DeAnna T.

Details

Presenter: Dan Vatterott, Data Scientist - Showtime

Many analysts lack the skills required to access customer-level data, leaving them unable to take advantage of this increasingly common resource. Dan will describe a data-pipeline framework (using Spark) that enables users throughout an entire company to access rich, customer-level data. Data Scientists, Data Analysts, Data Engineers, Product Managers, and anyone else interested in data distribution will benefit. Dan will follow his presentation with a quick Spark tutorial, introducing the technology that makes this pipeline possible.
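As a rough illustration of the kind of pipeline described above (the table names, columns, and aggregation below are assumptions for illustration only, not details from the talk), here is a minimal PySpark sketch that rolls raw customer-level events up into a small summary table that non-engineers could query directly:

from pyspark.sql import SparkSession, functions as F

# Illustrative sketch only: paths, table names, and columns are hypothetical.
spark = SparkSession.builder.appName("customer_summary_pipeline").getOrCreate()

# Raw, customer-level event data (one row per event).
events = spark.read.parquet("s3://example-bucket/raw/customer_events/")

# Roll events up to one row per customer, so analysts can query a compact,
# familiar table instead of the raw event stream.
customer_summary = (
    events
    .groupBy("customer_id")
    .agg(
        F.count("*").alias("n_events"),
        F.countDistinct("session_id").alias("n_sessions"),
        F.max("event_time").alias("last_seen"),
    )
)

# Publish the summary where downstream tools (BI, SQL clients) can reach it.
customer_summary.write.mode("overwrite").saveAsTable("analytics.customer_summary")

spark.stop()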

Bring your laptop if you want to follow along with the tutorial. To do so, install Docker CE (if you have an older machine, you may want to use Docker Toolbox instead - https://docs.docker.com/toolbox/toolbox_install_windows/). After installing Docker, open your terminal (or the Docker Quickstart Terminal) and run "docker run --rm -p 8888:8888 -p 4040:4040 dvatterott/pyspark_iris:intial_commit". This will download and run the Docker image for this tutorial. The entire process takes about 35 minutes.
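If the container starts a Jupyter notebook server (which the -p 8888:8888 port mapping suggests, though treat that as an assumption), you should be able to open http://localhost:8888 in a browser and run PySpark code along these lines; the CSV path and column names below are guesses based on the image name (pyspark_iris), not confirmed tutorial content:

from pyspark.sql import SparkSession

# Hypothetical preview of the tutorial notebook; the file path and schema
# are assumptions, not taken from the actual tutorial materials.
spark = SparkSession.builder.appName("iris_tutorial").getOrCreate()

# Load the classic iris dataset; inferSchema parses the numeric columns.
iris = spark.read.csv("iris.csv", header=True, inferSchema=True)

iris.printSchema()
iris.groupBy("species").count().show()

# The Spark UI for this session is exposed on port 4040 (hence -p 4040:4040).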

6:00 - 6:30 Networking | Snacks
6:30 - 7:15 Presentation
7:15 - 7:45 Spark tutorial | Q&A

St. Louis Machine Learning & Data Science
@4240
4240 Duncan Ave. · Saint Louis, MO