
Building a data pipeline that benefits the entire company

Hosted By
DeAnna T.

Details

Presenter: Dan Vatterott, Data Scientist - Showtime

Many analysts lack the skills required to access customer-level data, leaving them unable to take advantage of this increasingly common resource. Dan will describe a data-pipeline framework (using Spark) that enables users throughout an entire company to access rich, customer-level data. Data Scientists, Data Analysts, Data Engineers, Product Managers, and anyone else interested in data distribution will benefit. Dan will follow his presentation with a quick Spark tutorial, introducing the technology that makes this pipeline possible.
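As a rough illustration of the kind of pipeline described above (the table names, columns, and aggregation below are assumptions for illustration only, not details from the talk), here is a minimal PySpark sketch that rolls raw customer-level events up into a small summary table that non-engineers could query directly:

from pyspark.sql import SparkSession, functions as F

# Illustrative sketch only: paths, table names, and columns are hypothetical.
spark = SparkSession.builder.appName("customer_summary_pipeline").getOrCreate()

# Raw, customer-level event data (one row per event).
events = spark.read.parquet("s3://example-bucket/raw/customer_events/")

# Roll events up to one row per customer, so analysts can query a compact,
# familiar table instead of the raw event stream.
customer_summary = (
    events
    .groupBy("customer_id")
    .agg(
        F.count("*").alias("n_events"),
        F.countDistinct("session_id").alias("n_sessions"),
        F.max("event_time").alias("last_seen"),
    )
)

# Publish the summary where downstream tools (BI, SQL clients) can reach it.
customer_summary.write.mode("overwrite").saveAsTable("analytics.customer_summary")

spark.stop()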

Bring your laptop if you want to follow along with the tutorial. To do so, install Docker CE (if you have an older machine, you may want to use Docker Toolbox instead - https://docs.docker.com/toolbox/toolbox_install_windows/). After installing Docker, open your terminal (or the Docker Quickstart Terminal) and run "docker run --rm -p 8888:8888 -p 4040:4040 dvatterott/pyspark_iris:intial_commit". This will download and run the Docker image for this tutorial. The entire process takes about 35 minutes.
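If the container starts a Jupyter notebook server (which the -p 8888:8888 port mapping suggests, though treat that as an assumption), you should be able to open http://localhost:8888 in a browser and run PySpark code along these lines; the CSV path and column names below are guesses based on the image name (pyspark_iris), not confirmed tutorial content:

from pyspark.sql import SparkSession

# Hypothetical preview of the tutorial notebook; the file path and schema
# are assumptions, not taken from the actual tutorial materials.
spark = SparkSession.builder.appName("iris_tutorial").getOrCreate()

# Load the classic iris dataset; inferSchema parses the numeric columns.
iris = spark.read.csv("iris.csv", header=True, inferSchema=True)

iris.printSchema()
iris.groupBy("species").count().show()

# The Spark UI for this session is exposed on port 4040 (hence -p 4040:4040).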

6:00 - 6:30 Networking | Snacks
6:30 - 7:15 Presentation
7:15 - 7:45 Spark tutorial | Q&A

St. Louis Machine Learning & Data Science
@4240
4240 Duncan Ave. · Saint Louis, MO