Building a data pipeline that benefits the entire company


Details
Presenter: Dan Vatterott, Data Scientist - Showtime
Many analysts lack the skills required to access customer-level data, leaving them unable to take advantage of this increasingly common resource. Dan will describe a data-pipeline framework (built on Spark) that enables everyone throughout a company to access rich, customer-level data. Data Scientists, Data Analysts, Data Engineers, Product Managers, and anyone else interested in data distribution will benefit. Dan will follow his presentation with a quick Spark tutorial, introducing the technology that makes this pipeline possible.
Bring your laptop if you'd like to follow along with the tutorial. Before the meetup, install Docker CE (if you have an older machine, you may need Docker Toolbox - https://docs.docker.com/toolbox/toolbox_install_windows/). After installing Docker, open your terminal (or the Docker Quickstart Terminal) and run "docker run --rm -p 8888:8888 -p 4040:4040 dvatterott/pyspark_iris:intial_commit". This will download and run the Docker image for the tutorial; the entire process takes about 35 minutes.
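For a taste of what the tutorial covers, here is a minimal PySpark sketch of the DataFrame API the Docker image sets up. The file path and column names below are illustrative; the actual notebook and data ship inside the image.

    from pyspark.sql import SparkSession

    # Start a local Spark session (the Docker image does this for you in Jupyter).
    spark = SparkSession.builder.appName("iris_demo").getOrCreate()

    # Load a CSV into a Spark DataFrame (path and schema are hypothetical).
    iris = spark.read.csv("iris.csv", header=True, inferSchema=True)

    # Aggregate with the DataFrame API and print the result.
    iris.groupBy("species").avg("sepal_length").show()

    spark.stop()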
6:00 - 6:30 Networking | Snacks
6:30 - 7:15 Presentation
7:15 - 7:45 Spark tutorial | Q&A
