Data Engineering for Data Scientists


Details
Please join PyData Pittsburgh for the presentation Data Engineering for Data Scientists by Pete Fein!
In this fast-paced talk, you’ll learn how adopting data engineering best practices and tools can improve your data science projects and help you deliver better, more reliable results in record time. We’ll discuss data architecture and design principles and explore open-source tools you can use today, including:
- Running Jupyter notebooks in production with Papermill, and turning notebook code into tested, documented packages with nbdev
- Improving data quality with Great Expectations and monitoring models with Evidently.ai
- Writing tests for your pandas and Spark DataFrames with pandera
- Writing reusable SQL with dbt, an exciting new tool for data transformation that’s transforming data teams
- Orchestrating workflows with Apache Airflow, a more robust approach than fragile, frustrating cron jobs or AWS Lambda functions
- Version-controlling your data alongside your code with DVC
Special thanks to Code & Supply for hosting us!
Attendees are welcome to use the building's parking lot off St Clair Street. The front door on Friendship Avenue will also be open, but that entrance has stairs only; there's an elevator at the parking-lot entrance. Head to the third floor and follow the signs to the presentation room. All doors should be unlocked and open, so come right in!
