PyData Montreal #18: Data Engineering and Data Management


Details
Agenda:
(All times in EST)
6:00 pm — Introductions
6:10 pm — "The Python ETL: Airflow Vs. Luigi" by Ben Rogojan
6:55 pm — Q&A
7:10 pm — "Full Stack Deep Learning: Data Management" by Sergey Karayev
7:55 pm — Q&A
8:00 pm — Wrap-up
-----------------------------------------------------------------------------
The Python ETL: Airflow Vs. Luigi
Abstract:
Interview Query recently reported that the number of data engineering roles at companies continues to grow annually while, surprisingly, the number of data scientist roles shrank in comparison.
Data engineers spend much of their time building data pipelines and managing data warehouses.
There are lots of tools for data engineers to choose from when it comes to developing data pipelines.
In this talk, we will discuss two Python libraries used for building data pipelines: Airflow and Luigi. Over the past few years, they have become some of the more popular options among data engineers. With the recent updates to Airflow, we would also like to outline the differences between the two frameworks, as well as some of the changes between Airflow and Airflow 2.0. We will provide both a high-level view of the differences and a few examples of how you might code the same process in each library.
By the end, listeners should have a good understanding of the different terms each library uses, and may even leave with a personal favorite.
About Ben:
Ben has spent his career focused on all forms of data. He has developed algorithms to detect fraud, reduce patient readmission, and redesign insurance provider policy to help reduce the overall cost of healthcare. Ben has also worked with companies ranging from transportation, finance, and SaaS to start-ups.
He has helped develop end-to-end solutions that take clients' data from raw to machine learning models and dashboards.
Ben privately consults on data science and engineering problems, both solo and with a company called Acheron Analytics. He has experience both working hands-on with technical problems and helping leadership teams develop strategies to maximize the value of their data.
Full Stack Deep Learning: Data Management
Abstract:
Sergey shares an overview of managing data for machine learning, breaking the subject up into Sources, Storage, Processing, Exploration, Labeling, and Versioning, and sharing best-in-class tools for each task.
About Sergey:
Sergey Karayev heads AI for STEM at Turnitin. He co-founded Gradescope, an AI-assisted platform used to grade over a million students' exams and homework. He is also an instructor of Full Stack Deep Learning, a course taught online and at universities such as UW and UC Berkeley.
