
Data Engineering with Airflow, R and Postgres at Education Analytics

Hosted by Pitt Fagan

Details

Abstract:
Education Analytics (EA) partners with the CORE Districts, a consortium of eight California school districts that together serve more than 1 million students in around 1,500 schools, to provide actionable metrics to district partners and stakeholders. To deliver timely data, our team at EA has built a data pipeline that uses the Python package Apache Airflow, the statistical programming language R, and PostgreSQL databases. We use Airflow to schedule runs of the system and to determine which new data to process; we use R to process data and calculate metrics; and we use PostgreSQL to store data in a custom longitudinal research data warehouse. This data feeds a custom, user-centered dashboard as well as other analytics and reports oriented around continuous improvement for the CORE Districts. The pipeline has become an integral part of the work that the CORE Districts do in their improvement communities.

Some of the challenges we faced in building this system include (1) passing information between Python and R for logging, conditional execution, and error handling; (2) automating complex statistical methods, such as causal estimates of school effects on student outcomes and long-term predictive models; and (3) designing robust quality control processes for automated systems. In this discussion, we share lessons learned from the solutions we have arrived at and preview some challenges we are still working to solve.
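As an illustration of challenge (1), here is a minimal sketch of one way a Python task could hand work to an external script and turn what comes back into log records and a run status. It is not EA's actual implementation. The exit-code convention (`SKIP_EXIT_CODE`), the one-JSON-object-per-line logging format, and the names `run_step` and `StepResult` are all assumptions made for this example.

```python
import json
import subprocess
from dataclasses import dataclass, field

# Hypothetical convention: the script exits with this code to say
# "nothing new to process", which is different from a failure.
SKIP_EXIT_CODE = 99


@dataclass
class StepResult:
    status: str                      # "success", "skipped", or "failed"
    messages: list = field(default_factory=list)


def run_step(command):
    """Run an external command and translate its output for the scheduler.

    Each stdout line is assumed to be either a JSON object such as
    {"level": "INFO", "msg": "..."} or plain text, which is wrapped
    in an INFO record.
    """
    proc = subprocess.run(command, capture_output=True, text=True)

    messages = []
    for line in proc.stdout.splitlines():
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            record = None
        if not isinstance(record, dict):
            record = {"level": "INFO", "msg": line}
        messages.append(record)

    if proc.returncode == 0:
        status = "success"
    elif proc.returncode == SKIP_EXIT_CODE:
        status = "skipped"
    else:
        status = "failed"
        # Surface whatever the script wrote to stderr as an error record.
        messages.append({"level": "ERROR", "msg": proc.stderr.strip()})

    return StepResult(status=status, messages=messages)
```

In a pipeline like the one described, `command` might look like `["Rscript", "calc_metrics.R"]` (a hypothetical script name), and an Airflow task could map a `"skipped"` status to Airflow's skip mechanism and a `"failed"` status to a raised exception so that downstream tasks do not run.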

Bio:
Jordan Mader is the Director of Analytics Engineering at Education Analytics. Jordan currently manages a team that specializes in building software for complex statistical analyses and automating data processing systems for analytics to help school districts and states use timely data to make better decisions. Jordan holds a B.A. in Economics and History from the University of Wisconsin-Madison.

Sponsors:
I would like to thank American Family for the food and Cloudera for an after-meetup round of drinks.

Big Data Madison