An Introduction to Airflow - Managing data pipelines programmatically

Singapore Data Engineering Meetup

Details

Presented by XiaoDong - data engineer at DBS

Talk Abstract:
Apache Airflow was started at Airbnb in 2014 and became an Apache Incubator project in 2016.

Built mainly in Python, Airflow is a platform for programmatically authoring, scheduling, and monitoring workflows such as data pipelines. Workflows are defined as directed acyclic graphs (DAGs) and configured as Python scripts. Airflow supports complex task dependency management, distributed task execution, and integration with a range of technology stacks and tools (RDBMS, AWS, Google Cloud, HDFS, Spark, etc.). A friendly UI is also provided for monitoring and configuration.
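
To make the DAG idea concrete, here is a minimal sketch of dependency-ordered scheduling using only the Python standard library (`graphlib`, Python 3.9+). It is not Airflow's API; the task names and the helpers `execution_order` and `is_acyclic` are hypothetical and chosen only for illustration.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
# "clean" and "enrich" both need "extract"; "load" needs both of them.
dependencies = {
    "extract": set(),
    "clean": {"extract"},
    "enrich": {"extract"},
    "load": {"clean", "enrich"},
}


def execution_order(deps):
    """Return one valid run order in which every task follows its dependencies."""
    return list(TopologicalSorter(deps).static_order())


def is_acyclic(deps):
    """Return True if the dependency graph is a valid DAG (contains no cycle)."""
    try:
        execution_order(deps)
        return True
    except CycleError:
        return False


if __name__ == "__main__":
    # "extract" always comes first and "load" last; the relative order of
    # "clean" and "enrich" is not fixed, since neither depends on the other.
    print(execution_order(dependencies))
```

A scheduler such as Airflow applies the same principle at a larger scale: a task becomes eligible to run only once all of its upstream tasks have finished, and independent tasks (here `clean` and `enrich`) can be dispatched in parallel.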

Airflow has been adopted by many companies, including Airbnb, Quora, and ING. Cloud service providers such as Google and Amazon are also contributing components that integrate Airflow with their cloud services. Alibaba has built a workflow scheduling system named "Maat" on top of Airflow.

In this talk, we will also share how DBS Bank uses Airflow, including in-house customizations, and how we are working with the community to improve it.