Details

We're back at Salisbury House for our summer London Airflow Meetup! Join fellow members of the data engineering community for an evening of engaging talks, great food and drinks, and exclusive swag!

PRESENTATIONS

Talk #1: The Orchestration Layer: A Blueprint for Scaling Dynamic DAGs with Integrated Data Quality Gates

Ensuring high-quality data (accurate, complete, and schema-valid) is essential for building data trust. However, scaling data quality checks across diverse datasets often leads to duplicated boilerplate Python code, human error, and a massive bottleneck for Data Engineering teams.

In this talk, we’ll explore how to shift data quality from a backlog of tech debt into an automated, self-serve developer experience directly tied to Airflow. We will demonstrate how to build a multi-faceted data observability framework that empowers end-user teams to define their own DQ rules without writing any Python code. Because the framework hides the complex Airflow code, users simply set their parameters in YAML configurations, which translate into fully dynamic, end-to-end DAGs, including in-flight data quality gates.

The core of this session focuses on the technical implementation of integrated DQ checks that live directly inside the generated pipelines. You’ll learn how this metadata-driven approach enables a "shift-left" strategy for data observability, automatically enforcing data contracts and routing alerts without manual engineering intervention.

Talk #2: Self-healing Data Pipelines in Airflow

Modern data pipelines remain largely reactive—failures trigger alerts, manual retries, and ongoing operational overhead for engineering teams. As systems scale, this approach introduces fragility, delays, and growing complexity in maintaining reliable workflows.

This session explores how to design self-healing data pipelines in Airflow using practical patterns such as intelligent retries, conditional branching, and targeted recovery mechanisms. Real-world failure scenarios are used to illustrate how pipelines can detect issues, trigger remediation steps, and resume execution without full restarts or human intervention.

The session also looks ahead to how AI-assisted anomaly detection can further enhance these systems by identifying unexpected patterns and enabling more proactive, resilient data workflows.

Talk #3: What We Got Right (and Wrong) Building a 50-Source Data Platform on Airflow

Collibra helps enterprises govern their data, but how does Collibra's own data team manage the data that powers the business? The answer is an Airflow-based platform that stitches together dlt, dbt, Kubernetes and Collibra's own product for governance -- all orchestrated through 87 DAGs serving 12 business domains.

This talk traces a data point's journey through the stack: from API extraction via dlt pipelines running inside KubernetesPodOperator pods, through a layered dbt architecture where platform engineers standardize data and analysts build business models on top, into analytics outputs that serve multiple business teams, and finally back into Collibra Data Governance via reverse-ETL integration DAGs.

Along the way, I'll share the patterns that emerged from operating this at scale: how we evolved from one-off custom pipelines to a reproducible framework that's documented well enough for AI subagents (or a new hire) to generate models from scratch, how our own Pod Operators let us test feature branches on shared environments, and how we're leveraging that same mechanism to migrate from Redshift to BigQuery (and from Airflow 2 to Airflow 3) without downtime. Behind the scenes, a 1Password-backed pipeline diffs credential hashes nightly, catching rotations before they break a DAG.

AGENDA

  • 5:30-6:00 PM: Arrivals, networking, food & drinks
  • 6:00-7:45 PM: Presentations
  • 7:45-8:00 PM: Networking

Sponsors

Astronomer Inc

Supercharge Airflow with our modern data orchestration platform
