This is the 1st Apache Airflow meetup in Seattle! Everyone interested in data engineering, ETL, workflow scheduling and orchestration who wants to learn about one of the newer and more exciting Apache projects is welcome! We will host you at the Google (https://cloud.google.com) office in Seattle!
18:00 - Registrations, speed networking, pizza and drinks.
18:30 - Kick-off
18:40 - Intro to Apache Airflow by Jakob Homan (Apache Airflow Committer and PMC Member)
19:10 - Large-scale High-performance Airflow Deployment in Cloud Composer by Zhou Fang (Google)
19:40 - The K8s Executor: One Year In by Daniel Imberman
20:10 - Networking
1st talk - Introduction to Apache Airflow
Jakob will walk us through the Airflow architecture, what Airflow is and what it isn't.
Jakob is an Apache Airflow committer and PMC member, a committer/PMC member on Hadoop, Samza and Kafka, and a contributor to many more projects. He's currently working on the Data Platform at Lyft. He may or may not be third in line for the throne of Liechtenstein.
2nd talk - Large-scale High-performance Airflow Deployment in Cloud Composer
Its rich functionality and high flexibility have made Airflow one of the top choices for managing data analytics workloads. Scalability, however, was not a high priority in its design. Recently, we have seen rapidly increasing demand from users to manage a large number of Airflow workflows in Cloud Composer, which makes improving Airflow's scalability an important part of addressing customer needs. This talk presents our efforts in serving a large-scale Airflow cluster:
- Using DAG serialization in the Airflow webserver
- Best practices for scheduler/worker performance optimization
Zhou Fang is a software engineer on the Cloud Composer team at Google. Before joining Google, he received his PhD in CS from the University of California, San Diego in 2018, and his MS from ETH Zurich in 2014. He has experience in a variety of cloud and data areas, including research into cloud and mobile computing for computer vision applications, as well as internships with the Google Dataflow and Kubernetes teams during his PhD studies.
3rd talk - The K8s Executor: One Year In
Over the past two years, it has been incredible seeing the Kubernetes Executor move from a small over-the-weekend demo to a full execution environment run in production at numerous companies. With the k8s executor soon moving off of "experimental", this is a good time to discuss some lessons learned, best practices, and what's next for Airflow integration in the Kubernetes environment.
We will discuss current options for launching an Airflow k8s environment from scratch, how to optimize your cluster to meet your SLAs, and common pitfalls that new users run into.
Daniel Imberman is an Apache Airflow committer, an engineer at Astronomer.io, and a digital nomad testing the limits of wifi capabilities in the third world. Daniel led the Kubernetes Executor initiative and now works to simplify Airflow usage, development, and operations. Daniel is currently based in Medellin, Colombia, but will gladly consider any country with decent food and a >10MB/s connection.
Thanks to Google (https://cloud.google.com) for providing the space and sponsoring the meetup.