1st Apache Airflow meetup at Heineken

This is a past event

70 people went

Details

Hi all,

I'm excited to announce the first Airflow meetup. This time we'll have two presentations.

Niels Zeilemaker from GoDataDriven will present "Using Azure Container Instances as a cost efficient method to run Heineken datascience workloads". Daniel van der Ende and John Muller from ING Wholesale Banking Advanced Analytics will talk about data tests using Apache Airflow.

• 18:00 Arrive, mingle, pizza, drinks etc.

• 18:45 Using Azure Container Instances as a cost efficient method to run Heineken datascience workloads by Niels Zeilemaker

At Heineken we use Airflow to manage our data science workflows. By combining Airflow with Docker, we give data scientists the opportunity to modify dataflows while keeping a centralized place where monitoring and credentials are organised. After Microsoft announced Azure Container Instances, we implemented a custom operator and reduced the costs of our solution by exploiting the pay-per-second billing it provides.

In this talk we'll give a high-level overview of our solution, followed by a deep dive into the details of the ACI integration.

• 19:45 Data's Inferno: Nine Circles of Data Tests with Apache Airflow by Daniel van der Ende and John Muller

Continuous delivery is a given nowadays, and it goes hand in hand with a lot of automated testing. For 'normal' applications, such testing is well known and documented in the form of unit tests, integration tests, regression tests, etc. For big data applications, however, another dimension of complexity is added: that of the data itself. The truth is that real data sucks; it always surprises you by how much it differs from what you expect. Unreliable data, in turn, can result in unreliable applications, which makes for unhappy users. In this talk, we'll take you on a journey through our Nine Circles of Data Tests, which ensure the data is correct and makes sense. We use Airflow to do this, testing our data and logic at several steps, so that we avoid having to debug such issues over the weekend.
Topics include:
- CI tests for your data deployments
- Integrating data tests into your DAG
- DTAP-ing your data deployments
- Integrating data science models into this engineering world
- How we went nuclear with Git
- How Chuck Norris keeps us honest
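
For readers new to the idea of a data test, here is a minimal sketch in plain Python. It is an illustration only and is not taken from the talk: the function names `check_rows` and `run_data_test`, the column names, and the two checks (minimum row count, no missing required values) are all assumptions chosen for the example.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def check_rows(rows: List[Row], required: List[str], min_rows: int = 1) -> List[str]:
    """Return a list of problems found in the data; an empty list means it passed.

    Hypothetical checks: a minimum row count and no missing required values.
    """
    problems: List[str] = []
    if len(rows) < min_rows:
        problems.append(f"expected at least {min_rows} rows, got {len(rows)}")
    for i, row in enumerate(rows):
        for col in required:
            if row.get(col) is None:
                problems.append(f"row {i}: missing value for '{col}'")
    return problems


def run_data_test(rows: List[Row], required: List[str]) -> int:
    """Raise if the data fails the checks; otherwise return the row count."""
    problems = check_rows(rows, required)
    if problems:
        raise ValueError("data test failed: " + "; ".join(problems))
    return len(rows)
```

In an Airflow DAG, a function like `run_data_test` could be used as the callable of a `PythonOperator` placed between two processing steps. A raised exception fails that task, and with the default trigger rule the downstream tasks do not run on the bad data.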

• 21:30 Everybody out