Apache Airflow introduction and best practices


Details
Time to meet up and talk about Hadoop-related topics again! This time, we have two talks related to Apache Airflow.
We thank GoDataDriven for kindly hosting us in their still relatively new office at the Wibautstraat, and also for taking care of pizza and drinks.
Btw, GoDataDriven has a 10-day Data Science Accelerator Program which, over a period of 5 months, combines hands-on lectures with workshops to advance your data science skills and knowledge to the highest level. The program starts on September 14, 2016: http://www.godatadriven.com/data-science-accelerator-program
(Your message here? Talk to us about sponsoring a meetup.)
Agenda
18.00: Arrive, drink, eat
18.45: Presentations
Introduction to Airflow - What is Airflow and how WePay uses it
Speaker: Chris Riccomini, Committer/PPMC on Apache Airflow (Incubating)
WePay is the leading provider of payments-as-a-service for online platforms including GoFundMe, FreshBooks, and Constant Contact. More than 17,000 tasks flow through Airflow every day at WePay. We'll provide a brief overview of what Airflow is, how it works, and what we use it for.
About Chris:
Chris Riccomini is a software engineer at WePay. He is currently focusing on building out WePay's offline and nearline data pipelines. Previously, Chris was an engineer at LinkedIn, where he worked on Apache Samza, a stream processing system.
Airflow best practices and Airflow Roadmap
Speaker: Bolke de Bruin, Committer/PPMC on Apache Airflow (Incubating)
At ING we found that we needed flexible, maintainable, testable workflows. We ingest from a myriad of internal data sources: even mainframe data is not unknown to us. Airflow plays a critical role in helping us do this in a resilient and fault-tolerant way. This talk will focus on why we chose Airflow, some best practices, and the future roadmap.
About Bolke:
Bolke is Head of Advanced Analytics Technology at ING Commercial Bank. Apart from running a team of data scientists and engineers, he also likes to keep his hands dirty with technology. This has resulted in code contributions to several Hadoop ecosystem projects, such as Apache Spark, Apache Ambari and Airflow.
20.30: Some more drinks, socialize
21.30: Everybody out!
