Scheduling workloads with Apache Airflow and running Spark on Google Cloud


Details
We have two slightly related big data topics for you today.
Apache Airflow on GCP
Google Cloud has a lot of great Big Data services, but what if you need to orchestrate all these services? Are you tired of maintaining cron jobs with lots of unreliable batch scripts? Well, we have a great alternative: Apache Airflow (https://airflow.incubator.apache.org/)!
Since version 1.8, Airflow has a lot of Google Cloud service support (Cloud Storage, Dataproc, Dataflow, BigQuery, ...). We'll go over some scenarios for why and where you would use Airflow and highlight the Google support.
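To give a taste of what that support looks like, here's a minimal DAG sketch using the contrib Google Cloud operators from that era. All project, bucket, and table names are illustrative placeholders, and the import paths assume the Airflow 1.8 contrib package layout; treat it as a sketch, not a definitive pipeline.

```python
# Sketch of an Airflow 1.8-era DAG using the contrib Google Cloud operators.
# All project, bucket, and table names below are illustrative placeholders.
from datetime import datetime

from airflow import DAG
from airflow.contrib.operators.gcs_to_bq import GoogleCloudStorageToBigQueryOperator
from airflow.contrib.operators.bigquery_operator import BigQueryOperator

dag = DAG(
    dag_id="gcs_to_bigquery_daily",
    start_date=datetime(2017, 1, 1),
    schedule_interval="@daily",
)

# Load the day's CSV exports from Cloud Storage into a staging table.
load = GoogleCloudStorageToBigQueryOperator(
    task_id="load_events",
    bucket="my-bucket",                        # placeholder bucket
    source_objects=["events/{{ ds }}/*.csv"],  # {{ ds }} = execution date
    destination_project_dataset_table="my_project.staging.events",
    write_disposition="WRITE_TRUNCATE",
    dag=dag,
)

# Aggregate the staging table into a daily reporting table.
aggregate = BigQueryOperator(
    task_id="aggregate_events",
    bql="SELECT COUNT(*) AS events FROM staging.events",
    destination_dataset_table="my_project.reporting.daily_counts",
    write_disposition="WRITE_TRUNCATE",
    dag=dag,
)

load >> aggregate  # bitshift composition, introduced in Airflow 1.8
```

Note that a DAG file like this is really pipeline configuration: it only does something once it's picked up by a running Airflow scheduler with Google Cloud credentials configured.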
You can start following Alex's series on Airflow on Medium (https://medium.com/google-cloud/airflow-for-google-cloud-part-1-d7da9a048aa4).
Speaker: Alex Van Boxel (Google Developer Expert by night, Software Architect @ Vente-Exclusive.com by day). Alex is also a committer on the Apache Airflow project.
Running Apache Spark as a service on Cloud Dataproc
Running Spark on Google Cloud is easy with Dataproc (https://cloud.google.com/dataproc/). It's a managed Hadoop and Spark solution. It spins up in just over a minute, which is fast enough that you start thinking differently about your jobs. Why not spin up a dedicated cluster for a certain job and destroy it afterwards?
We'll also highlight its integration in Airflow.
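The ephemeral-cluster idea above maps directly onto Airflow's contrib Dataproc operators: create a cluster, run one Spark job on it, tear it down. The sketch below assumes the Airflow 1.8 contrib operator names; the project, zone, jar path, and main class are all illustrative placeholders.

```python
# Sketch of the ephemeral-cluster pattern with Airflow's contrib Dataproc
# operators: create a cluster, run one Spark job, tear the cluster down.
# Project, zone, jar path, and main class are illustrative placeholders.
from datetime import datetime

from airflow import DAG
from airflow.contrib.operators.dataproc_operator import (
    DataprocClusterCreateOperator,
    DataProcSparkOperator,
    DataprocClusterDeleteOperator,
)

dag = DAG(
    dag_id="ephemeral_spark_job",
    start_date=datetime(2017, 1, 1),
    schedule_interval="@daily",
)

# One short-lived cluster per run, named after the execution date.
create_cluster = DataprocClusterCreateOperator(
    task_id="create_cluster",
    cluster_name="spark-{{ ds_nodash }}",
    project_id="my_project",          # placeholder project
    num_workers=2,
    zone="europe-west1-b",
    dag=dag,
)

run_spark = DataProcSparkOperator(
    task_id="run_spark_job",
    cluster_name="spark-{{ ds_nodash }}",
    main_class="com.example.MySparkJob",                   # placeholder class
    dataproc_spark_jars=["gs://my-bucket/jobs/my-job.jar"],  # placeholder jar
    dag=dag,
)

# Delete the cluster even if the job fails, so we never pay for idle nodes.
delete_cluster = DataprocClusterDeleteOperator(
    task_id="delete_cluster",
    cluster_name="spark-{{ ds_nodash }}",
    project_id="my_project",
    trigger_rule="all_done",
    dag=dag,
)

create_cluster >> run_spark >> delete_cluster
```

The `trigger_rule="all_done"` on the delete step is the key design choice here: the teardown runs whether the Spark job succeeded or failed, so a broken job can't leave a cluster running overnight.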
Speaker: Bob De Schutter (Data Scientist at Vente-Exclusive.com)
Thanks to OTA Insight for providing the location, drinks, and sandwiches. Our first meetup in Gent!