We hope everyone is doing fine and enjoying this sunny springtime. We are happy to announce that we are organising our fourth and last event before the summer break! Our good friends at Idealista will be hosting the event this time (thank you guys!).
Do not forget to follow us at @MadridDataEng for updates on this event and info on future events.
*** Defining data pipelines workflows using Apache Airflow ***
Apache Airflow is a workflow automation and scheduling system that can be used to author and manage data pipelines. Workflows are defined programmatically, in Python, as directed acyclic graphs (DAGs) of tasks. At Idealista we use it on a daily basis for data ingestion pipelines.
We’ll take a thorough look at managing dependencies, handling retries, alerting and more, as well as the drawbacks.
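For anyone new to the idea, here is a rough sketch of what "a DAG of tasks with dependencies and retries" means in practice. It uses only the Python standard library (`graphlib`, Python 3.9+) and is not Airflow's API; `run_pipeline`, `max_retries` and the task names are made up for illustration.

```python
from graphlib import TopologicalSorter


def run_pipeline(tasks, dependencies, max_retries=2):
    """Run tasks in dependency order, retrying each one if it fails.

    tasks: dict mapping task name -> zero-argument callable
    dependencies: dict mapping task name -> set of upstream task names
    (hypothetical helper for illustration, not part of Airflow)
    """
    # Include tasks that have no upstream dependencies in the graph too.
    graph = {name: dependencies.get(name, set()) for name in tasks}
    # static_order() raises graphlib.CycleError if the graph is not acyclic.
    order = list(TopologicalSorter(graph).static_order())

    attempts = {}
    for name in order:
        for attempt in range(1, max_retries + 2):
            attempts[name] = attempt
            try:
                tasks[name]()
                break  # task succeeded, move on to the next one
            except Exception:
                if attempt == max_retries + 1:
                    raise  # retries exhausted; an alert could be sent here
    return order, attempts


# A transform step that fails once and then succeeds.
calls = {"n": 0}


def flaky_transform():
    calls["n"] += 1
    if calls["n"] == 1:
        raise RuntimeError("transient failure")


tasks = {
    "extract": lambda: None,
    "transform": flaky_transform,
    "load": lambda: None,
}
dependencies = {"transform": {"extract"}, "load": {"transform"}}

order, attempts = run_pipeline(tasks, dependencies)
print(order)
print(attempts)
```

A real scheduler adds a lot on top of this (scheduling intervals, parallel execution, state tracking, a UI), which is the kind of ground the talk will cover.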
*** About Juan - @juanriaza ***
Since 2006 I've been developing software, mainly in Python. From backend web development using Django and massive data mining using Scrapy to data wrangling and complex data pipelines... my curiosity always fuels my development.
I joined Idealista two years ago with a clear mission: democratize access to datasets from external sources. We use technologies such as Apache Airflow and Apache Spark, and we are always checking out the newest tools in the data eng ecosystem to find the best fit for our needs.
*** The talk will take place at the Idealista offices (Plaza de las Cortes 2, 5ª planta), with a ~45 min presentation in English or Spanish, depending on the audience, plus questions, followed by pizza + beer networking. ***