Details

We have two speakers in September:

  • Martin Loetzsch: Lightweight ETL pipelines with Mara
  • Mariano Semelman: Query expansion using semantic query embeddings

**************

Martin Loetzsch

Lightweight ETL pipelines with Mara

In the past few years, data warehousing went through a radical transition from click-based ETL tools to data pipelines defined in code. In this process, the field of “data engineering” was born, Python became the dominant language for describing data integration pipelines, and Apache Airflow emerged as the dominant framework in the field. However, for most companies that don’t operate at the scale of Airbnb, Airflow is overkill when the task is to integrate a few GB or TB of data. In this talk, I will introduce Mara, a lightweight, opinionated ETL framework halfway between Airflow and plain Python scripts, with a focus on transparency and complexity reduction. It condenses the learnings from 6 years of building data warehouses for more than 20 of the portfolio companies of Project A. I will guide you through some of the design decisions behind the platform and share some general learnings for setting up successful data engineering teams.

Martin Loetzsch works at Project A, a Berlin-based operational VC focusing on digital business models. As Chief Data Officer, he has helped many of Project A’s portfolio companies form teams that build data warehouses and other data-driven applications. Before joining Project A (with a short interlude at Rocket Internet), he worked in artificial intelligence labs in Paris and Brussels on computational linguistics and robotics. He received a PhD in computer science from the Humboldt University of Berlin.
