Luigi - Big data, little boilerplate


Details
We warmly invite you to this year's first WHUG meetup, where our guest and speaker will be Elias Freider.
Title: Luigi - Big data, little boilerplate
Speaker: Elias Freider (Spotify)
One of the big time sinks for data scientists today is writing glue code to trigger recurring runs of tasks, specifying dependencies between different data sources and cleaning up the mess when something goes wrong. At Spotify we have realised that Python is a great language for expressing this glue as concisely as possible.
Luigi is Spotify's popular open-source library for batch data processing, including dependency resolution and monitoring. It is written entirely in Python and uses some of the language's magic to make writing glue code swift and intuitive.
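To make the idea of dependency resolution concrete, here is a minimal, stdlib-only sketch of the pattern. It is a toy written for illustration, not Luigi's API: the `Task` and `build` names, the example tasks and the file names are all hypothetical. It borrows only the shape of a Luigi task (`requires()`, `output()`, `run()`) and the rule that a task counts as complete once its output exists, so re-running the pipeline skips work that is already done.

```python
import os
import tempfile


class Task:
    """Toy Luigi-style task (hypothetical, not the real luigi.Task)."""

    def requires(self):
        # Upstream tasks that must be complete before this one runs.
        return []

    def output(self):
        # Path of the file this task produces.
        raise NotImplementedError

    def run(self):
        raise NotImplementedError

    def complete(self):
        # A task is considered done once its output file exists.
        return os.path.exists(self.output())


def build(task, ran=None):
    """Depth-first: build requirements first, skip already-complete tasks.

    Returns the names of the tasks that actually ran, in order.
    """
    if ran is None:
        ran = []
    if task.complete():
        return ran
    for dep in task.requires():
        build(dep, ran)
    task.run()
    ran.append(type(task).__name__)
    return ran


# Hypothetical working directory for the example outputs.
WORKDIR = tempfile.mkdtemp()


class FetchLogs(Task):
    def output(self):
        return os.path.join(WORKDIR, "logs.txt")

    def run(self):
        # Stand-in for copying a day's logs from a backend service.
        with open(self.output(), "w") as f:
            f.write("play track=1\nplay track=2\nskip track=1\n")


class CountLines(Task):
    def requires(self):
        return [FetchLogs()]

    def output(self):
        return os.path.join(WORKDIR, "count.txt")

    def run(self):
        with open(self.requires()[0].output()) as src:
            n = sum(1 for _ in src)
        with open(self.output(), "w") as dst:
            dst.write(str(n))


first = build(CountLines())   # runs FetchLogs, then CountLines
second = build(CountLines())  # nothing to do: the output already exists
```

Because completeness is derived from the outputs rather than stored separately, a failed run can simply be restarted and only the missing pieces get rebuilt. The real library adds a central scheduler, parameters, and targets for HDFS and databases on top of this idea.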
Spotify's backend services log terabytes of data every day, for purposes ranging from debugging to reporting. The logs are essentially huge semi-structured text files that can be parsed with a few lines of Python. From this data, aggregated reports must be generated, data must be pushed into SQL databases for internal dashboards, related artists must be calculated using complex algorithms, and many other tasks must be performed, using many different programming languages and tools.
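As an illustration of parsing semi-structured logs "with a few lines of Python", the sketch below counts plays per track. The log format (`<timestamp> <event> key=value ...`) and the sample lines are assumptions made up for this example, not Spotify's actual format; lines that don't fit the format are skipped rather than crashing the job.

```python
from collections import Counter

# Hypothetical log format, assumed for illustration only:
# "<timestamp> <event> key=value key=value ..."
SAMPLE = """\
2013-01-14T10:00:01 play user=a track=t1
2013-01-14T10:00:05 play user=b track=t1
2013-01-14T10:01:00 skip user=a track=t2
truncated-line
2013-01-14T10:02:30 play user=a track=t3
"""


def parse(line):
    """Split one line into (event, fields); return None if it is too short."""
    parts = line.split()
    if len(parts) < 2:
        return None
    _timestamp, event, *rest = parts
    # Keep only well-formed key=value pairs.
    fields = dict(kv.split("=", 1) for kv in rest if "=" in kv)
    return event, fields


def plays_per_track(lines):
    """Aggregate 'play' events into a per-track count."""
    counts = Counter()
    for line in lines:
        parsed = parse(line)
        if parsed and parsed[0] == "play":
            counts[parsed[1].get("track")] += 1
    return counts


result = plays_per_track(SAMPLE.splitlines())
```

In a pipeline like the one described above, a function like this would be the body of one task's `run()`, with its output feeding a downstream report or database load.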
Through some real world examples we will show how the Luigi library evolved to tie all of this together and what problems it can help you solve. In the process you will hopefully be inspired to use some of Python's power features to improve your own tooling, possibly in completely different areas of computing.
Bio: Elias Freider is an analytics infrastructure engineer at the music-streaming company Spotify, based in Stockholm, Sweden. He is one of the two main authors of the open-source Luigi Python framework for batch data processing and dependency management, which Spotify uses to crunch terabytes of data every day.
