DataOps in the pharmaceutical industry, with Martí Segarra
Details
We’re starting the decade strong at the Meetup with more events! This time Martí Segarra will share with us how Boehringer-Ingelheim is tackling the complexity of moving pharmaceutical data around with a framework they built!
See you next Tuesday the 14th at 19:00 at the Exoticca offices (thank you for inviting us!)
Don’t miss it!
Title:
Frumentarii, A single DataOps experience for a Distributed Data Mesh
Abstract:
One of the main challenges of working with multiple data environments (AWS, Azure, on-prem) is the large number of technologies teams must master. This often leads to siloed, hyper-specialized data engineering teams, in contrast to the product-oriented mindset organizations tend to favor nowadays. Furthermore, in a highly governed organization, previously learned concepts do not always apply. To let members of the organization focus on adding value to society rather than on managing their data pipelines, we needed to provide a single experience for ingestion, dispersal and processing in “Big Data” environments. That experience has to be flexible enough to cover diverse data layouts and loading types, as well as the infrastructure used to execute and orchestrate them, all while complying with the strict data, process and SLC governance characteristic of the pharmaceutical industry.
Frumentarii is a framework and a library that provides as much abstraction as possible over process governance (multitenancy, monitoring, reliability...), the data/software lifecycle (deployment, promotion across environments, documentation, testing) and infrastructure (execution instances, orchestration...). Its core is built on Spark, and we will explore how the modular Spark pattern, explained at a previous conference by Albert Franzi, together with our experience in DataOps, gave us the idea to expand it further.
Finally, we will explore and compare similar frameworks such as LinkedIn's Gobblin and Uber's Marmaray, which were sources of inspiration for our work.
Bio:
Martí is a physicist who started programming in PHP at 15. It took him a while to fall in love with data analytics, and he started working with Spark in 2016. Faced with the problems of building reliable data pipelines in a highly governed environment, and with an eye on open source communities, he started working as product owner for the company's data lake governance system and enhanced the data management framework.
He currently works as a Big Data Solution Architect on Boehringer-Ingelheim's platforms team, bridging the gap between data application development and the operational world.
