DevOps for Data Science
DevOps for Data Science or
How to build a data pipeline that is not a giant house of cards built from random scripts that will all completely collapse the moment any input does anything weird (see https://xkcd.com/2054/)
We’ve all heard the saying about the 80-20 rule in data science: most of the project time is spent wrangling the data into the correct format, and only a fraction on the actual model building. The truth is even worse: there’s also the effort needed to make sure your model will not explode when left to run on its own in production, or to make sure you will notice when it does.
That’s where DevOps comes in. In this talk, Seija Sirkiä will introduce basic concepts such as version control and CI/CD (continuous integration, continuous delivery), from the perspective of data scientists and for an audience of data scientists (perhaps particularly those who use R and RStudio). This is not a lesson in specific tools but rather a plea for a change in attitude. It is true that we are not software engineers and we don’t exactly produce software, but many of our daily troubles are the same.
Program:
18.00 Welcome!
18.10 Seija Sirkiä and DevOps for Data Science
19.10 Mingle and questions
20.00 Closing the event
Place: Houston Analytics, Konepajankuja 1

