Sandboxing and Approval Testing for Data Scientists and Engineers


Details
Welcome to this meeting of the Gothenburg Python User Group. Emily Bache will give a talk about some recent work she's been doing and some open source tools for testing. There will also be time for talking to the other Python users who will be there.
Many thanks to ProAgile for sponsoring this meeting with a venue and sandwiches.
Agenda
17:30 - Mingle and sandwiches
18:00 - Welcome, introductions, sponsor presentation
18:15 - Talk by Emily Bache (description below)
19:00 - Break, mingle
19:15 - Group exercise trying out the tools
20:15 - End of hands-on session
20:30 - Close of meeting
Talk description
Sometimes the data is the most important part of the software. Your program may need to read from more than one data source and combine records to produce new data. Sometimes your program needs to update existing records, or even remove some data. As you develop these kinds of programs, it can be useful to have regression tests that let you see the impact of changes on the kinds of data you normally work with.
I recently did some technical coaching with a team of data scientists and engineers. I worked with them to create a sandboxed environment where they could try out their programs against copies of production data. They could see the effects of refactoring and adding functionality, without impacting others. They could also create regression tests and share them with the whole development team. In future they hope to run these tests in a continuous integration system.
In this GothPy meeting I'd like to show you the techniques and tools we were using - DBText (https://github.com/texttest/dbtext) and TextTest (https://texttest.org/). I'm one of the developers of these open source tools, which are both written in Python, although they can test programs written in almost any programming language.
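To give a flavour of the idea before the meeting, here is a minimal sketch in plain Python using only the standard library. It does not use the DBText or TextTest APIs, and it is not how those tools are implemented. The table names, the `dump_table` helper and the `deactivate_stale_customers` function are all hypothetical, invented for illustration. The sketch runs a program against a throwaway in-memory database (the sandbox), dumps the resulting table as text, and compares that text with a previously approved version.

```python
import sqlite3


def dump_table(conn, table):
    """Render a table as text: a header line, then rows in sorted order."""
    cursor = conn.execute(f"SELECT * FROM {table}")
    header = "|".join(col[0] for col in cursor.description)
    rows = sorted("|".join(str(value) for value in row) for row in cursor)
    return "\n".join([header, *rows]) + "\n"


def deactivate_stale_customers(conn):
    """Hypothetical program under test: flag customers who have no orders."""
    conn.execute(
        "UPDATE customers SET active = 0 "
        "WHERE id NOT IN (SELECT customer_id FROM orders)"
    )


# The sandbox: a throwaway in-memory database standing in for a copy
# of production data (assumed schema, for illustration only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER, name TEXT, active INTEGER);
    CREATE TABLE orders (id INTEGER, customer_id INTEGER);
    INSERT INTO customers VALUES (1, 'Ada', 1), (2, 'Bo', 1);
    INSERT INTO orders VALUES (10, 1);
""")

deactivate_stale_customers(conn)
received = dump_table(conn, "customers")

# The approved text would normally live in a file checked in with the test.
APPROVED = "id|name|active\n1|Ada|1\n2|Bo|0\n"
assert received == APPROVED, "database contents differ from the approved version"
```

If a refactoring changes which rows get updated, the received text no longer matches the approved text and the test fails, showing the difference in terms of the data itself.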
