From scan to csv: digitising birth record data & introducing the new conveners!


Hi everyone,

To start the new year with EdinbR, we're meeting on the 15th of January. Join us from 6.00pm in David Hume Tower, LG.09. As usual, the meeting will be open for all to attend, and newcomers / beginners are very welcome. After the talks we'll be heading off to a pub nearby, so do come along!

Our speaker is Mirjam Eiswirth, presenting work in collaboration with Dr Andreas Steinhauer. Mirjam has finished her PhD in Linguistics and English Language, and Andreas Steinhauer is a Lecturer in Economics at the University of Edinburgh:

## From scan to csv: digitising Austrian birth record data from the late 19th century

How have birth rates changed over time, stratified by geography and socio-demographic factors? We can answer these questions with large-scale data for the relatively recent past, but handwritten historical records are currently not easily accessible.

This pilot study explores the potential of digitising historical birth records (parish books containing births and baptisms) for such statistical analyses, using the handwritten text recognition software Transkribus. This work-in-progress talk focuses on the final step in the data extraction workflow, the processing of the exported text: How can we clean messy tabular text data that still contains spelling errors? Which keyword spotting, classification, or clustering tools could be applied? How can we infer social information such as the sex of the baby or the parents’ social status?

We are looking forward to sharing this exciting project with you, and to a lively discussion about cleaning and processing messy textual data.
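
To give a flavour of the kind of keyword spotting the abstract asks about, here is a minimal sketch in Python using only the standard library. It is not the speakers' workflow: the keyword list, the `infer_sex` helper, and the similarity cutoff are all hypothetical choices made for illustration. The idea is to match each (possibly misspelled) token of a transcribed entry against a small lexicon of kinship terms and read off the sex they imply.

```python
from difflib import get_close_matches

# Hypothetical lexicon: German kinship terms and the sex they imply.
# A real project would need a list built from the records themselves.
KEYWORDS = {
    "sohn": "male",
    "knabe": "male",
    "tochter": "female",
    "mädchen": "female",
}


def infer_sex(entry, cutoff=0.8):
    """Return 'male', 'female', or None for one transcribed entry.

    Each token is fuzzily compared with the lexicon, so small
    recognition errors (e.g. 'Tochtr') can still be matched.
    The cutoff of 0.8 is an assumed, untuned value.
    """
    for token in entry.lower().split():
        token = token.strip(".,;:")
        match = get_close_matches(token, list(KEYWORDS), n=1, cutoff=cutoff)
        if match:
            return KEYWORDS[match[0]]
    return None


if __name__ == "__main__":
    for text in ["Tochtr des Johann Huber", "ehel. Sohnn des Josef", "Johann Huber"]:
        print(text, "->", infer_sex(text))
```

Short keywords are sensitive to the cutoff: a single wrong letter in a four-letter word already lowers the similarity score noticeably, so the threshold would have to be tuned against hand-checked entries.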


In the second part of the meeting, we will introduce the three new EdinbR conveners:

* Federico Andreis
* Karim Rivera
* Mike Spencer