LLMs for Historical Data Extraction with Sarah Kiener
Join us for an exciting talk by applied machine learning engineer and PyLadies Zurich co-organizer Sarah Kiener:
Leveraging LLMs for Historical Data Extraction: Zurich’s 18th-Century “Nachtzedel”
Ever wondered how to get clean, structured data from messy, centuries-old documents?
In this talk, we’ll walk through a real-world Python pipeline using GPT-4.1 to extract names, professions, and places from 18th-century Zurich “Nachtzedel” — leaflets riddled with poor OCR, multilingual spelling quirks, and inconsistent formatting.
You’ll see how prompt engineering, iterative refinement, and rule-based post-processing can turn unpredictable LLM outputs into reliable, machine-readable datasets. We’ll cover evaluation strategies, error rates, and lessons learned — so you can adapt these techniques to your own noisy or domain-specific text data.
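To give a flavour of what rule-based post-processing can look like, here is a minimal Python sketch. It is not the pipeline from the talk: the field names (`name`, `profession`, `place`), the spelling-variant table, and the helper functions are all assumptions made up for illustration. It only shows the general idea of parsing a model's JSON answer, dropping malformed entries, and normalizing the rest.

```python
import json
import re
import unicodedata

# Hypothetical field names; the schema used in the talk is not given here.
REQUIRED_FIELDS = ("name", "profession", "place")

# Illustrative spelling variants only, not taken from the actual dataset.
PLACE_VARIANTS = {"zürich": "Zürich", "zurich": "Zürich", "zürch": "Zürich"}


def clean_text(value):
    """Normalize Unicode, collapse whitespace, and strip stray punctuation."""
    text = unicodedata.normalize("NFC", str(value))
    text = re.sub(r"\s+", " ", text).strip()
    return text.strip(" .,;:")


def postprocess(raw_output):
    """Parse a model response and return a list of validated records."""
    # Models sometimes surround the JSON with extra text, so take the
    # span from the first "[" to the last "]".
    match = re.search(r"\[.*\]", raw_output, flags=re.DOTALL)
    if match is None:
        return []
    try:
        items = json.loads(match.group(0))
    except json.JSONDecodeError:
        return []

    records = []
    for item in items:
        if not isinstance(item, dict):
            continue  # skip entries that are not JSON objects
        record = {f: clean_text(item.get(f) or "") for f in REQUIRED_FIELDS}
        if not record["name"]:
            continue  # in this sketch, a record without a name is dropped
        # Map known spelling variants onto one canonical place name.
        record["place"] = PLACE_VARIANTS.get(record["place"].lower(), record["place"])
        records.append(record)
    return records
```

Calling `postprocess(response_text)` on a model's raw answer returns a list of dictionaries with the same three keys each, or an empty list if no parseable JSON array is found. In a real project, the rules would grow out of the errors observed during evaluation.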
After the talk, enjoy lively discussions and a small apéro on the rooftop terrace of our amazing sponsor, Supertext ☀️⛱️