LLMs for Historical Data Extraction with Sarah Kiener

Name: LLMs for Historical Data Extraction with Sarah Kiener
Start: 2025-08-28T19:00:00+02:00
End: 2025-08-28T21:00:00+02:00
Location: Hardturmstrasse 253

Hosted by PyLadies Zurich

Details

Join us for an exciting talk by applied machine learning engineer and PyLadies Zurich co-organizer Sarah Kiener:

Leveraging LLMs for Historical Data Extraction: Zurich’s 18th-Century “Nachtzedel”

Ever wondered how to get clean, structured data from messy, centuries-old documents?
In this talk, we’ll walk through a real-world Python pipeline that uses GPT-4.1 to extract names, professions, and places from 18th-century Zurich “Nachtzedel”: leaflets riddled with OCR errors, multilingual spelling quirks, and inconsistent formatting.

You’ll see how prompt engineering, iterative refinement, and rule-based post-processing can turn unpredictable LLM outputs into reliable, machine-readable datasets. We’ll cover evaluation strategies, error rates, and lessons learned, so you can adapt these techniques to your own noisy or domain-specific text data.
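As a rough illustration of the rule-based post-processing and error-rate evaluation mentioned in the abstract, here is a minimal Python sketch. It is not the speaker's pipeline. It assumes the model was prompted to reply with a JSON list of records using hypothetical `name`, `profession` and `place` fields, and it leaves out the model call itself. The code strips a markdown code fence if the model added one, normalizes Unicode and whitespace, maps place-name spellings through a made-up lookup table, and reports a simple per-field exact-match error rate against hand-annotated records.

```python
from __future__ import annotations

import json
import re
import unicodedata

# Fields the model is assumed to return for each person mentioned in a leaflet
# (hypothetical schema for this sketch).
FIELDS = ("name", "profession", "place")

# Hypothetical spelling-variant table; a real one would be built from the corpus.
PLACE_VARIANTS = {"zurich": "Zürich", "zürich": "Zürich", "zürych": "Zürich"}


def clean(value) -> str | None:
    """Unicode-normalize, collapse whitespace, and map empty strings to None."""
    if value is None:
        return None
    text = unicodedata.normalize("NFC", str(value))
    text = re.sub(r"\s+", " ", text).strip()
    return text or None


def normalize_record(record: dict) -> dict:
    """Keep only the expected fields and apply rule-based fixes."""
    out = {field: clean(record.get(field)) for field in FIELDS}
    if out["place"]:
        out["place"] = PLACE_VARIANTS.get(out["place"].lower(), out["place"])
    return out


def parse_llm_output(raw: str) -> list[dict]:
    """Parse the model's JSON reply, tolerating a markdown code fence around it."""
    text = raw.strip()
    fenced = re.match(r"^```(?:json)?\s*(.*?)\s*```$", text, re.DOTALL)
    if fenced:
        text = fenced.group(1)
    data = json.loads(text)
    if isinstance(data, dict):  # a single record instead of a list
        data = [data]
    return [normalize_record(r) for r in data]


def field_error_rates(predicted: list[dict], gold: list[dict]) -> dict[str, float]:
    """Per-field share of records that do not exactly match the gold annotation."""
    if len(predicted) != len(gold) or not gold:
        raise ValueError("need non-empty, aligned predicted and gold lists")
    return {
        field: sum(p[field] != g[field] for p, g in zip(predicted, gold)) / len(gold)
        for field in FIELDS
    }
```

Exact matching is the strictest choice. For OCR-heavy historical text, a real evaluation might also count near-misses separately, for example variant spellings of the same name.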

After the talk, enjoy lively discussions and a small apéro on the rooftop terrace of our amazing sponsor, Supertext ☀️⛱️
