
Details

This two-part discussion series will explore how to make humanitarian spreadsheets more “AI-ready,” bringing together UN OCHA’s new guidance project with real-world lessons from recent AI spreadsheet extraction experiments.

UN OCHA is developing a short, practical guide to help humanitarian teams publish “AI-ready” public datasets that work better with tools like ChatGPT, Copilot, and Gemini, as well as open source models like Kimi K2 and GPT OSS running on providers like Groq, when users simply upload a CSV or Excel file and start asking questions. The focus is on non-technical users who will not configure agents, write code, or reverse-engineer cryptic column names, but who instead expect the AI to correctly interpret the file’s structure and labels out of the box. By recommending clear naming, consistent tabular layouts, and lightweight documentation, the guidance aims to reduce misinterpretation, hallucinations, and broken analyses when consumer AI tools encounter real-world humanitarian data.
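To make that concrete, here is a minimal sketch (with hypothetical file contents and column names, not drawn from the OCHA guidance) of how layout alone changes what a tool sees when it simply loads an uploaded file with pandas:

    # Minimal sketch: hypothetical data illustrating why header placement and
    # descriptive column names matter to tools that just parse the upload.
    import io
    import pandas as pd

    # "AI-ready" layout: one table, one header row, self-explanatory names.
    tidy = pd.read_csv(io.StringIO(
        "admin1_name,population_total,people_in_need,date_reported\n"
        "Northern,120000,45000,2024-03-01\n"
        "Coastal,98000,30500,2024-03-01\n"
    ))
    print(tidy.dtypes)  # numeric columns parse as numbers, no hints needed

    # Common messy layout: title and source rows sit above cryptic codes.
    messy = pd.read_csv(io.StringIO(
        "Humanitarian Snapshot,,,\n"
        "Source: field reports,,,\n"
        "ADM1,POP_T,PIN,DT\n"
        "Northern,120000,45000,2024-03-01\n"
    ))
    print(messy.columns.tolist())  # header becomes "Humanitarian Snapshot"
    # A human can recover with skiprows=2, but a consumer AI tool given the
    # raw upload often misreads every column from that point on.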

Jan Zheng, a Developer Relations Engineer at Groq who helps people design and build AI prototypes, is exploring exactly these challenges from the model and tooling side. His recent experiments with spreadsheet extraction show that messy, multi-table spreadsheets routinely confuse even advanced models and agent frameworks, leading to unreliable extraction, off-by-one errors, looping agents, and high costs. These problems are amplified when non-technical users run complex datasets or vast amounts of data through commercial AI tools and open models. Lessons learned from this research and real-world usage can inform the UN OCHA guidance by clarifying which spreadsheet patterns break current AI tools, which structures make extraction more robust, and how to balance “ideal” AI-ready formats with the messy realities of operational humanitarian spreadsheets.
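One of those failure modes is easy to reproduce: a single sheet holding two unrelated tables. A minimal sketch (hypothetical data, not taken from his experiments) shows how a straightforward parse silently merges them:

    # Minimal sketch: two stacked tables in one sheet, separated by a blank row.
    import io
    import pandas as pd

    stacked = pd.read_csv(io.StringIO(
        "admin1_name,people_in_need\n"
        "Northern,45000\n"
        "Coastal,30500\n"
        "\n"
        "partner_org,projects_active\n"
        "Org A,12\n"
        "Org B,7\n"
    ))
    print(stacked)
    # pandas keeps the first header and absorbs the second table as extra rows,
    # so "partner_org" shows up as a value under "admin1_name", the kind of
    # silent mangling that leads to off-by-one counts and looping agents.
    # Publishing each table as its own sheet or file avoids it entirely.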

Over two separate meetup discussions, staff from UN OCHA will introduce the AI-ready data project in more detail, walk through the specific use case they are targeting, and answer questions from participants about scope, constraints, and potential applications in humanitarian settings. These sessions are designed to surface real-world experiences from practitioners who publish, manage, or use open humanitarian data, and to gather concrete examples of what works and what breaks when datasets are run through consumer AI tools and open source models running on providers like Groq.

In a later, dedicated session, Jan will react to the project, share his experimental findings on spreadsheet extraction, and discuss how infrastructure choices such as model selection, speed, and prompting strategies interact with the way humanitarian data is structured and published. His perspective will help bridge the gap between guidance aimed at data publishers and the realities of building and tuning AI systems that can reliably interpret the messy, real-world spreadsheets used across the humanitarian sector.

Civic Engagement & Technology
Open Data
Open Source
Humanitarian
