Details

  • 🌐 Network over 🍕 pizza and drinks 🍺🥤 with fellow Python users 🐍
  • 👀 Watch a presentation
  • 🗣️ Discuss the presentation
  • 👥 and maybe hit the local pub!

---
Most AI and data pipelines treat document conversion as a solved problem.
It isn't.
PDFs lose captions. Tables flatten. Numbered lists reset. Reading order breaks. The converter returns "success" and the pipeline moves on, even when the meaning has already been damaged.
The file is intact.
The meaning is gone.
This talk explores why document conversion is not just a file format problem but a meaning preservation problem, and why these failures often go unnoticed until much later, in contexts where the original document is no longer easy to verify.

I'll introduce **any-doc-to-md**, an open-source approach that treats conversion as a competitive and observable process: run multiple converters, compare outputs, score structural quality, and audit the result.
The goal is not another converter. The goal is to make document conversion measurable, testable, and auditable.
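
To make the idea concrete, here is a minimal Python sketch of the "run several converters, compare, score, audit" loop. It is an illustration only and not the any-doc-to-md API: every name in it (`Candidate`, `structural_features`, `run_competition`, the two toy converters) is hypothetical, the "document" is a short string standing in for a real file, and the scoring rule (structure recovered relative to the best candidate) is an assumption chosen for simplicity.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical names throughout: this is not the any-doc-to-md API.
Converter = Callable[[str], str]
FEATURES = ["headings", "table_rows", "list_items"]


@dataclass
class Candidate:
    name: str
    markdown: str
    features: Dict[str, int]
    score: float = 0.0


def structural_features(markdown: str) -> Dict[str, int]:
    """Count a few Markdown structures that conversion tends to lose."""
    lines = markdown.splitlines()
    return {
        "headings": sum(1 for line in lines if re.match(r"#{1,6} ", line)),
        "table_rows": sum(1 for line in lines if line.lstrip().startswith("|")),
        "list_items": sum(
            1 for line in lines if re.match(r"\s*(?:[-*+]|\d+\.) ", line)
        ),
    }


def run_competition(source: str, converters: Dict[str, Converter]) -> List[Candidate]:
    """Run every converter, score each output, and return them best first."""
    candidates = []
    for name, convert in converters.items():
        markdown = convert(source)
        candidates.append(Candidate(name, markdown, structural_features(markdown)))
    if not candidates:
        return []

    # Assumed baseline: the most structure any converter recovered, per feature.
    baseline = {k: max(c.features[k] for c in candidates) for k in FEATURES}
    for c in candidates:
        ratios = [c.features[k] / baseline[k] for k in FEATURES if baseline[k] > 0]
        c.score = sum(ratios) / len(ratios) if ratios else 0.0

    return sorted(candidates, key=lambda c: c.score, reverse=True)


# Toy input and toy converters, standing in for real files and real tools.
SOURCE = "# Report\n\n| a | b |\n|---|---|\n| 1 | 2 |\n\n1. first\n2. second\n"

CONVERTERS: Dict[str, Converter] = {
    "keeps_structure": lambda text: text,
    "flattens_tables": lambda text: text.replace("|", " "),
}

ranked = run_competition(SOURCE, CONVERTERS)
for c in ranked:
    # The audit trail: which converter, what score, and what structure it kept.
    print(f"{c.name}: score={c.score:.2f} features={c.features}")
```

Both toy converters return without error, so a pipeline that only checks for "success" would accept either one. Only the per-feature counts show that the second converter lost every table row.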

This session is aimed at engineers working with document pipelines, RAG systems, knowledge bases, or any workflow where correctness matters.
If your pipeline trusts "successful conversion" without evidence, this talk is for you.

Dr Timur Yusupov is a Principal Architect/Engineer based in Sydney, with 15+ years of experience building data and AI systems in regulated and high-stakes environments. His work focuses on making real-world pipelines more reliable, auditable, and resilient, particularly where document processing, retrieval, and data quality directly impact decisions. He develops open-source tools such as TRACE and any-doc-to-md to surface and address failure modes that often go unnoticed until they become operationally or commercially costly.
---
📢 Interested in presenting? We'd love to have you. Submit here!
💬 Want to chat? Get in touch via Slack!
---
Thanks to our sponsors: Linux Australia and ANU!
Want to sponsor our costs for pizza and drinks, and meet our enthusiastic members? Get in touch!
---
You can find our Code of Conduct here: https://github.com/canberra-python/README/tree/main/conduct
We care about new people and quiet people, so we want to have some bare minimum standards that we expect.
