PyData Prague #35 - Probably unreliable vulnerabilities

Name: PyData Prague #35 - Probably unreliable vulnerabilities
Start: 2026-05-26T18:00:00+02:00
End: 2026-05-26T21:00:00+02:00
Location: Aisle offices

Hosted by Jan P. and Jakub U.

PyData Prague

Details

Hello Python extractors and vulnerable agents,

The 35th PyData meetup will take place at Aisle offices (Palác Zlatý kříž, 2nd floor). As usual, the talks will start at 18:30, but we encourage you to come as early as 18:00 to socialize and enjoy some refreshments (which you can continue doing during the break and after the talks).

Our main goal is to build the community around Python and data and make it welcoming to people of various skills and experience levels.

⚡ If you are interested in giving a lightning talk (up to 5 minutes to present an idea, tool or results related at least to some degree to Python and/or data), please contact us before the event or at its beginning.

What a Single-File LLM Security Analyzer Taught Us?
(Stanislav Fort, Aisle)

High-quality AI security research can uncover real vulnerabilities in critical infrastructure. AISLE is one example of this higher-signal approach, with validated findings in projects like OpenSSL and curl. At the same time, low-quality AI-generated reports are flooding open-source maintainers with false positives.

How hard is it to find a security bug? We will explore that question through nano-analyzer, a deliberately simple open-source security scanner. For many vulnerability classes, the surprising core is not a complex platform, but a well-aimed LLM call wrapped in the right workflow.

This simplicity has limits. The approach may miss obvious issues, hallucinate risky findings, or produce inconsistent results across runs. That is why validation, triage, benchmarking, and human judgment matter, and why the real challenge is building reliable processes around unreliable primitives.

Getting reliable text when PDFs lie and OCR fails
(Marcela Brichtová Piptová, Rossum)

LLMs need text as input. So before a model can reason about a document, we have to read its text, a step often treated as the "easy part" or a solved problem. But is it?

In this talk, we will explore the hidden complexities of text extraction. This is especially critical for models like Rossum's T-LLM, an encoder-only architecture that relies heavily on high-quality input. You will learn why transactional documents are sometimes surprisingly hard for OCR, why you can't always just copy-paste text from a PDF, and why text extraction is still a topic for Rossum researchers (and our customer support team).
