Whoever Said Tokens Have to Be Characters?
Megaparsec is a well-known Haskell parsing library, most often used to parse text. But a token stream doesn't have to be made of characters; it can be anything with the right instances. What if you fed it a stream of Aeson.Value instead?
Extracting structured data from real-world formats is often a messy process. Presentation is optimized for human consumption, not for automated processing: what looks like a table is really a sequence of heterogeneous rows with little shared structure. JSON can serve as a useful interchange format in such cases, but parsing it then becomes much more complicated than matching keys to fields.
Treating decoded JSON as a token stream and running Megaparsec over it gives you composability, meaningful errors, backtracking, and the full combinator toolkit.
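To make the idea concrete, here is a minimal sketch (not code from the talk) of what such a stream could look like: a newtype over a list of decoded values with a hand-written Stream instance, and a tiny parser over it. It assumes megaparsec >= 9 and aeson >= 2.0.1, which provides the Ord instance for Value that Megaparsec's Stream class requires; all names below (JsonStream, jstring) are illustrative.

```haskell
{-# LANGUAGE TypeFamilies #-}

-- Sketch only, not the talk's implementation. Assumes megaparsec >= 9
-- and aeson >= 2.0.1 (Ord Value is needed by the Stream superclasses).
import Data.Aeson (Value (..))
import qualified Data.List as L
import Data.Proxy (Proxy (..))
import qualified Data.Text as T
import Data.Void (Void)
import Text.Megaparsec

-- Wrap a list of decoded JSON values so we can give it a Stream instance.
newtype JsonStream = JsonStream [Value]

instance Stream JsonStream where
  type Token JsonStream = Value
  type Tokens JsonStream = [Value]
  tokenToChunk Proxy v = [v]
  tokensToChunk Proxy = id
  chunkToTokens Proxy = id
  chunkLength Proxy = length
  chunkEmpty Proxy = null
  take1_ (JsonStream []) = Nothing
  take1_ (JsonStream (v : vs)) = Just (v, JsonStream vs)
  takeN_ n (JsonStream vs)
    | n <= 0 = Just ([], JsonStream vs)
    | null vs = Nothing
    | otherwise = Just (JsonStream <$> L.splitAt n vs)
  takeWhile_ p (JsonStream vs) = JsonStream <$> L.span p vs

type Parser = Parsec Void JsonStream

-- Match a single JSON string token, yielding its text.
jstring :: Parser T.Text
jstring = token matchString mempty
  where
    matchString (String t) = Just t
    matchString _ = Nothing

main :: IO ()
main =
  print (parseMaybe (many jstring)
                    (JsonStream [String (T.pack "a"), String (T.pack "b")]))
```

With the instance in place, the full combinator toolkit applies: `many jstring` above parses a run of string tokens, and the same approach extends to matching objects, numbers, or any predicate on Value.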
The talk is based on an hledger personal-finance automation use case: turning PDF invoice statements into hledger journal entries.
This talk follows the Zurich Friends of Haskell General Assembly, which starts at 6pm in the same room and which all members are welcome to attend.
