Doubleheader: Truncating Unicode and Efficient Sparse Arrays
Details
Doug Hoyte: Truncating Unicode -- Suppose you need to truncate an arbitrary-length Unicode string to fit into a fixed byte-length field. In this talk I introduce the Unicode::Truncate module, but not before we go down a deep rabbit hole of Unicode topics including encodings, surrogate pairs, over-long UTF-8, combining characters, normalisation forms, extended grapheme clusters, and Unicode Consortium test suites. Are there security implications of Unicode? How many bytes can a single Unicode character take? Which writing system is special-cased in the Unicode segmentation standard? In addition, we'll go over state-machine parsing with Ragel and the Inline::Filters::Ragel module, distributing Inline modules with Inline::Module::LeanDist, Perl's "utf8 flag", and zero-copy string truncation.
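To give a flavour of the problem, here is a minimal Python sketch of byte-limited truncation. It is not how Unicode::Truncate is implemented; the function name truncate_utf8 is made up for illustration. It only avoids cutting in the middle of a UTF-8 encoded code point, so it can still split a combining sequence or an extended grapheme cluster, which is one of the issues the talk covers.

```python
def truncate_utf8(text: str, max_bytes: int) -> str:
    """Return the longest prefix of text whose UTF-8 encoding fits in max_bytes.

    Hypothetical helper: it respects code point boundaries only,
    not extended grapheme clusters.
    """
    if max_bytes < 0:
        raise ValueError("max_bytes must be non-negative")

    encoded = text.encode("utf-8")
    if len(encoded) <= max_bytes:
        return text

    cut = max_bytes
    # UTF-8 continuation bytes have the bit pattern 10xxxxxx. If the byte
    # at the cut position is one of them, the cut falls inside a multi-byte
    # character, so back up to the start byte of that character.
    while cut > 0 and (encoded[cut] & 0xC0) == 0x80:
        cut -= 1

    return encoded[:cut].decode("utf-8")
```

For example, truncate_utf8("héllo", 2) returns "h" rather than a broken half of "é", but truncate_utf8("e\u0301", 2) returns a bare "e" with its combining accent dropped, which is why code point boundaries alone are not enough.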
Richard Farr: Efficient Sparse Arrays -- You want it all: random insertion and retrieval; a massive expanse of arbitrary keys; forward and backward ordered traversal; and all with lightning performance and the memory footprint of a mouse. In short, you want Judy arrays. In this talk I'll go over the basic theory of Judy arrays and how and when to use them to get maximum performance and memory efficiency when working with large amounts of sparse data. I'll show examples and how you can get started using them in your favourite language!
