
Tokenizer-free language model inference

Hosted By
Andreas H.

Details

A team from Aleph Alpha will talk about tokenizer-free language model inference.

Abstract:
Traditional large language models rely heavily on large, predefined tokenizers (e.g., vocabularies of 128k+ tokens), which limits how well they handle diverse character sets, rare words, and dynamic linguistic structures. This talk presents a different approach to language model inference that eliminates the need for conventional large-vocabulary tokenizers. The system operates with a core vocabulary of only 256 byte values, processing text at its most fundamental level. It employs a three-part architecture: byte-level encoder and decoder models handle character-sequence processing, while a larger latent transformer operates on higher-level representations. The interface between these stages dynamically creates "patch embeddings", guided by word boundaries or entropy measures. The talk will first introduce the intricacies of this byte-to-patch transformer architecture, then focus on the significant engineering challenges of building an efficient inference pipeline: coordinating the three models, managing their CUDA graphs, and handling their respective KV caches.
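For readers curious about the patching step before the talk, here is a minimal sketch of entropy-based patch boundary detection, the idea the abstract alludes to. The `next_byte_dist` callback, the `toy_dist` stand-in, and the threshold value are illustrative assumptions, not Aleph Alpha's implementation; in the real pipeline a trained byte-level model would supply the next-byte distribution.

```python
import math
from typing import Callable, List, Sequence

def shannon_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in bits) of a next-byte distribution over 256 values."""
    return -sum(p * math.log2(p) for p in probs if p > 0.0)

def entropy_patches(
    data: bytes,
    next_byte_dist: Callable[[bytes], Sequence[float]],
    threshold: float = 4.0,  # illustrative; a real system would tune this
) -> List[bytes]:
    """Cut the byte stream into patches, starting a new patch wherever the
    model's next-byte entropy exceeds the threshold, i.e. wherever the
    continuation becomes hard to predict."""
    patches: List[bytes] = []
    start = 0
    for i in range(1, len(data)):
        if shannon_entropy(next_byte_dist(data[:i])) > threshold:
            patches.append(data[start:i])
            start = i
    patches.append(data[start:])
    return patches

# Toy stand-in for the small byte-level model: near-uniform (high entropy)
# after a space, sharply peaked (low entropy) everywhere else.
def toy_dist(prefix: bytes) -> List[float]:
    if prefix.endswith(b" "):
        return [1.0 / 256] * 256        # 8 bits of entropy -> boundary
    probs = [0.001] * 256
    probs[prefix[-1]] = 1.0 - 0.255     # ~2.9 bits -> no boundary
    return probs

print(entropy_patches(b"byte level models", toy_dist))
# -> [b'byte ', b'level ', b'models']
```

The appeal of entropy-based cuts is that patch boundaries land where prediction is hard, so the larger latent transformer spends its steps on unpredictable stretches while the cheap byte-level models handle the easy ones.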

🔈 Speakers:

  • Pablo Iyu Guerrero and Lukas Blübaum

Agenda:
✨ 18:30 Doors open: time for networking with fellow attendees
✨ 19:00 Talk and Q&A
✨ 20:00 Mingling and networking with pizza and drinks
✨ 21:00 Meetup ends
- Where: In person, Aleph Alpha Heidelberg, Speyerer Str. 14
- When: Tuesday, May 20th
- Language: English
- Cost: free

Aleph Alpha AI meetup Heidelberg