Write Your Own GPT from Scratch: Session 1 Tokenizer


Details
As a follow-up to 'Write your own GPT from scratch' at the AppliedAI conference this May, we are organizing a series of short workshops that go in depth on the different elements covered in the presentation.
Sona Maniyan will guide us through constructing a GPT.
The first session will focus on setting up our workspace for the workshop series and on an in-depth look at tokenization. We will explore the role a good tokenizer plays in large language models and build a tokenizer from scratch. Following this, we will use Python libraries that handle tokenization for us.
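To give a flavor of what "from scratch" can mean, here is a minimal sketch of byte-pair encoding (BPE), one common tokenization scheme. It is an illustration under assumptions, not necessarily the approach the session will take: the function names (`train_bpe`, `encode`, `decode`) and the sample text are made up for this example.

```python
from collections import Counter


def get_pair_counts(ids):
    """Count how often each adjacent pair of token ids occurs."""
    return Counter(zip(ids, ids[1:]))


def merge(ids, pair, new_id):
    """Replace every non-overlapping occurrence of `pair` with `new_id`."""
    out, i = [], 0
    while i < len(ids):
        if i + 1 < len(ids) and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out


def train_bpe(text, num_merges):
    """Learn up to `num_merges` merges over the UTF-8 bytes of `text`."""
    ids = list(text.encode("utf-8"))
    merges = {}
    for step in range(num_merges):
        counts = get_pair_counts(ids)
        if not counts:
            break
        pair = counts.most_common(1)[0][0]  # most frequent adjacent pair
        new_id = 256 + step  # ids 0..255 are reserved for raw bytes
        ids = merge(ids, pair, new_id)
        merges[pair] = new_id
    return merges


def encode(text, merges):
    """Apply the learned merges in the order they were learned."""
    ids = list(text.encode("utf-8"))
    for pair, new_id in merges.items():  # dicts keep insertion order
        ids = merge(ids, pair, new_id)
    return ids


def decode(ids, merges):
    """Expand token ids back into bytes, then into text."""
    vocab = {i: bytes([i]) for i in range(256)}
    for (a, b), new_id in merges.items():
        vocab[new_id] = vocab[a] + vocab[b]
    return b"".join(vocab[i] for i in ids).decode("utf-8", errors="replace")


text = "low lower lowest low low"
merges = train_bpe(text, num_merges=5)
ids = encode(text, merges)
print(len(text.encode("utf-8")), "bytes ->", len(ids), "tokens")
print(decode(ids, merges))
```

Starting from raw bytes means any input string can be encoded, and each merge trades a larger vocabulary for shorter token sequences.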
In this workshop, we will instantiate a Tokenizer object with a model, then set its normalizer, pre_tokenizer, post_processor, and decoder attributes to the values we want. We will also work through some examples using tiktoken, the open-source Python tokenizer library from OpenAI.
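The stages named above can be pictured as a small pipeline. The sketch below is a pure-Python stand-in written for this description, not the API of any tokenizer library: `ToyTokenizer`, its word-level vocabulary, and the `[CLS]`/`[SEP]`/`[UNK]` markers are all hypothetical choices that only mirror the attribute names.

```python
import re
import unicodedata


class ToyTokenizer:
    """Hypothetical stand-in showing the stages a tokenizer pipeline chains."""

    def __init__(self, vocab, unk_token="[UNK]"):
        self.vocab = vocab  # the "model": here a plain word-to-id lookup
        self.unk_token = unk_token
        self.id_to_token = {i: t for t, i in vocab.items()}
        # Each stage is a plain callable, so any of them can be swapped out.
        self.normalizer = lambda s: unicodedata.normalize("NFKC", s).lower()
        self.pre_tokenizer = lambda s: re.findall(r"\w+|[^\w\s]", s)
        self.post_processor = lambda toks: ["[CLS]", *toks, "[SEP]"]
        self.decoder = lambda toks: " ".join(toks)

    def encode(self, text):
        # normalize -> split into words -> map to vocab -> add special tokens
        words = self.pre_tokenizer(self.normalizer(text))
        tokens = [w if w in self.vocab else self.unk_token for w in words]
        tokens = self.post_processor(tokens)
        return [self.vocab[t] for t in tokens]

    def decode(self, ids, skip_special=("[CLS]", "[SEP]")):
        tokens = [self.id_to_token[i] for i in ids]
        return self.decoder([t for t in tokens if t not in skip_special])


vocab = {t: i for i, t in enumerate(["[UNK]", "[CLS]", "[SEP]", "hello", "world", "!"])}
tok = ToyTokenizer(vocab)
ids = tok.encode("Hello, World!")
print(ids)
print(tok.decode(ids))
```

Keeping each stage as a separate attribute is what makes it possible to change one step, such as the normalizer, without touching the rest.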
It's been 3+ years, folks. As a reminder, bring your laptop. 💻
