
MC09: When to Fine-Tune, Tokenization and Preprocessing Data

Hosted By
Mohammad A. and Patricia M.

Details

MC09: When to Fine-Tune, Tokenization & Preprocessing for LLMs

📅 Happening this Saturday, June 21 at 11AM GST!
👉 https://nas.io/artificialintelligence/events/mc09-aires5-finetune

Training powerful LLMs doesn’t start with models—it starts with data. Clean, well-prepped, and tokenized data is your secret weapon. Join this hands-on session to master the full pipeline, from raw web-scale text to fine-tuning-ready datasets.

What You’ll Learn:
🔤 Tokenizer selection (BPE, WordPiece, SentencePiece) & extending vocab
🧽 Deep-cleaning tricks: de-duplication, PII masking, and prompt-response alignment
🗂️ FineWeb integration: filtering, sharding, and streaming best practices
🤖 RAG vs Fine-Tuning – A decision guide based on cost, speed, and compliance
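
To make the cleaning topics above concrete, here is a minimal sketch of two of the steps: exact de-duplication via content hashing and simple regex-based PII masking. The helper names (`mask_pii`, `dedupe`) and the regex patterns are illustrative assumptions, not code from the session; production pipelines typically use fuzzier dedup (e.g. MinHash) and dedicated PII detectors.

```python
import hashlib
import re

# Illustrative patterns for two common PII types (assumed, not exhaustive).
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")

def mask_pii(text: str) -> str:
    """Replace obvious emails and phone numbers with placeholder tokens."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    text = PHONE_RE.sub("[PHONE]", text)
    return text

def dedupe(docs):
    """Drop exact duplicates by hashing lightly normalized text."""
    seen = set()
    for doc in docs:
        key = hashlib.sha256(doc.strip().lower().encode()).hexdigest()
        if key not in seen:
            seen.add(key)
            yield doc

docs = [
    "Contact us at support@example.com for help.",
    "Contact us at support@example.com for help.",  # exact duplicate
    "Call +1 (555) 123-4567 to register.",
]
clean = [mask_pii(d) for d in dedupe(docs)]
print(clean)
# ['Contact us at [EMAIL] for help.', 'Call [PHONE] to register.']
```

The same pattern scales to streamed web corpora: hash each record as it arrives and mask before writing shards, so sensitive strings never land on disk.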

You’ll Walk Away With:
✅ A plug-and-play preprocessing repo for your next ML project
✅ A practical RAG vs Fine-Tune checklist
✅ Confidence to handle domain-specific, multilingual, or sensitive data at scale

Who Should Attend:
ML Engineers, Data Scientists, and Tech Leads who want to build smarter, faster, and safer AI systems.

Don’t miss this essential session for next-gen LLM builders.

DubAI and Data Professional
Online event
FREE