MC09: When to Fine-Tune, Tokenization & Preprocessing for LLMs
📅 Happening this Saturday, June 21 at 11AM GST!
👉 https://nas.io/artificialintelligence/events/mc09-aires5-finetune
Training powerful LLMs doesn’t start with models; it starts with data. Clean, well-prepped, tokenized data is your secret weapon. Join this hands-on session to master the full pipeline, from raw web-scale text to fine-tuning-ready datasets.
What You’ll Learn:
🔤 Tokenizer selection (BPE, WordPiece, SentencePiece) & extending vocab
🧽 Deep-cleaning tricks: de-duplication, PII masking, and prompt-response alignment
🗂️ FineWeb integration: filtering, sharding, and streaming best practices
🤖 RAG vs Fine-Tuning – A decision guide based on cost, speed, and compliance (quick code sketches of these topics follow below)
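As a taste of the tokenizer material, here is a minimal sketch of training a BPE tokenizer from scratch and extending its vocabulary with the Hugging Face `tokenizers` library. The tiny corpus and the added domain tokens (`<med_record>`, `<icd_code>`) are illustrative placeholders, not session code:

```python
from tokenizers import Tokenizer
from tokenizers.models import BPE
from tokenizers.pre_tokenizers import Whitespace
from tokenizers.trainers import BpeTrainer

# Toy corpus standing in for your real training text (placeholder data).
corpus = ["fine-tuning starts with clean data", "tokenizers turn text into ids"]

# Train a small BPE tokenizer from scratch.
tokenizer = Tokenizer(BPE(unk_token="[UNK]"))
tokenizer.pre_tokenizer = Whitespace()
trainer = BpeTrainer(vocab_size=500, special_tokens=["[UNK]", "[PAD]"])
tokenizer.train_from_iterator(corpus, trainer=trainer)

# Extend the vocabulary with domain-specific tokens (hypothetical examples).
tokenizer.add_tokens(["<med_record>", "<icd_code>"])

print(tokenizer.encode("clean data with <icd_code>").tokens)
```

The same pattern applies to WordPiece or SentencePiece: pick the model class and trainer, then extend the vocabulary for domain terms you don't want split into subwords.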
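The deep-cleaning bullet boils down to ideas like this stdlib-only sketch: exact de-duplication via hashing of whitespace-normalized text, plus regex masking of emails and phone numbers. The patterns are deliberately simple illustrations, not production-grade PII detection:

```python
import hashlib
import re

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE = re.compile(r"\+?\d[\d\s().-]{7,}\d")

def mask_pii(text: str) -> str:
    """Replace obvious emails and phone numbers with placeholder tags."""
    return PHONE.sub("[PHONE]", EMAIL.sub("[EMAIL]", text))

def dedupe_and_mask(docs):
    """Yield each document once (exact match on normalized text), PII masked."""
    seen = set()
    for doc in docs:
        key = hashlib.sha1(" ".join(doc.split()).lower().encode()).hexdigest()
        if key not in seen:
            seen.add(key)
            yield mask_pii(doc)

docs = ["Contact me at jane@example.com", "Contact me at  jane@example.com"]
print(list(dedupe_and_mask(docs)))  # one document survives, email masked
```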
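And for the FineWeb bullet, a hedged sketch of streaming and filtering with the Hugging Face `datasets` library. It assumes the public `HuggingFaceFW/fineweb` dataset with its `sample-10BT` config and `text`/`language_score` columns; the filter thresholds are arbitrary examples:

```python
from datasets import load_dataset

# Stream FineWeb rather than downloading the full dump.
ds = load_dataset("HuggingFaceFW/fineweb", name="sample-10BT",
                  split="train", streaming=True)

# Keep longer documents with a confident language ID (thresholds are examples).
ds = ds.filter(lambda ex: len(ex["text"]) > 500 and ex["language_score"] > 0.9)

# Peek at the first few surviving documents.
for example in ds.take(3):
    print(example["text"][:120])
```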
You’ll Walk Away With:
✅ A plug-and-play preprocessing repo for your next ML project
✅ A practical RAG vs Fine-Tune checklist (see the sketch after this list)
✅ Confidence to handle domain-specific, multilingual, or sensitive data at scale
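As a preview of that checklist, one way to encode the cost/speed/compliance trade-offs is a tiny scoring helper. The questions and weights below are illustrative assumptions, not the session's official rubric:

```python
def rag_or_finetune(needs_fresh_data: bool,
                    data_is_sensitive: bool,
                    latency_critical: bool,
                    budget_for_training: bool) -> str:
    """Toy decision helper: tallies votes for RAG vs fine-tuning."""
    rag_votes = sum([
        needs_fresh_data,        # RAG handles fast-changing knowledge
        data_is_sensitive,       # keeping data in a retriever can ease compliance
        not budget_for_training, # retrieval avoids training cost
    ])
    ft_votes = sum([
        latency_critical,        # no retrieval hop at inference time
        budget_for_training,
    ])
    return "RAG" if rag_votes >= ft_votes else "Fine-tune"

print(rag_or_finetune(True, False, False, False))  # -> "RAG"
```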
Who Should Attend:
ML Engineers, Data Scientists, and Tech Leads who want to build smarter, faster, and safer AI systems.
Don’t miss this essential session for next-gen LLM builders.

