
💡 Perfect for Devs, ML Engineers, Founders, and Curious Professionals

***

## 📅 Total Duration: 2 Hours

Format: 80 mins talk + 25 mins live demo + 15 mins Q&A
Goal: Understand how ChatGPT is actually built, how it thinks, and how to build with or fine-tune LLMs.

***

## 🧠 Full 2-Hour Content Breakdown

***

### ⏱️ 0–10 min: Introduction

  • 🤔 What is an LLM?
  • 📊 Real-world applications (ChatGPT, GitHub Copilot, Claude, etc.)
  • 🧭 Session agenda & what you’ll walk away with

***

### ⏱️ 10–30 min: The Core Brain – Transformers

  • 🤖 How transformers work: Self-attention, multi-head attention
  • 🔁 Sequence-to-sequence & next-token prediction
  • 🧱 Architecture of GPT (blocks, layers, position embeddings)
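The self-attention bullet above can be sketched in a few lines of NumPy. This is an illustrative single-head version only; a real GPT block adds multi-head splits, learned output projections, residual connections, and layer norm.

```python
# Minimal single-head causal self-attention (illustrative sketch, not GPT's
# actual implementation).
import numpy as np

def self_attention(x, Wq, Wk, Wv):
    """x: (seq_len, d_model); Wq/Wk/Wv: (d_model, d_head) projections."""
    Q, K, V = x @ Wq, x @ Wk, x @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])          # scaled dot-product
    # causal mask: a token may attend only to itself and earlier tokens
    mask = np.triu(np.ones_like(scores), k=1).astype(bool)
    scores[mask] = -np.inf
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over positions
    return weights @ V                               # weighted mix of values

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))                          # 4 tokens, d_model=8
Wq, Wk, Wv = (rng.normal(size=(8, 4)) for _ in range(3))
out = self_attention(x, Wq, Wk, Wv)
print(out.shape)  # (4, 4): one d_head-sized vector per token
```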

📊 Diagram: Full GPT model stack
🎥 Analogy: Predict the next word in a sentence like “autocomplete on steroids”
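The "autocomplete on steroids" analogy in miniature: a toy next-token predictor built from bigram counts. GPT does the same job, but with a transformer instead of a count table and subword tokens instead of whole words.

```python
# Toy next-token prediction: count which word follows which, then
# repeatedly pick the most frequent continuation.
from collections import Counter, defaultdict

corpus = "the cat sat on the mat the cat ate the fish".split()
bigrams = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    bigrams[prev][nxt] += 1

def predict_next(word):
    # most frequent continuation seen in "training"
    return bigrams[word].most_common(1)[0][0]

def autocomplete(word, n=4):
    out = [word]
    for _ in range(n):
        out.append(predict_next(out[-1]))
    return " ".join(out)

print(autocomplete("the"))
```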

***

### ⏱️ 30–50 min: Training Pipeline – How LLMs Learn

  • 🏗️ Pretraining: Language modeling objective (next-token prediction)
  • 📚 Data: What’s used to train GPT-style models (web, code, books)
  • 🧠 Fine-tuning:
      • Instruction tuning (follow commands)
      • RLHF (Reinforcement Learning from Human Feedback)

📊 Explain PPO + Reward Model
💡 Why RLHF makes ChatGPT feel “polite” and “useful”
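The reward-model step of RLHF, boiled down: given a human preference pair (chosen vs. rejected response), push the model to score the chosen one higher via the pairwise Bradley–Terry loss. The linear "reward model" and random features below are toy assumptions; real reward models are full transformers over text.

```python
# Sketch of reward-model training on one preference pair using the
# pairwise loss -log sigmoid(r_chosen - r_rejected).
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=5) * 0.01              # toy reward-model weights

def reward(features):                      # r(x, y): scalar score
    return features @ w

def preference_loss(chosen, rejected):
    margin = reward(chosen) - reward(rejected)
    return np.log1p(np.exp(-margin))       # -log sigmoid(margin)

chosen, rejected = rng.normal(size=5), rng.normal(size=5)
for _ in range(200):                       # plain gradient descent
    margin = reward(chosen) - reward(rejected)
    grad = -(1 - 1 / (1 + np.exp(-margin))) * (chosen - rejected)
    w -= 0.1 * grad

print(reward(chosen) > reward(rejected))   # preferred response now scores higher
```

In full RLHF, this trained reward model then scores the policy's outputs while PPO updates the policy, which is what gives ChatGPT its "polite and useful" feel.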

***

### ⏱️ 50–70 min: Inference & System Design

  • 🧩 Tokenization (BPE): What is a “token”? Why does it matter?
  • 🔄 Token flow: Input → Model → Output
  • ⚙️ System architecture:
      • API, frontend, backend
      • GPU inference, context caching, rate limiting

📊 Architecture Diagram: End-to-end flow of a ChatGPT API request
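The BPE bullet above, in code: BPE builds its vocabulary by repeatedly merging the most frequent adjacent symbol pair. One merge step on a toy vocabulary (the words and counts are made up for illustration):

```python
# One BPE merge step: find the most frequent adjacent pair across the
# corpus, then fuse it into a single symbol everywhere.
from collections import Counter

# words as symbol tuples, with corpus frequencies
vocab = {("l", "o", "w"): 5, ("l", "o", "w", "e", "r"): 2, ("n", "e", "w"): 6}

def best_pair(vocab):
    pairs = Counter()
    for word, freq in vocab.items():
        for a, b in zip(word, word[1:]):
            pairs[(a, b)] += freq
    return pairs.most_common(1)[0][0]

def merge(vocab, pair):
    merged = {}
    for word, freq in vocab.items():
        out, i = [], 0
        while i < len(word):
            if tuple(word[i:i + 2]) == pair:
                out.append(word[i] + word[i + 1]); i += 2
            else:
                out.append(word[i]); i += 1
        merged[tuple(out)] = freq
    return merged

print(merge(vocab, best_pair(vocab)))
```

Repeating this thousands of times yields the subword vocabulary, which is why "tokens" rarely line up with whole words and why token counts drive both context limits and API cost.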

***

### ⏱️ 70–80 min: “Secrets” of ChatGPT’s Performance

| Secret | Insight |
| ------ | ------- |
| 🧠 Mixture-of-Experts (MoE) | GPT-4 may use sparse routing |
| 🚀 FlashAttention | Faster attention = cheaper inference |
| ⚖️ Alignment training | Safety filters & refusal mechanisms |
| 🧩 Prompt Engineering | The real “art” of using LLMs |
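The MoE row above, sketched: a gating network scores every expert per token, but only the top-k experts actually run, so compute stays roughly constant as expert count grows. Shapes and weights here are toy; whether GPT-4 uses this is, as the table says, unconfirmed.

```python
# Sparse mixture-of-experts routing: score all experts, run only the top-k.
import numpy as np

rng = np.random.default_rng(0)
n_experts, d, k = 4, 8, 2
experts = [rng.normal(size=(d, d)) for _ in range(n_experts)]  # expert FFNs
Wg = rng.normal(size=(d, n_experts))                           # gating weights

def moe(x):                                 # x: (d,) single token
    logits = x @ Wg
    top = np.argsort(logits)[-k:]           # indices of the top-k experts
    gates = np.exp(logits[top])
    gates /= gates.sum()                    # softmax over the selected experts
    # a dense model would pay for all n_experts; here we compute only k
    return sum(g * (x @ experts[i]) for g, i in zip(gates, top))

y = moe(rng.normal(size=d))
print(y.shape)  # (8,)
```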

***

### ⏱️ 80–105 min: Live Demos: How to Use or Build with LLMs

Choose any 2–3 of the short demos below:

#### ✅ 1. Use OpenAI GPT-4 API

  • Send a prompt using Python (openai SDK)
  • Show how token count and cost work
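The token-count-and-cost part of this demo is plain arithmetic on the usage counts the API returns. The prices below are hypothetical placeholders, not OpenAI's actual rates; always check the provider's current pricing page.

```python
# Estimating API cost from token usage. Prices are HYPOTHETICAL examples.
PRICE_PER_1K_INPUT = 0.01    # hypothetical $ per 1K prompt tokens
PRICE_PER_1K_OUTPUT = 0.03   # hypothetical $ per 1K completion tokens

def estimate_cost(prompt_tokens, completion_tokens):
    return (prompt_tokens / 1000 * PRICE_PER_1K_INPUT
            + completion_tokens / 1000 * PRICE_PER_1K_OUTPUT)

# a chat API response reports usage counts; these numbers are made up
usage = {"prompt_tokens": 250, "completion_tokens": 600}
cost = estimate_cost(usage["prompt_tokens"], usage["completion_tokens"])
print(f"${cost:.4f}")  # $0.0205
```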

#### ✅ 2. Retrieval-Augmented Generation (RAG)

  • Build a “Chat with your Docs” app using LangChain or LlamaIndex
  • Load PDF → embed → chat
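The retrieval half of "load PDF → embed → chat", stripped to its core: embed the chunks, embed the question, hand the closest chunk to the LLM as context. The bag-of-words "embedding" below is a toy stand-in for a real embedding model, which is what LangChain/LlamaIndex wire in for you.

```python
# Core RAG retrieval: similarity search over embedded document chunks.
import math
from collections import Counter

docs = ["GPT uses transformer blocks with self-attention.",
        "LoRA fine-tunes a model by training small adapter matrices.",
        "Tokenizers split text into subword units called tokens."]

def embed(text):
    # toy embedding: word-count vector (a real system uses a neural model)
    return Counter(text.lower().replace(".", "").replace("?", "").split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb)

def retrieve(question):
    q = embed(question)
    return max(docs, key=lambda d: cosine(q, embed(d)))

# the retrieved chunk would be pasted into the LLM prompt as context
print(retrieve("what are tokens?"))
```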

#### ✅ 3. LoRA Fine-tuning (optional if audience is ML-heavy)

  • Use Hugging Face + LoRA to fine-tune Mistral/Llama on custom data
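What LoRA actually trains, in miniature: instead of updating a full weight matrix W, it learns a low-rank update B·A and uses W + B·A at inference. The shapes below are toy, but the parameter-count arithmetic is the whole point of the technique.

```python
# LoRA's parameter savings: train two thin matrices instead of one big one.
import numpy as np

d = 1024
r = 8                                    # LoRA rank, r << d
W = np.zeros((d, d))                     # frozen pretrained weight (toy)
A = np.random.default_rng(0).normal(size=(r, d)) * 0.01
B = np.zeros((d, r))                     # B starts at zero, so the update starts at 0

full_params = W.size                     # what full fine-tuning would train
lora_params = A.size + B.size            # what LoRA trains instead
print(full_params, lora_params)          # LoRA trains 64x fewer parameters here

x = np.ones(d)
y = x @ (W + B @ A)                      # effective weight at inference: W + BA
```

In practice the Hugging Face PEFT library injects these A/B pairs into the attention layers for you; this sketch just shows why the approach is cheap.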

***

### ⏱️ 105–120 min: Q&A + Wrap-Up

  • Top questions: safety, hallucinations, token limits, copyright
  • Bonus topics to explore: Agents, Multimodal LLMs, Vector DBs
  • Share: GitHub repo, prompt sheet, learning links

Join Zoom Meeting

[https://us02web.zoom.us/j/86369463178?pwd=mhZqUrFbGvomnSgV8oDdUIwrEEUnf1.1](https://us02web.zoom.us/j/86369463178?pwd=mhZqUrFbGvomnSgV8oDdUIwrEEUnf1.1)

Meeting ID: 863 6946 3178
Passcode: 673750
