Model Behavior Study Group: Constitutional AI
Details
Study Group Topic: Constitutional AI
Dive into the foundational research on using written principles and AI feedback, rather than human harmlessness labels, to train safe AI.
Constitutional AI: Harmlessness from AI Feedback (Bai et al., 2022)
The original paper introducing Constitutional AI: training AI assistants to be helpful and harmless through self-critique and AI-generated feedback in place of human harmlessness labels.
🔗 https://arxiv.org/abs/2212.08073
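If you want a concrete picture before the session, here is a minimal sketch of the supervised critique-and-revision loop the paper describes. The `generate` function, the prompt wording, and the example principle are our own placeholders for illustration, not the paper's actual prompts, constitution, or any real API.

```python
# Minimal sketch of Constitutional AI's supervised critique-and-revision stage.
# `generate` stands in for any language-model completion call; the prompt
# templates and example principle are illustrative, not quoted from the paper.

from typing import Callable

def critique_and_revise(
    generate: Callable[[str], str],  # hypothetical LM completion function
    prompt: str,
    principles: list[str],
) -> str:
    """Return a response revised against each constitutional principle in turn."""
    response = generate(prompt)
    for principle in principles:
        # Ask the model to critique its own response against one principle...
        critique = generate(
            f"Prompt: {prompt}\nResponse: {response}\n"
            f"Identify how the response conflicts with this principle: {principle}"
        )
        # ...then to rewrite the response so it addresses that critique.
        response = generate(
            f"Prompt: {prompt}\nResponse: {response}\nCritique: {critique}\n"
            "Rewrite the response to address the critique."
        )
    return response

# Example call (with a made-up principle):
# revised = critique_and_revise(my_model, "How do I pick a lock?",
#                               ["Choose the response that is least harmful."])
```

In the paper, the revised responses become fine-tuning targets for the supervised stage; a second, reinforcement-learning stage then uses AI-generated preference labels (RLAIF) in place of human harmlessness labels.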
Collective Constitutional AI: Aligning a Language Model with Public Input (Huang et al., 2024)
Extends Constitutional AI by incorporating ~1,000 Americans' input to democratically create principles for AI behavior.
🔗 https://arxiv.org/abs/2406.07814
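This paper builds its constitution from public statements voted on through the Polis platform, favoring statements with agreement across different opinion groups. The sketch below illustrates that kind of group-aware filtering; the data shape, the 0.7 threshold, and the selection rule are our assumptions for illustration, not the paper's exact method.

```python
# Hedged sketch: selecting publicly sourced principles by cross-group
# agreement, loosely in the spirit of Collective Constitutional AI. The
# threshold and the "every group must agree" rule are illustrative only.

from dataclasses import dataclass

@dataclass
class Statement:
    text: str
    votes_by_group: dict[str, tuple[int, int]]  # group -> (agree votes, total votes)

def select_principles(statements: list[Statement], threshold: float = 0.7) -> list[str]:
    """Keep statements that every opinion group agrees with at >= threshold."""
    selected = []
    for s in statements:
        rates = [agree / total for agree, total in s.votes_by_group.values() if total]
        if rates and min(rates) >= threshold:
            selected.append(s.text)
    return selected
```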
📚 Full reading list and How-To: https://github.com/suzana-ilic/study_model_behavior
Join us for our monthly reading group where we dive into the research and specs that shape how AI systems like ChatGPT and Claude actually behave. We read together for 30 minutes, then discuss for 30 minutes. Pre-reading is recommended, but not required.
Who is this for?
Anyone curious about how AI systems work—researchers, builders, policy folks, or just thoughtful people who use these tools and want to understand them better. No technical background needed. We start with accessible industry standards and papers and build from there.
What will we read?
We're working through resources in six areas:
- Industry Specs — How leading AI companies define model behavior
- Constitutional AI — Training models with written principles and AI feedback instead of human labels
- Safety Methods — RLHF and alignment techniques
- Behavioral Science — How researchers study what AI actually does
- Interpretability — Understanding what's happening inside the models
- Critical Perspectives — Challenges to current approaches
Format
- 30 minutes: Read together (with discussion questions)
- 30 minutes: Talk through key insights and implications
- Monthly sessions
Location: Online
💻 RSVP for Zoom Link
⚙️ Discord https://discord.gg/CT7nBdYCsY
📬 Updates: https://mltaicommunities.substack.com/