
What we’re about
The LLM Reading Club is for anyone interested in building both a theoretical intuition of large language models and practical expertise in their development and use. There’s momentum and fun in numbers! 😊 The club will collectively explore some of the canonical books and research papers pertaining to large language models.
The level of prerequisite knowledge needed to get the most out of each session will vary with the specific book or paper being covered. Generally, the content will be most suitable for those who are proficient in Python and have some understanding of (or a willingness to quickly learn) the relevant core machine learning or mathematical concepts.
All are welcome, including but not limited to data practitioners of all hues, enthusiasts, students, researchers, and professionals.
Join the convo and connect on Discord :) https://bit.ly/llm-discord
Upcoming events (2)
- Research Paper review (OpenAI Whisper)
Join us to review the OpenAI paper that underpins the Whisper speech-to-text model and showed that large-scale weak supervision alone can yield robust, multilingual speech recognition:
- Robust Speech Recognition via Large-Scale Weak Supervision
We'll aim to dissect the methodology, architecture, and implications of training on multilingual audio data.
Key Points:
- Weak supervision approach - How Whisper learned from imperfect, web-scraped data
- Zero-shot transfer capabilities - Why it works across languages without fine-tuning (see the sketch after this list)
- Architecture deep dive - Transformer encoder-decoder design choices
- Scaling insights - What 680k hours of data teaches us about speech AI
- Real-world performance - Robustness to accents, noise, and domain shifts
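To make the zero-shot point concrete, here is a minimal sketch using the open-source `openai-whisper` package (assumes `pip install openai-whisper` and ffmpeg on the PATH; `audio.mp3` is a placeholder file name). It follows the package's documented API and is an illustration, not material we'll step through verbatim in the session:

```python
import whisper

# Load a multilingual checkpoint ("base" is small enough for a laptop).
model = whisper.load_model("base")

# Whisper operates on 30-second log-Mel spectrogram windows, as in the paper.
audio = whisper.load_audio("audio.mp3")  # placeholder file name
audio = whisper.pad_or_trim(audio)
mel = whisper.log_mel_spectrogram(audio).to(model.device)

# Zero-shot language identification: no fine-tuning needed.
_, probs = model.detect_language(mel)
print(f"Detected language: {max(probs, key=probs.get)}")

# Transcribe end to end.
result = model.transcribe("audio.mp3")
print(result["text"])
```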
Please note that the session is not recorded.
Key links:
Discord joining instructions: https://bit.ly/llm-discord
- OpenAI Whisper - From Paper to Code
Following on from the previous week's unpacking of the Whisper paper (Robust Speech Recognition via Large-Scale Weak Supervision), in this session we switch gears and dive into the code.
🔧 What we’ll look at together:
- How the code reflects the encoder–decoder transformer design.
- Data preprocessing — handling noisy, imperfect transcripts.
- Training vs. inference setup.
- Practical adaptations — how to plug Whisper into real-world pipelines (e.g. multilingual transcription, noisy audio); see the sketch after this list.
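As one example of the kind of adaptation we might discuss (an illustration, not the session's canonical code), here is a minimal sketch that wraps a Whisper checkpoint in Hugging Face's automatic-speech-recognition pipeline. It assumes `pip install transformers torch`; `meeting.wav` is a placeholder file name:

```python
# Minimal sketch (assumption: Hugging Face `transformers` as a stand-in
# for the original codebase) of dropping Whisper into a transcription pipeline.
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-small",  # multilingual checkpoint on the Hub
    chunk_length_s=30,             # split long audio into 30 s windows
)

# "meeting.wav" is a placeholder; any ffmpeg-readable file works.
print(asr("meeting.wav")["text"])
```

The 30-second chunking mirrors the fixed-length training windows from the paper, which is why long-form audio has to be processed in pieces.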
Please note that the session is not recorded.
Key links:
Discord joining instructions: https://bit.ly/llm-discord