MoE inference economics from first principles


Details
Piotr (Aleph Alpha) will talk about MoE inference economics from first principles.
The release of models such as Kimi K2 has firmly established mixture-of-experts (MoE) architectures as the leading design for large language models (LLMs) at the intelligence frontier. Due to their massive size (over 1 trillion parameters) and sparse computation pattern, selectively activating a subset of parameters rather than the entire model for each token, MoE-style LLMs pose significant challenges for inference workloads and substantially alter the underlying inference economics. With ever-growing consumer demand for AI models, as well as the internal need of AGI companies to generate trillions of tokens of synthetic data, the "cost per token" is becoming an increasingly important factor, determining both profit margins and the capital expenditure required for internal reinforcement learning (RL) training rollouts.
In this talk we will walk through the cost structure of generating a "DeepSeek token", discuss the trade-offs between latency/throughput and cost, and try to estimate the optimal setup for serving such a model.
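To give a flavour of the kind of arithmetic the talk covers, here is a minimal back-of-envelope sketch in Python. All numbers (active parameter count, GPU throughput, utilization, hourly cost) are illustrative assumptions rather than figures from the talk, and real decoding is often memory-bandwidth-bound rather than compute-bound, which changes the picture considerably.

```python
# Back-of-envelope cost-per-token estimate for a sparse MoE model.
# All numbers below are illustrative assumptions, not figures from the talk.

active_params = 37e9                  # assumed parameters activated per token (DeepSeek-V3-like MoE)
flops_per_token = 2 * active_params   # ~2 FLOPs per active parameter for a forward pass

gpu_peak_flops = 989e12               # assumed peak BF16 throughput of one GPU (FLOP/s)
mfu = 0.30                            # assumed model FLOPs utilization during decoding
gpu_hourly_cost = 2.50                # assumed all-in cost per GPU-hour in USD

effective_flops = gpu_peak_flops * mfu          # usable FLOP/s per GPU
tokens_per_second = effective_flops / flops_per_token
cost_per_second = gpu_hourly_cost / 3600
cost_per_million_tokens = cost_per_second / tokens_per_second * 1e6

print(f"Tokens/s per GPU: {tokens_per_second:,.0f}")
print(f"Cost per 1M tokens: ${cost_per_million_tokens:.2f}")
```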
If you want to join this event, please sign up on our Luma page: https://lu.ma/2ae8czbn
⚠️ Registration is free, but required due to building security.
🔈 Speakers:
- Piotr Mazurek (https://x.com/tugot17), Senior AI Inference Engineer
Agenda:
✨ 18:30 Doors open: time for networking with fellow attendees
✨ 19:00 Talk and Q&A
✨ 20:00 Mingling and networking with pizza and drinks
✨ 21:00 Meetup ends
- Where: In person, Aleph Alpha Berlin, Ritterstraße 6
- When: Wednesday, August 20th
- Language: English