
MoE inference economics from first principles

Hosted By
Andreas H.

Details

Piotr (Aleph Alpha) will talk about MoE inference economics from first principles.

The release of Kimi K2 has firmly established mixture-of-experts (MoE) models as the leading architecture of large language models (LLMs) at the intelligence frontier. Due to their massive size (over 1 trillion parameters) and sparse computation pattern, selectively activating a subset of parameters rather than the entire model for each token, MoE-style LLMs present significant challenges for inference workloads and substantially alter the underlying inference economics. With ever-growing consumer demand for AI models, as well as the internal need of AGI companies to generate trillions of tokens of synthetic data, the "cost per token" is becoming an even more important factor, determining both profit margins and the capex required for internal reinforcement learning (RL) training rollouts.
In this talk we will go through the cost structure of generating a "DeepSeek token," discuss the tradeoffs between latency/throughput and cost, and try to estimate the optimal setup for running it.
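The core of the argument is that compute cost scales with the parameters activated per token, not the model's total size. As a rough illustration, here is a minimal back-of-envelope sketch in Python using assumed numbers (DeepSeek-V3's published ~37B-active-of-671B-total parameter split, H100-class peak throughput, and guessed utilization and rental prices); it deliberately ignores memory bandwidth, KV cache, and batching effects, which the talk will treat properly.

    # Back-of-envelope cost-per-token model for a sparse MoE at decode time.
    # All numbers are illustrative assumptions, not figures from the talk.

    ACTIVE_PARAMS = 37e9    # active parameters per token (DeepSeek-V3-style MoE)
    TOTAL_PARAMS = 671e9    # total parameters; affects memory, not per-token FLOPs
    GPU_FLOPS = 989e12      # assumed H100 dense BF16 peak, FLOP/s
    MFU = 0.10              # assumed utilization; decode is bandwidth-bound, so low
    GPU_HOUR_USD = 2.50     # assumed hourly rental price per GPU

    flops_per_token = 2 * ACTIVE_PARAMS  # ~2 FLOPs per active parameter per token
    tokens_per_sec = GPU_FLOPS * MFU / flops_per_token
    cost_per_mtok = GPU_HOUR_USD / (tokens_per_sec * 3600) * 1e6

    print(f"throughput ~ {tokens_per_sec:,.0f} tok/s per GPU")
    print(f"cost ~ ${cost_per_mtok:.2f} per million tokens")

Plugging in different utilization and price assumptions already shows how strongly batching and hardware choice move the cost per token, even before memory constraints enter the picture.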

If you want to join this event, please sign up on our Luma page: https://lu.ma/2ae8czbn
⚠️ Registration is free, but required due to building security.

🔈 Speakers: Piotr (Aleph Alpha)

Agenda:

18:30 Doors open: time for networking with fellow attendees
19:00 Talk and Q&A
20:00 Mingling and networking with pizza and drinks
21:00 Meetup ends

- Where: In person, Aleph Alpha Berlin, Ritterstraße 6
- When: Wednesday, August 20th
- Language: English
