DeepSeek V3 Paper Review

Hosted By
Rakshak T.

Details

This will be a journal club event.

# DeepSeek-V3 Technical Report

Link to Paper

Speaker
Joe Fioti

Abstract
We present DeepSeek-V3, a strong Mixture-of-Experts (MoE) language model with 671B total parameters with 37B activated for each token. To achieve efficient inference and cost-effective training, DeepSeek-V3 adopts Multi-head Latent Attention (MLA) and DeepSeekMoE architectures, which were thoroughly validated in DeepSeek-V2. Furthermore, DeepSeek-V3 pioneers an auxiliary-loss-free strategy for load balancing and sets a multi-token prediction training objective for stronger performance. We pre-train DeepSeek-V3 on 14.8 trillion diverse and high-quality tokens, followed by Supervised Fine-Tuning and Reinforcement Learning stages to fully harness its capabilities. Comprehensive evaluations reveal that DeepSeek-V3 outperforms other open-source models and achieves performance comparable to leading closed-source models. Despite its excellent performance, DeepSeek-V3 requires only 2.788M H800 GPU hours for its full training. In addition, its training process is remarkably stable. Throughout the entire training process, we did not experience any irrecoverable loss spikes or perform any rollbacks.
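
For readers new to Mixture-of-Experts models, the gap between 671B total and 37B activated parameters comes from routing each token to only a few experts. The toy sketch below illustrates generic top-k routing and computes the activated fraction from the figures in the abstract. It is not DeepSeek-V3's actual router: the function names `top_k_route` and `activated_fraction`, the expert count of 8, and k = 2 are hypothetical choices made purely for illustration.

```python
import math
import random


def top_k_route(scores, k):
    """Pick the k highest-scoring experts and softmax-normalize their weights.

    Generic top-k gating for illustration only; the paper's routing and
    load-balancing scheme differ in detail.
    """
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    exps = [math.exp(scores[i]) for i in top]
    total = sum(exps)
    return top, [e / total for e in exps]


def activated_fraction(total_params_b, active_params_b):
    """Share of parameters used per token (both arguments in billions)."""
    return active_params_b / total_params_b


random.seed(0)
num_experts, k = 8, 2  # hypothetical toy sizes, not the model's configuration
scores = [random.gauss(0.0, 1.0) for _ in range(num_experts)]

experts, weights = top_k_route(scores, k)
print("selected experts:", experts)
print("gate weights:", weights)

# Figures taken from the abstract: 671B total, 37B activated per token.
frac = activated_fraction(671, 37)
print(f"activated fraction: {frac:.1%}")  # about 5.5%
```

Only the selected experts run for a given token, which is why per-token compute tracks the activated parameter count rather than the total.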

Info
Austin Deep Learning Journal Club is a group for committed machine learning practitioners and researchers alike. The group typically meets on the first Tuesday of each month to discuss research publications. The publications, selected by vote, are usually ones that laid the foundations of ML/DL or that explore promising new ideas. Participants are expected to read each publication beforehand so they can contribute to the discussion and learn from others. This is also a great opportunity to showcase your implementations and get feedback from other experts.

Sponsors:
Thank you to Capital Factory for sponsoring Austin Deep Learning. Capital Factory is the center of gravity for entrepreneurs in Texas. They meet the best entrepreneurs in Texas and introduce them to their first investors, employees, mentors, and customers. To sign up for a Capital Factory membership, click here.

Antler: Meet your co-founders, join our global community, and access capital to build and scale your company faster.

Austin Deep Learning

Every 1st Tuesday of the month until December 2, 2025

Capital Factory, Voltron Room (1st floor)
701 Brazos Street · Austin, TX