DeepSeek-R1: Let us understand it in depth
Details
We will walk through the paper:
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
[2501.12948] DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
Abstract
We introduce our first-generation reasoning models, DeepSeek-R1-Zero and DeepSeek-R1. DeepSeek-R1-Zero, a model trained via large-scale reinforcement learning (RL) without super vised fine-tuning (SFT) as a preliminary step, demonstrates remarkable reasoning capabilities. Through RL, DeepSeek-R1-Zero naturally emerges with numerous powerful and intriguing reasoning behaviors. However, it encounters challenges such as poor readability, and language mixing. To address these issues and further enhance reasoning performance, we introduce DeepSeek-R1, which incorporates multi-stage training and cold-start data before RL. DeepSeek R1 achieves performance comparable to OpenAI-o1-1217 on reasoning tasks. To support the research community, we open-source DeepSeek-R1-Zero, DeepSeek-R1, and six dense models (1.5B, 7B, 8B, 14B, 32B, 70B) distilled from DeepSeek-R1 based on Qwen and Llama.
----
We are a group of applied AI practitioners and enthusiasts who have formed a collective learning community. Every Wednesday evening, we hold our research paper reading seminar covering an AI topic. One member carefully explains the paper, making it more accessible to a broader audience. Then, we follow this reading with a more informal discussion and socializing.
You are welcome to join this in person or over Zoom. SupportVectors is an AI training lab located in Fremont, CA, close to Tesla and easily accessible by road and BART. We follow the weekly sessions with snacks, soft drinks, and informal discussions.
If you want to attend by Zoom, the Zoom registration link will be visible once you RSVP. Note that we have had to change and add security to the Zoom link to prevent Zoom bombing.
