🌐
arXiv
arxiv.org › abs › 2501.12948
[2501.12948] DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
January 22, 2025 - View a PDF of the paper titled DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning, by DeepSeek-AI and 199 other authors. Abstract: We introduce our first-generation reasoning models, ...
🌐
Hugging Face
huggingface.co › deepseek-ai › DeepSeek-R1-Zero
deepseek-ai/DeepSeek-R1-Zero · Hugging Face
DeepSeek-R1-Zero demonstrates capabilities such as self-verification, reflection, and generating long CoTs, marking a significant milestone for the research community. Notably, it is the first open research to validate that reasoning capabilities ...
🌐
AI Papers Academy
aipapersacademy.com › home › deepseek-r1 paper explained – a new rl llms era in ai?
DeepSeek-R1 Paper Explained – A New RL LLMs Era in AI? - AI Papers Academy
July 3, 2025 - Training DeepSeek-R1-Zero using only RL in post-training, without SFT · The paper we’re reviewing today eliminates, or partially eliminates, the supervised fine-tuning stage. Specifically, to train DeepSeek-R1-Zero, the first model presented in the paper, we start with a pretrained model called DeepSeek-V3-Base, which has 671 billion parameters.
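The pure-RL recipe referenced here scores outputs with simple rule-based checks rather than a learned reward model: the paper describes an accuracy reward plus a format reward. A minimal sketch of what such a reward could look like; the `<think>`/`<answer>` tags follow the paper's prompt template, but the weights and the exact-match check are illustrative assumptions, not DeepSeek's implementation:

```python
import re

def rule_based_reward(completion: str, reference_answer: str) -> float:
    """Toy rule-based reward in the spirit of R1-Zero's accuracy + format rewards.

    The <think>/<answer> tags follow the template described in the paper;
    the weights and the exact-match check are illustrative assumptions.
    """
    reward = 0.0
    # Format reward: reasoning and final answer must be wrapped in tags.
    match = re.fullmatch(r"\s*<think>(.+?)</think>\s*<answer>(.+?)</answer>\s*",
                         completion, flags=re.DOTALL)
    if match is None:
        return reward  # no format reward, and nothing to grade
    reward += 0.5  # format reward (weight is an assumption)

    # Accuracy reward: compare the extracted answer against the reference.
    predicted = match.group(2).strip()
    if predicted == reference_answer.strip():
        reward += 1.0  # accuracy reward (weight is an assumption)
    return reward

print(rule_based_reward("<think>2+2=4</think><answer>4</answer>", "4"))  # 1.5
```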
🌐
ARC Prize
arcprize.org › blog › r1-zero-r1-results-analysis
An Analysis of DeepSeek's R1-Zero and R1
DeepSeek’s own reported benchmark scores also show strong agreement between R1-Zero and R1, e.g. on MATH AIME 2024 scores are 71% and 76% respectively (up from ~40% on the base DeepSeek V3). In the paper, the R1-Zero authors say “DeepSeek-R1-Zero encounters challenges such as poor readability, ...
🌐
GitHub
github.com › deepseek-ai › DeepSeek-R1
GitHub - deepseek-ai/DeepSeek-R1
DeepSeek-R1-Zero demonstrates capabilities such as self-verification, reflection, and generating long CoTs, marking a significant milestone for the research community. Notably, it is the first open research to validate that reasoning capabilities ...
Starred by 91.6K users
Forked by 11.8K users
🌐
arXiv
arxiv.org › pdf › 2501.12948 pdf
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via
R1-Zero on the AIME 2024 benchmark throughout the RL training process. As illustrated, DeepSeek-R1-Zero demonstrates a steady and consistent enhancement in performance as the ...
🌐
Hugging Face
huggingface.co › papers › 2501.12948
Paper page - DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
DeepSeek-R1-Zero and DeepSeek-R1 utilize reinforcement learning and multi-stage training to enhance reasoning capabilities, with DeepSeek-R1 achieving performance comparable to OpenAI-o1-1217.
🌐
Nature
nature.com › articles › article
DeepSeek-R1 incentivizes reasoning in LLMs through reinforcement learning | Nature
September 17, 2025 - This design choice originates from our hypothesis that human-defined reasoning patterns may limit model exploration, whereas unrestricted RL training can better incentivize the emergence of new reasoning capabilities in LLMs. Through this process, detailed in the next section, our model (referred to as DeepSeek-R1-Zero) naturally developed diverse and sophisticated reasoning behaviours.
🌐
BentoML
bentoml.com › blog › the-complete-guide-to-deepseek-models-from-v3-to-r1-and-beyond
The Complete Guide to DeepSeek Models: V3, R1, V3.1, V3.2 and Beyond
The result was R1, a model that not only keeps the reasoning power of R1-Zero but significantly improves accuracy, readability, and coherence. Unlike V3, which is optimized for general tasks, R1 is a true reasoning model. That means it doesn’t just give you an answer; it explains how it got there. Before responding, R1 generates a step-by-step chain of thought, making it especially useful for: ... According to the DeepSeek-R1 paper re-published in Nature and its supplementary information, R1’s training cost was the equivalent of just US$294K primarily on NVIDIA H800 chips.
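Because R1 emits its chain of thought before the final answer, code consuming the open checkpoints typically separates the two. A small sketch assuming the `<think>...</think>` delimiter used by the released DeepSeek-R1 models; the helper name and fallback behaviour are hypothetical:

```python
import re

def split_reasoning(response: str) -> tuple[str, str]:
    """Split an R1-style response into (chain_of_thought, final_answer).

    Assumes the <think>...</think> delimiter used by the released
    DeepSeek-R1 checkpoints; if no tag is present, the whole response
    is treated as the answer.
    """
    match = re.search(r"<think>(.*?)</think>", response, flags=re.DOTALL)
    if match is None:
        return "", response.strip()
    cot = match.group(1).strip()
    answer = response[match.end():].strip()
    return cot, answer

cot, answer = split_reasoning("<think>Try small cases first...</think>The answer is 42.")
print(answer)  # "The answer is 42."
```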
🌐
Hugging Face
huggingface.co › deepseek-ai › DeepSeek-R1
deepseek-ai/DeepSeek-R1 · Hugging Face
DeepSeek-R1-Zero demonstrates capabilities such as self-verification, reflection, and generating long CoTs, marking a significant milestone for the research community. Notably, it is the first open research to validate that reasoning capabilities ...
🌐
Medium
medium.com › data-science-in-your-pocket › understanding-deepseek-r1-paper-beginners-guide-e86f83fda796
Understanding DeepSeek-R1 paper: Beginner’s guide | by Mehul Gupta | Data Science in Your Pocket | Medium
January 31, 2025 - The paper explores a new way to improve reasoning using pure reinforcement learning (RL) — meaning no supervised data (human-labeled examples). Instead, the model learns by itself through an RL framework called GRPO (we will discuss this in ...
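GRPO, as described in the DeepSeek papers, drops PPO's learned critic and instead normalizes each sampled completion's reward against the other completions drawn for the same prompt. A minimal sketch of that group-relative advantage computation (the policy update and clipping are omitted):

```python
import statistics

def grpo_advantages(group_rewards: list[float]) -> list[float]:
    """Group-relative advantages as in GRPO: for a group of completions
    sampled from the same prompt, A_i = (r_i - mean(r)) / std(r).

    PPO's critic network is replaced by this within-group baseline.
    """
    mean = statistics.mean(group_rewards)
    std = statistics.pstdev(group_rewards) or 1.0  # guard against zero variance
    return [(r - mean) / std for r in group_rewards]

# One prompt, four sampled completions, rule-based rewards:
print(grpo_advantages([1.5, 0.0, 0.5, 1.5]))
```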
🌐
Reddit
reddit.com › r/localllama › deepseek r1 / r1 zero
r/LocalLLaMA on Reddit: Deepseek R1 / R1 Zero
January 20, 2025 - But it is less relevant now because ... The DeepSeek R1 paper is out....
🌐
Hugging Face
huggingface.co › unsloth › DeepSeek-R1-Zero-GGUF
unsloth/DeepSeek-R1-Zero-GGUF · Hugging Face
DeepSeek-R1-Zero demonstrates capabilities such as self-verification, reflection, and generating long CoTs, marking a significant milestone for the research community. Notably, it is the first open research to validate that reasoning capabilities ...
🌐
GitHub
github.com › deepseek-ai › DeepSeek-R1 › blob › main › DeepSeek_R1.pdf
DeepSeek-R1/DeepSeek_R1.pdf at main · deepseek-ai/DeepSeek-R1
deepseek-ai / DeepSeek-R1 · Fork 11.8k · Star 91.6k
Author   deepseek-ai
🌐
Prompt Hub
prompthub.us › blog › deepseek-r-1-model-overview-and-how-it-ranks-against-openais-o1
PromptHub Blog: DeepSeek R-1 Model Overview and How it Ranks Against OpenAI's o1
Aside from creating 2 highly performant models that are on par with OpenAI’s o1 model, the paper has a lot of valuable information around reinforcement learning, chain of thought reasoning, prompt engineering with reasoning models, and more. We’ll start by focusing on the training process of DeepSeek-R1-Zero, which uniquely relied solely on reinforcement learning, instead of traditional supervised learning.
🌐
Hacker News
news.ycombinator.com › item
An analysis of DeepSeek's R1-Zero and R1 | Hacker News
February 12, 2025 - While I think this is an interesting hypothesis, I'm skeptical. You might be lowering the cost of your training corpus by a few million dollars, but I highly doubt you are getting novel, high-quality data · We are currently in a world where SOTA base models seem to be capped at around GPT-4o levels.
🌐
Ponder
ponder.ing › researches › deepseek-r1-paper-explained
DeepSeek R1 Paper Explained: What is it and How does it work? - Ponder
DeepSeek R1 represents a significant ... of two main variants: DeepSeek-R1-Zero: A pioneering model trained purely through reinforcement learning, without any supervised fine-tuning...
🌐
Interconnects
interconnects.ai › p › deepseek-r1-recipe-for-o1
DeepSeek R1's recipe to replicate o1 and the future of reasoning LMs
January 21, 2025 - In order to improve the readability ... reasoning model, DeepSeek performs a small amount of supervised finetuning on the original base model with “a few thousand” filtered completions from the R1-Zero ...
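The readability fix described here amounts to rejection sampling: generate many R1-Zero completions, keep only those that pass correctness and readability checks, and fine-tune the base model on the survivors. A schematic sketch; `is_correct`, `is_readable`, and the data shapes are hypothetical stand-ins, not DeepSeek's pipeline:

```python
def filter_for_sft(samples, is_correct, is_readable):
    """Rejection-sampling filter in the spirit of R1's cold-start data:
    keep only completions that pass both a correctness check and a
    readability check (e.g. consistent language, no tag soup).

    `samples` is an iterable of (prompt, completion, reference) triples;
    all the callables and shapes here are illustrative assumptions.
    """
    kept = []
    for prompt, completion, reference in samples:
        if is_correct(completion, reference) and is_readable(completion):
            kept.append({"prompt": prompt, "completion": completion})
    return kept

# Toy usage with trivial checks standing in for real verifiers:
data = [("2+2?", "<think>...</think>4", "4"), ("2+3?", "gibberish", "5")]
sft_set = filter_for_sft(
    data,
    is_correct=lambda c, ref: c.strip().endswith(ref),
    is_readable=lambda c: "<think>" in c,
)
print(len(sft_set))  # 1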
🌐
Epoch AI
epoch.ai › gradient-updates › what-went-into-training-deepseek-r1
What went into training DeepSeek-R1? | Epoch AI
January 31, 2025 - This paper is less detailed than the V3 technical report, but still contains enough information for us to infer how much computation must have been required for the reinforcement learning phase. Here is what the core reasoning-based reinforcement learning loop of DeepSeek-R1 (the one that produces the checkpoint DeepSeek-R1-Zero...
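The kind of back-of-envelope estimate this analysis makes can be reproduced in a few lines of arithmetic. The 37B active-parameter figure comes from the DeepSeek-V3 report; every other number below is an assumption for illustration, not Epoch AI's actual estimate:

```python
# Back-of-envelope FLOPs for an RL phase: count ~2 FLOPs per active
# parameter per generated token for inference and ~6 per trained token
# for the gradient updates. Only the active-parameter count is from the
# V3 report; the token counts are assumptions for illustration.
ACTIVE_PARAMS = 37e9        # DeepSeek-V3's active parameters per token (MoE)
GENERATED_TOKENS = 1e11     # assumed total tokens sampled during RL
TRAINED_TOKENS = GENERATED_TOKENS  # assume every sampled token is trained on

inference_flops = 2 * ACTIVE_PARAMS * GENERATED_TOKENS
training_flops = 6 * ACTIVE_PARAMS * TRAINED_TOKENS
total = inference_flops + training_flops
print(f"~{total:.1e} FLOPs")  # ~3.0e+22 under these assumptions
```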
🌐
arXiv
arxiv.org › pdf › 2503.20783 pdf
Understanding R1-Zero-Like Training: A Critical Perspective
Shirong Ma, Peiyi Wang, Xiao Bi, et al. Deepseek-r1: Incentivizing reasoning capability in