🌐
arXiv
arxiv.org › abs › 2501.12948
[2501.12948] DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
January 22, 2025 - By DeepSeek-AI and 199 other authors. Abstract: We introduce our first-generation reasoning models, DeepSeek-R1-Zero and DeepSeek-R1 ...
🌐
Hugging Face
huggingface.co › deepseek-ai › DeepSeek-R1-Zero
deepseek-ai/DeepSeek-R1-Zero · Hugging Face
DeepSeek-R1-Zero demonstrates capabilities such as self-verification, reflection, and generating long CoTs, marking a significant milestone for the research community. Notably, it is the first open research to validate that reasoning capabilities of LLMs can be incentivized purely through RL, without the need for SFT.
🌐
AI Papers Academy
aipapersacademy.com › home › deepseek-r1 paper explained – a new rl llms era in ai?
DeepSeek-R1 Paper Explained – A New RL LLMs Era in AI? - AI Papers Academy
July 3, 2025 - Training DeepSeek-R1-Zero using only RL in post-training, without SFT · The paper we’re reviewing today eliminates, or partially eliminates, the supervised fine-tuning stage. Specifically, to train DeepSeek-R1-Zero, the first model presented in the paper, we start with a pretrained model called DeepSeek-V3-Base, which has 671 billion parameters.
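The “only RL” setup described above works because R1-Zero’s reward is rule-based rather than a learned reward model: an accuracy reward that checks the final answer plus a format reward that checks the think/answer template. A minimal sketch of that idea, assuming the tag convention from the paper; the exact-match checker and equal weighting are illustrative stand-ins, not the paper’s actual verifiers:

```python
import re

# R1-Zero's prompt template asks for reasoning in <think>...</think>
# followed by the final answer in <answer>...</answer>.
TEMPLATE = re.compile(r"<think>.+?</think>\s*<answer>(.+?)</answer>", re.DOTALL)

def format_reward(completion: str) -> float:
    """1.0 if the completion follows the think/answer template, else 0.0."""
    return 1.0 if TEMPLATE.search(completion) else 0.0

def accuracy_reward(completion: str, gold_answer: str) -> float:
    """Exact match is a stand-in for the paper's rule-based verifiers
    (math answer checkers, executing code against test cases, etc.)."""
    m = TEMPLATE.search(completion)
    return 1.0 if m and m.group(1).strip() == gold_answer.strip() else 0.0

def reward(completion: str, gold_answer: str) -> float:
    # Equal weighting is an assumption; the paper does not publish weights.
    return accuracy_reward(completion, gold_answer) + format_reward(completion)
```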
🌐
Nature
nature.com › articles › article
DeepSeek-R1 incentivizes reasoning in LLMs through reinforcement learning | Nature
September 17, 2025 - This design choice originates from our hypothesis that human-defined reasoning patterns may limit model exploration, whereas unrestricted RL training can better incentivize the emergence of new reasoning capabilities in LLMs. Through this process, detailed in the next section, our model (referred to as DeepSeek-R1-Zero) naturally developed diverse and sophisticated reasoning behaviours.
🌐
ARC Prize
arcprize.org › blog › r1-zero-r1-results-analysis
An Analysis of DeepSeek's R1-Zero and R1
DeepSeek’s own reported benchmark scores also show strong agreement between R1-Zero and R1, e.g., on AIME 2024 (math) scores are 71% and 76% respectively (up from ~40% on the base DeepSeek-V3). In the paper, the authors say “DeepSeek-R1-Zero encounters challenges such as poor readability, and language mixing.”
🌐
GitHub
github.com › deepseek-ai › DeepSeek-R1
GitHub - deepseek-ai/DeepSeek-R1
Starred by 91.6K users
Forked by 11.8K users
🌐
arXiv
arxiv.org › pdf › 2501.12948
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
…the performance trajectory of DeepSeek-R1-Zero on the AIME 2024 benchmark throughout the RL training process. As illustrated, DeepSeek-R1-Zero demonstrates a steady and consistent enhancement in performance as the RL training advances.
🌐
Hugging Face
huggingface.co › papers › 2501.12948
Paper page - DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
DeepSeek-R1-Zero and DeepSeek-R1 use reinforcement learning and multi-stage training to enhance reasoning capabilities, with DeepSeek-R1 achieving performance comparable to OpenAI-o1-1217.
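“Multi-stage training” here refers to the four-stage R1 recipe from the paper: cold-start SFT, reasoning-oriented RL, rejection-sampling SFT restarted from the base model, and a final RL pass. A schematic sketch of that ordering; every function below is a labeled placeholder, not a real training API:

```python
# Placeholders that just tag a checkpoint string, standing in for full training jobs.
def sft(model: str, data: list[str]) -> str:
    return f"{model} -> SFT({len(data)} samples)"

def grpo_rl(model: str, reward: str) -> str:
    return f"{model} -> RL({reward})"

def rejection_sample(model: str, n: int) -> list[str]:
    # Keep only completions that pass correctness/readability checks.
    return [f"verified completion {i}" for i in range(n)]

m = sft("DeepSeek-V3-Base", ["cold-start long-CoT example"])       # 1. cold-start SFT
m = grpo_rl(m, "rule-based reasoning reward")                      # 2. reasoning RL
reasoning_data = rejection_sample(m, 5)                            # 3. rejection sampling,
m = sft("DeepSeek-V3-Base", reasoning_data + ["general sample"])   #    then SFT from base
m = grpo_rl(m, "reasoning + helpfulness/harmlessness reward")      # 4. final RL pass
print(m)
```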
🌐
Hugging Face
huggingface.co › deepseek-ai › DeepSeek-R1
deepseek-ai/DeepSeek-R1 · Hugging Face
🌐
BentoML
bentoml.com › blog › the-complete-guide-to-deepseek-models-from-v3-to-r1-and-beyond
The Complete Guide to DeepSeek Models: V3, R1, V3.1, V3.2 and Beyond
The result was R1, a model that not only keeps the reasoning power of R1-Zero but also significantly improves accuracy, readability, and coherence. Unlike V3, which is optimized for general tasks, R1 is a true reasoning model. That means it doesn’t just give you an answer; it explains how it got there. Before responding, R1 generates a step-by-step chain of thought, making it especially useful for: ... According to the DeepSeek-R1 paper republished in Nature and its supplementary information, R1’s RL training cost was the equivalent of just US$294K, spent primarily on NVIDIA H800 chips.
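Because R1 emits its chain of thought before the answer, client code typically has to split the two. A small sketch, assuming the open model’s <think> ... </think> output convention:

```python
import re

def split_reasoning(response: str) -> tuple[str, str]:
    """Separate DeepSeek-R1's <think>...</think> block from the final answer."""
    m = re.match(r"\s*<think>(.*?)</think>\s*(.*)", response, re.DOTALL)
    if m is None:                      # no think block: treat everything as the answer
        return "", response.strip()
    return m.group(1).strip(), m.group(2).strip()

reasoning, answer = split_reasoning(
    "<think>2 + 2: add the units digits.</think>The answer is 4."
)
print(answer)  # -> The answer is 4.
```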
🌐
Medium
medium.com › data-science-in-your-pocket › understanding-deepseek-r1-paper-beginners-guide-e86f83fda796
Understanding DeepSeek-R1 paper: Beginner’s guide | by Mehul Gupta | Data Science in Your Pocket | Medium
January 31, 2025 - The paper explores a new way to improve reasoning using pure reinforcement learning (RL) — meaning no supervised data (human-labeled examples). Instead, the model learns by itself through an RL framework called GRPO (we will discuss this in ...
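GRPO (Group Relative Policy Optimization, introduced in the DeepSeekMath paper and reused here) drops the learned value network: for each prompt it samples a group of completions, scores them, and normalizes every reward against the group’s mean and standard deviation. A minimal sketch of that advantage computation:

```python
from statistics import mean, stdev

def grpo_advantages(rewards: list[float]) -> list[float]:
    """Group-relative advantages: A_i = (r_i - mean(r)) / std(r).
    Completions that beat their own group get a positive advantage,
    so no separate value (critic) network is needed."""
    mu = mean(rewards)
    sigma = stdev(rewards) if len(rewards) > 1 else 0.0
    if sigma == 0.0:                   # all rewards equal: no learning signal
        return [0.0 for _ in rewards]
    return [(r - mu) / sigma for r in rewards]

# One group of G = 4 sampled answers to the same prompt, scored by a rule-based reward:
print(grpo_advantages([1.0, 0.0, 0.0, 1.0]))  # correct answers get ~ +0.87, wrong ~ -0.87
```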
🌐
Reddit
reddit.com › r/localllama › deepseek r1 / r1 zero
r/LocalLLaMA on Reddit: Deepseek R1 / R1 Zero
January 20, 2025 - But it is less relevant now because ... The DeepSeek R1 paper is out....
🌐
Hugging Face
huggingface.co › unsloth › DeepSeek-R1-Zero-GGUF
unsloth/DeepSeek-R1-Zero-GGUF · Hugging Face
🌐
Prompt Hub
prompthub.us › blog › deepseek-r-1-model-overview-and-how-it-ranks-against-openais-o1
PromptHub Blog: DeepSeek R-1 Model Overview and How it Ranks Against OpenAI's o1
Aside from creating two highly performant models that are on par with OpenAI’s o1 model, the paper has a lot of valuable information on reinforcement learning, chain-of-thought reasoning, prompt engineering with reasoning models, and more. We’ll start by focusing on the training process of DeepSeek-R1-Zero, which relied solely on reinforcement learning rather than traditional supervised fine-tuning.
🌐
Ponder
ponder.ing › researches › deepseek-r1-paper-explained
DeepSeek R1 Paper Explained: What is it and How does it work? - Ponder
DeepSeek R1 represents a significant ... of two main variants: DeepSeek-R1-Zero: A pioneering model trained purely through reinforcement learning, without any supervised fine-tuning...
🌐
Towards Data Science
towardsdatascience.com › home › latest › how to train llms to “think” (o1 & deepseek-r1)
How to Train LLMs to “Think” (o1 & DeepSeek-R1) | Towards Data Science
March 4, 2025 - In January 2025, DeepSeek published “DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning” [2]. While this paper caused its fair share of pandemonium, its central contribution was unveiling the secrets behind o1.
🌐
GitHub
github.com › deepseek-ai › DeepSeek-R1 › blob › main › DeepSeek_R1.pdf
DeepSeek-R1/DeepSeek_R1.pdf at main · deepseek-ai/DeepSeek-R1
deepseek-ai/DeepSeek-R1 · Fork 11.8k · Star 91.6k
Author: deepseek-ai
🌐
Hacker News
news.ycombinator.com › item
An analysis of DeepSeek's R1-Zero and R1 | Hacker News
February 12, 2025 - While I think this is an interesting hypothesis, I'm skeptical. You might be lowering the cost of your training corpus by a few million dollars, but I highly doubt you are getting novel, high-quality data. … We are currently in a world where the SOTA base model seems to be capped at around GPT-4o levels.
🌐
Interconnects
interconnects.ai › p › deepseek-r1-recipe-for-o1
DeepSeek R1's recipe to replicate o1 and the future of reasoning LMs
January 21, 2025 - In order to improve the readability … reasoning model, DeepSeek performs a small amount of supervised finetuning on the original base model with “a few thousand” filtered completions from the R1-Zero …
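The filtering this snippet mentions can be pictured as a predicate over sampled R1-Zero completions: keep only those that are verifiably correct and readable before fine-tuning on them. A sketch under those assumptions; both heuristics below are illustrative, not the paper’s actual criteria:

```python
def is_correct(completion: str, gold: str) -> bool:
    return completion.strip().endswith(gold)   # stand-in for a real answer verifier

def is_readable(completion: str) -> bool:
    # Crude proxies for the paper's readability / language-mixing concerns.
    ascii_ratio = sum(ch.isascii() for ch in completion) / max(len(completion), 1)
    return ascii_ratio > 0.95 and len(completion.split()) > 20

def filter_for_sft(samples: list[tuple[str, str]]) -> list[str]:
    """Keep completions that are both correct and readable, yielding the
    'few thousand' cold-start SFT examples the snippet describes."""
    return [c for c, gold in samples if is_correct(c, gold) and is_readable(c)]
```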
🌐
ResearchGate
researchgate.net › publication › 388484582_Technical_Report_Analyzing_DeepSeek-R1's_Impact_on_AI_Development
(PDF) Technical Report: Analyzing DeepSeek-R1's Impact on AI Development
January 20, 2025 - The paper "DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning" introduces DeepSeek-R1-Zero and DeepSeek-R1, two models designed to enhance reasoning capabilities through reinforcement learning (RL).