🌐
arXiv
arxiv.org › abs › 2501.12948
[2501.12948] DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
January 22, 2025 - View a PDF of the paper titled DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning, by DeepSeek-AI and 199 other authors. Abstract: We introduce our first-generation reasoning models, ...
🌐
Hugging Face
huggingface.co › deepseek-ai › DeepSeek-R1-Zero
deepseek-ai/DeepSeek-R1-Zero · Hugging Face
DeepSeek-R1-Zero demonstrates capabilities such as self-verification, reflection, and generating long CoTs, marking a significant milestone for the research community. Notably, it is the first open research to validate that reasoning capabilities ...
🌐
AI Papers Academy
aipapersacademy.com › home › deepseek-r1 paper explained – a new rl llms era in ai?
DeepSeek-R1 Paper Explained – A New RL LLMs Era in AI? - AI Papers Academy
July 3, 2025 - Training DeepSeek-R1-Zero using only RL in post-training, without SFT · The paper we’re reviewing today eliminates, or partially eliminates, the supervised fine-tuning stage. Specifically, to train DeepSeek-R1-Zero, the first model presented in the paper, we start with a pretrained model called DeepSeek-V3-Base, which has 671 billion parameters.
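The pure-RL recipe referenced here scores outputs with simple rule-based checks rather than a learned reward model: the paper describes an accuracy reward plus a format reward. A minimal sketch of what such a reward could look like; the `<think>`/`<answer>` tags follow the paper's prompt template, but the weights and the exact-match check are illustrative assumptions, not DeepSeek's implementation:

```python
import re

def rule_based_reward(completion: str, reference_answer: str) -> float:
    """Toy rule-based reward in the spirit of R1-Zero's accuracy + format rewards.

    The <think>/<answer> tags follow the template described in the paper;
    the weights and the exact-match check are illustrative assumptions.
    """
    reward = 0.0
    # Format reward: reasoning and final answer must be wrapped in tags.
    match = re.fullmatch(r"\s*<think>(.+?)</think>\s*<answer>(.+?)</answer>\s*",
                         completion, flags=re.DOTALL)
    if match is None:
        return reward  # no format reward, and nothing to grade
    reward += 0.5  # format reward (weight is an assumption)

    # Accuracy reward: compare the extracted answer against the reference.
    predicted = match.group(2).strip()
    if predicted == reference_answer.strip():
        reward += 1.0  # accuracy reward (weight is an assumption)
    return reward

print(rule_based_reward("<think>2+2=4</think><answer>4</answer>", "4"))  # 1.5
```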
🌐
ARC Prize
arcprize.org › blog › r1-zero-r1-results-analysis
An Analysis of DeepSeek's R1-Zero and R1
DeepSeek’s own reported benchmark scores also show strong agreement between R1-Zero and R1, e.g. on MATH AIME 2024 scores are 71% and 76% respectively (up from ~40% on the base DeepSeek V3). In the paper, the R1-Zero authors say “DeepSeek-R1-Zero encounters challenges such as poor readability, ...
🌐
GitHub
github.com › deepseek-ai › DeepSeek-R1
GitHub - deepseek-ai/DeepSeek-R1
DeepSeek-R1-Zero demonstrates capabilities such as self-verification, reflection, and generating long CoTs, marking a significant milestone for the research community. Notably, it is the first open research to validate that reasoning capabilities ...
Starred by 91.6K users
Forked by 11.8K users
🌐
arXiv
arxiv.org › pdf › 2501.12948 pdf
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via
R1-Zero on the AIME 2024 benchmark throughout the RL training process. As illustrated, DeepSeek-R1-Zero demonstrates a steady and consistent enhancement in performance as the ...
🌐
Hugging Face
huggingface.co › papers › 2501.12948
Paper page - DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
DeepSeek-R1-Zero and DeepSeek-R1 utilize reinforcement learning and multi-stage training to enhance reasoning capabilities, with DeepSeek-R1 achieving performance comparable to OpenAI-o1-1217.
🌐
Nature
nature.com › articles › article
DeepSeek-R1 incentivizes reasoning in LLMs through reinforcement learning | Nature
September 17, 2025 - This design choice originates from our hypothesis that human-defined reasoning patterns may limit model exploration, whereas unrestricted RL training can better incentivize the emergence of new reasoning capabilities in LLMs. Through this process, detailed in the next section, our model (referred to as DeepSeek-R1-Zero) naturally developed diverse and sophisticated reasoning behaviours.
🌐
BentoML
bentoml.com › blog › the-complete-guide-to-deepseek-models-from-v3-to-r1-and-beyond
The Complete Guide to DeepSeek Models: V3, R1, V3.1, V3.2 and Beyond
The result was R1, a model that not only keeps the reasoning power of R1-Zero but significantly improves accuracy, readability, and coherence. Unlike V3, which is optimized for general tasks, R1 is a true reasoning model. That means it doesn’t just give you an answer; it explains how it got there. Before responding, R1 generates a step-by-step chain of thought, making it especially useful for: ... According to the DeepSeek-R1 paper re-published in Nature and its supplementary information, R1’s training cost was the equivalent of just US$294K primarily on NVIDIA H800 chips.
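Because R1 emits its chain of thought before the final answer, code consuming the open checkpoints typically separates the two. A small sketch assuming the `<think>...</think>` delimiter used by the released DeepSeek-R1 models; the helper name and fallback behaviour are hypothetical:

```python
import re

def split_reasoning(response: str) -> tuple[str, str]:
    """Split an R1-style response into (chain_of_thought, final_answer).

    Assumes the <think>...</think> delimiter used by the released
    DeepSeek-R1 checkpoints; if no tag is present, the whole response
    is treated as the answer.
    """
    match = re.search(r"<think>(.*?)</think>", response, flags=re.DOTALL)
    if match is None:
        return "", response.strip()
    cot = match.group(1).strip()
    answer = response[match.end():].strip()
    return cot, answer

cot, answer = split_reasoning("<think>Try small cases first...</think>The answer is 42.")
print(answer)  # "The answer is 42."
```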
🌐
Hugging Face
huggingface.co › deepseek-ai › DeepSeek-R1
deepseek-ai/DeepSeek-R1 · Hugging Face
DeepSeek-R1-Zero demonstrates capabilities such as self-verification, reflection, and generating long CoTs, marking a significant milestone for the research community. Notably, it is the first open research to validate that reasoning capabilities ...
🌐
Medium
medium.com › data-science-in-your-pocket › understanding-deepseek-r1-paper-beginners-guide-e86f83fda796
Understanding DeepSeek-R1 paper: Beginner’s guide | by Mehul Gupta | Data Science in Your Pocket | Medium
January 31, 2025 - The paper explores a new way to improve reasoning using pure reinforcement learning (RL) — meaning no supervised data (human-labeled examples). Instead, the model learns by itself through an RL framework called GRPO (we will discuss this in ...
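GRPO, as described in the DeepSeek papers, drops PPO's learned critic and instead normalizes each sampled completion's reward against the other completions drawn for the same prompt. A minimal sketch of that group-relative advantage computation (the policy update and clipping are omitted):

```python
import statistics

def grpo_advantages(group_rewards: list[float]) -> list[float]:
    """Group-relative advantages as in GRPO: for a group of completions
    sampled from the same prompt, A_i = (r_i - mean(r)) / std(r).

    PPO's critic network is replaced by this within-group baseline.
    """
    mean = statistics.mean(group_rewards)
    std = statistics.pstdev(group_rewards) or 1.0  # guard against zero variance
    return [(r - mean) / std for r in group_rewards]

# One prompt, four sampled completions, rule-based rewards:
print(grpo_advantages([1.5, 0.0, 0.5, 1.5]))
```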
🌐
Reddit
reddit.com › r/localllama › deepseek r1 / r1 zero
r/LocalLLaMA on Reddit: Deepseek R1 / R1 Zero
January 20, 2025 - But it is less relevant now because ... The DeepSeek R1 paper is out....
🌐
Hugging Face
huggingface.co › unsloth › DeepSeek-R1-Zero-GGUF
unsloth/DeepSeek-R1-Zero-GGUF · Hugging Face
DeepSeek-R1-Zero demonstrates capabilities such as self-verification, reflection, and generating long CoTs, marking a significant milestone for the research community. Notably, it is the first open research to validate that reasoning capabilities ...
🌐
GitHub
github.com › deepseek-ai › DeepSeek-R1 › blob › main › DeepSeek_R1.pdf
DeepSeek-R1/DeepSeek_R1.pdf at main · deepseek-ai/DeepSeek-R1
deepseek-ai / DeepSeek-R1 · Fork 11.8k · Star 91.6k
Author   deepseek-ai
🌐
Prompt Hub
prompthub.us › blog › deepseek-r-1-model-overview-and-how-it-ranks-against-openais-o1
PromptHub Blog: DeepSeek R-1 Model Overview and How it Ranks Against OpenAI's o1
Aside from creating 2 highly performant models that are on par with OpenAI’s o1 model, the paper has a lot of valuable information around reinforcement learning, chain of thought reasoning, prompt engineering with reasoning models, and more. We’ll start by focusing on the training process of DeepSeek-R1-Zero, which uniquely relied solely on reinforcement learning, instead of traditional supervised learning.
🌐
Hacker News
news.ycombinator.com › item
An analysis of DeepSeek's R1-Zero and R1 | Hacker News
February 12, 2025 - While I think this is an interesting hypothesis, I'm skeptical. You might be lowering the cost of your training corpus by a few million dollars, but I highly doubt you are getting novel, high-quality data · We are currently in a world where SOTA base models seem to be capped at around GPT-4o levels.
🌐
Ponder
ponder.ing › researches › deepseek-r1-paper-explained
DeepSeek R1 Paper Explained: What is it and How does it work? - Ponder
DeepSeek R1 represents a significant ... of two main variants: DeepSeek-R1-Zero: A pioneering model trained purely through reinforcement learning, without any supervised fine-tuning...
🌐
Interconnects
interconnects.ai › p › deepseek-r1-recipe-for-o1
DeepSeek R1's recipe to replicate o1 and the future of reasoning LMs
January 21, 2025 - In order to improve the readability ... reasoning model, DeepSeek performs a small amount of supervised finetuning on the original base model with “a few thousand” filtered completions from the R1-Zero ...
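The readability fix described here amounts to rejection sampling: generate many R1-Zero completions, keep only those that pass correctness and readability checks, and fine-tune the base model on the survivors. A schematic sketch; `is_correct`, `is_readable`, and the data shapes are hypothetical stand-ins, not DeepSeek's pipeline:

```python
def filter_for_sft(samples, is_correct, is_readable):
    """Rejection-sampling filter in the spirit of R1's cold-start data:
    keep only completions that pass both a correctness check and a
    readability check (e.g. consistent language, no tag soup).

    `samples` is an iterable of (prompt, completion, reference) triples;
    all the callables and shapes here are illustrative assumptions.
    """
    kept = []
    for prompt, completion, reference in samples:
        if is_correct(completion, reference) and is_readable(completion):
            kept.append({"prompt": prompt, "completion": completion})
    return kept

# Toy usage with trivial checks standing in for real verifiers:
data = [("2+2?", "<think>...</think>4", "4"), ("2+3?", "gibberish", "5")]
sft_set = filter_for_sft(
    data,
    is_correct=lambda c, ref: c.strip().endswith(ref),
    is_readable=lambda c: "<think>" in c,
)
print(len(sft_set))  # 1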
🌐
Epoch AI
epoch.ai › gradient-updates › what-went-into-training-deepseek-r1
What went into training DeepSeek-R1? | Epoch AI
January 31, 2025 - This paper is less detailed than the V3 technical report, but still contains enough information for us to infer how much computation must have been required for the reinforcement learning phase. Here is what the core reasoning-based reinforcement learning loop of DeepSeek-R1 (the one that produces the checkpoint DeepSeek-R1-Zero...
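The kind of back-of-envelope estimate this analysis makes can be reproduced in a few lines of arithmetic. The 37B active-parameter figure comes from the DeepSeek-V3 report; every other number below is an assumption for illustration, not Epoch AI's actual estimate:

```python
# Back-of-envelope FLOPs for an RL phase: count ~2 FLOPs per active
# parameter per generated token for inference and ~6 per trained token
# for the gradient updates. Only the active-parameter count is from the
# V3 report; the token counts are assumptions for illustration.
ACTIVE_PARAMS = 37e9        # DeepSeek-V3's active parameters per token (MoE)
GENERATED_TOKENS = 1e11     # assumed total tokens sampled during RL
TRAINED_TOKENS = GENERATED_TOKENS  # assume every sampled token is trained on

inference_flops = 2 * ACTIVE_PARAMS * GENERATED_TOKENS
training_flops = 6 * ACTIVE_PARAMS * TRAINED_TOKENS
total = inference_flops + training_flops
print(f"~{total:.1e} FLOPs")  # ~3.0e+22 under these assumptions
```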
🌐
arXiv
arxiv.org › pdf › 2503.20783 pdf
Understanding R1-Zero-Like Training: A Critical Perspective
Shirong Ma, Peiyi Wang, Xiao Bi, et al. Deepseek-r1: Incentivizing reasoning capability in