deepseek-r1 paper - Brave Search

arxiv.org › abs › 2501.12948

[2501.12948] DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

January 22, 2025 - View a PDF of the paper titled DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning, by DeepSeek-AI and 199 other authors

arxiv.org › pdf › 2501.12948 pdf

DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via

In this paper, we take the first step toward improving language model reasoning capabilities using pure reinforcement learning (RL). Our goal is to explore the potential of LLMs to develop reasoning capabilities without any supervised data, focusing on their self-evolution through a pure RL process. Specifically, we use DeepSeek-V3-Base as the base model and employ

Videos

How to Train LLMs to "Think" (o1 & DeepSeek-R1) - YouTube

February 17, 2025

Deep Dive into DeepSeek-R1 (Presentation + Paper Review) - YouTube

February 12, 2025

DeepSeek R1 Theory Overview | GRPO + RL + SFT - YouTube

January 31, 2025

DeepSeek R1 Paper Explained to your grandma🤶 - YouTube

January 30, 2025

A Slightly Technical Breakdown of DeepSeek-R1 - YouTube

January 29, 2025

DeepSeek R1 Incentivizing Reasoning Capability in LLMs via ...

January 28, 2025

huggingface.co › deepseek-ai › DeepSeek-R1

deepseek-ai/DeepSeek-R1 · Hugging Face

DeepSeek-R1-Zero demonstrates capabilities such as self-verification, reflection, and generating long CoTs, marking a significant milestone for the research community. Notably, it is the first open research to validate that reasoning capabilities ...

github.com › deepseek-ai › DeepSeek-R1 › blob › main › DeepSeek_R1.pdf

DeepSeek-R1/DeepSeek_R1.pdf at main · deepseek-ai/DeepSeek-R1

deepseek-ai / DeepSeek-R1 Public · Fork 11.8k · Star 91.6k

Author deepseek-ai

AI Papers Academy

aipapersacademy.com › home › deepseek-r1 paper explained – a new rl llms era in ai?

DeepSeek-R1 Paper Explained – A New RL LLMs Era in AI? - AI Papers Academy

July 3, 2025 - The paper, titled “DeepSeek-R1: Incentivizing Reasoning Capability in Large Language Models via Reinforcement Learning”, presents a state-of-the-art, open-source reasoning model and a detailed recipe for training such models using large-scale ...

huggingface.co › papers › 2501.12948

Paper page - DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

arXiv explained breakdown of this paper 👉 https://arxivexplained.com/papers/deepseek-r1-incentivizing-reasoning-capability-in-llms-via-reinforcement-learning

nature.com › articles › article

DeepSeek-R1 incentivizes reasoning in LLMs through reinforcement learning | Nature

September 17, 2025 - A new artificial intelligence model, DeepSeek-R1, is introduced, demonstrating that the reasoning abilities of large language models can be incentivized through pure reinforcement learning, removing the need for human-annotated demonstrations.

github.com › deepseek-ai › DeepSeek-R1

GitHub - deepseek-ai/DeepSeek-R1

DeepSeek-R1-Zero, a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step, demonstrated remarkable performance on reasoning.

Starred by 91.6K users

Forked by 11.8K users

medium.com › data-science-in-your-pocket › understanding-deepseek-r1-paper-beginners-guide-e86f83fda796

Understanding DeepSeek-R1 paper: Beginner’s guide | by Mehul Gupta | Data Science in Your Pocket | Medium

January 31, 2025 - The paper explores a new way to improve reasoning using pure reinforcement learning (RL) — meaning no supervised data (human-labeled examples). Instead, the model learns by itself through an RL framework called GRPO (we will discuss this in ...

api-docs.deepseek.com › deepseek-r1 release 2025/01/20

DeepSeek-R1 Release | DeepSeek API Docs

🛠️ DeepSeek-R1: Technical Highlights · 📈 Large-scale RL in post-training · 🏆 Significant performance boost with minimal labeled data · 🔢 Math, code, and reasoning tasks on par with OpenAI-o1 · 📄 More details: https://github.com/deepseek-ai/DeepSeek-R1/blob/main/DeepSeek_R1.pdf ·

seangoedecke.com › deepseek-r1

What did DeepSeek figure out about reasoning with DeepSeek-R1?

The Chinese AI lab DeepSeek recently released their new reasoning model R1, which is supposedly (a) better than the current best reasoning models (OpenAI’s o1- series), and (b) was trained on a GPU cluster a fraction the size of any of the big western AI labs. Unlike the big western AI labs, they’ve released a paper ...

reddit.com › r/singularity › a summary of deepseek-r1's paper by deepseek-r1

r/singularity on Reddit: A summary of DeepSeek-R1's paper by DeepSeek-R1

December 2, 2024 -

Aha moments emerged naturally in RL: Self-correction behaviors like "Wait, let’s reevaluate..." arose without SFT.
Cold-start SFT fixed readability: ~1k structured examples resolved language mixing.
GRPO cut RL costs by 30%: Group-wise reward normalization outperformed PPO.
RL increased CoT length autonomously: Reasoning steps grew from 100→1k tokens without penalties.
Distillation beat direct RL in small models: SFT on R1 data outperformed RL-trained base models.
Process rewards failed; outcome rewards worked better: Rule-based final-answer checks stabilized training.
XML tags reduced hallucinations 15%: Structured <think>/<answer> improved reward clarity.
Language mixing fixed via consistency rewards: Penalized code-switching in multilingual outputs.
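
The outcome-reward and <think>/<answer> points in the summary above can be illustrated with a small rule-based reward function. This is only a sketch under stated assumptions, not DeepSeek's code: the name `rule_based_reward`, the 0.5 and 1.0 weights, and the exact-string answer comparison are all hypothetical choices made for illustration.

```python
import re

# Assumed output layout: reasoning inside <think>, final answer inside <answer>.
_FORMAT = re.compile(
    r"^<think>(.*?)</think>\s*<answer>(.*?)</answer>$", re.DOTALL
)


def rule_based_reward(completion: str, reference: str) -> float:
    """Score one completion with fixed rules (hypothetical weights)."""
    match = _FORMAT.match(completion.strip())
    if match is None:
        return 0.0  # wrong format: no reward at all
    reward = 0.5  # format reward (assumed weight)
    if match.group(2).strip() == reference.strip():
        reward += 1.0  # outcome reward: final answer matches the reference
    return reward
```

Only the final answer is checked; the reasoning inside <think> is never scored, which is what distinguishes an outcome reward from a process reward.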

I find it funny that I've seen multiple AI YouTubers explain papers and they just go to another AI to help them in the video, but hey, it does a good job

https://github.com/deepseek-ai/DeepSeek-R1/blob/main/DeepSeek_R1.pdf

DeepSeek-R1 is roughly 4.41x cheaper than o1, which means that for about the price of a single o1 query you could run a consensus-voting tree-of-agents system with 4 separate instances of R1, which could outperform o1 at the same price or less

The new benchmarks for R1 zero are INSANE!!!

medium.com › @mayadakhatib › deepseek-r1-a-short-summary-73b6b8ced9cf

DeepSeek R1 — a short summary

January 25, 2025 - The DeepSeek R1 model stands out for multiple reasons: It’s a free, open source SOTA reasoning model that is trained using direct Reinforcement Learning without supervised finetuning.

huggingface.co › learn › llm-course › en › chapter12 › 3

Understanding the DeepSeek R1 Paper - Hugging Face LLM Course

The DeepSeek R1 paper represents a significant milestone in language model development.

reddit.com › r/llmdevs › how was deepseek-r1 built; for dummies

r/LLMDevs on Reddit: How was DeepSeek-R1 built; For dummies

January 27, 2025 -

Over the weekend I wanted to learn how DeepSeek-R1 was trained, and what was so revolutionary about it. So I ended up reading the paper, and wrote down my thoughts. The linked article is (hopefully) written in a way that makes it easier for everyone to understand -- no PhD required!

Here's a "quick" summary:

1/ DeepSeek-R1-Zero is trained with pure-reinforcement learning (RL), without using labeled data. It's the first time someone tried and succeeded doing that. (that we know of, o1 report didn't show much)

2/ Traditional RL frameworks (like PPO) have something like an 'LLM coach or critic' that tells the model whether the answer was good or bad -- based on given examples (labeled data). DeepSeek uses GRPO, an RL framework that skips the critic: it samples a group of answers, scores each one with predefined rules, and compares every answer against the group average

3/ But, how can you evaluate the performance if you don't have labeled data to test against it? With this framework, the rules aren't perfect—they’re just a best guess at what "good" looks like. The RL process tries to optimize on things like:

Does the answer make sense? (Coherence)

Is it in the right format? (Completeness)

Does it match the general style we expect? (Fluency)

For example, for the DeepSeek-R1-Zero model, for mathematical tasks, the model could be rewarded for producing outputs that align to mathematical principles or logical consistency.

It makes sense.. and it works... to some extent!

4/ This model (R1-Zero) had issues with poor readability and language mixing -- something that you'd get from using pure-RL. So, the authors wanted to go through a multi-stage training process and do something that feels like hacking various training methods:

5/ What you see above is the DeepSeek-R1 model that goes through a list of training methods for different purposes

(i) the cold start data lays a structured foundation fixing issues like poor readability
(ii) pure-RL develops reasoning almost on auto-pilot
(iii) rejection sampling + SFT works with top-tier training data that improves accuracy, and
(iv) another final RL stage ensures additional level of generalization.

And with that they're doing as good as or better than o1 models.

Lmk if you have any questions (i might be able to answer them).
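
Point 2/ above (GRPO replacing the critic with a group average) can be made concrete with a short sketch. It assumes the commonly described group-relative form, where each sampled answer's advantage is its reward minus the group mean, divided by the group standard deviation. The function name, the `eps` term, and the use of the population standard deviation are assumptions for illustration, not taken from the post.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rule-based rewards within one group of sampled answers.

    rewards: one score per answer sampled for the same prompt.
    eps: small constant (an assumption) that avoids division by zero
         when every answer in the group gets the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; the exact choice is assumed
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four answers to one prompt: two judged correct, two judged wrong.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Answers that beat the group average get a positive advantage and the rest get a negative one, so no separately trained critic model is needed to supply a baseline.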

Just wrote about it, it's absolutely great, and the less is more will definitely redefine AI as we know it

Work in progress https://github.com/huggingface/open-r1

x.com › omarsar0 › status › 1881479496466927632

elvis on X: "The DeepSeek-R1 paper is a gem! Highly encourage everyone to read it. It's clear that LLM reasoning capabilities can be learned in different ways. RL, if applied correctly and at scale, can lead to some really powerful and interesting scaling and emergent properties. There https://t.co/egcmnWyBqp" / X

Here is my breakdown of the paper along with a few tests: https://youtu.be/3GlFd3doO3U?si=SVOCGhhMSY2xqR_2… The multi-stage training might not make sense initially but it provides clues on optimizations that we can continue to tap into. Data quality is still very important for enhancing the usability of the LLM. Unlike other reasoning LLMs, DeepSeek-R1's training recipe and weights are open so we can build on top of it.

interconnects.ai › p › deepseek-r1-recipe-for-o1

DeepSeek R1's recipe to replicate o1 and the future of reasoning LMs

January 21, 2025 - The DeepSeek R1 report has an entire other subsection dedicated to its distillation experiments, where it took completions from the R1 model and finetuned existing open-weight models with them to boost performance. This is a fantastic service for them to release this and provides a solid baseline for RL experiments on smaller models to try and match in the near future. The discussion in the paper on how large models are required to see the biggest reasoning gains (and generate effective synthetic data) is likely the biggest open question:

ponder.ing › researches › deepseek-r1-paper-explained

DeepSeek R1 Paper Explained: What is it and How does it work? - Ponder

The paper introduces DeepSeek R1, a large language model trained on a massive dataset with up to 8K context length.

arxiv.org › abs › 2502.02523

[2502.02523] Brief analysis of DeepSeek R1 and its implications for Generative AI

February 7, 2025 - View a PDF of the paper titled Brief analysis of DeepSeek R1 and its implications for Generative AI, by Sarah Mercer and 2 other authors