🌐
arXiv
arxiv.org › abs › 2501.12948
[2501.12948] DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
January 22, 2025 - View a PDF of the paper titled DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning, by DeepSeek-AI and 199 other authors.
🌐
Nature
nature.com › articles › article
DeepSeek-R1 incentivizes reasoning in LLMs through reinforcement learning | Nature
September 17, 2025 - A new artificial intelligence model, DeepSeek-R1, is introduced, demonstrating that the reasoning abilities of large language models can be incentivized through pure reinforcement learning, removing the need for human-annotated demonstrations.
Discussions

Notes on Deepseek r1: Just how good it is compared to OpenAI o1
Aside from the LLM itself, this shows that OpenAI isn't that far ahead of the others anymore. OpenAI still has the money and the hype, but a year ago no one could beat them. The game has changed, surely. Of course OpenAI is going to make moves, but this is a huge win for LLMs in general. More on reddit.com
🌐 r/LocalLLaMA
486
1266
October 26, 2024
DeepSeek R1 Academic Paper AI Agent
Galactica was an AI model that might help you write papers. It was made for that. Facebook made it. More on reddit.com
🌐 r/DeepSeek
2
4
June 22, 2024
DeepSeek Megathread
I don't remember OpenAI discussion being moved to a single thread when that was what everyone talked about. More on reddit.com
🌐 r/ArtificialInteligence
313
300
December 14, 2024
Discussing DeepSeek-R1 research paper in depth
2.2K subscribers in the llmops community. A homebase for LLMOps enthusiasts. Spam will be mocked on Twitter. Be warned. More on reddit.com
🌐 r/llmops
4
May 24, 2024
🌐
Science News
sciencenews.org › article › ai-model-deepseek-answers-training
A look under the hood of DeepSeek’s AI models doesn't provide all the answers
2 weeks ago - As a result, DeepSeek researchers went back and implemented an additional reinforcement learning stage in the training pipeline with a reward for language consistency to prevent the mix-up. Out came DeepSeek-R1, a successor to R1-Zero.
🌐
Hugging Face
huggingface.co › papers › 2501.12948
Paper page - DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
January 22, 2025 - arXiv explained breakdown of this paper 👉 https://arxivexplained.com/papers/deepseek-r1-incentivizing-reasoning-capability-in-llms-via-reinforcement-learning
🌐
CNN
cnn.com › 2025 › 09 › 19 › business › deepseek-ai-training-cost-china-intl
China’s DeepSeek shook the tech world. Its developer just revealed the cost of training the AI model | CNN Business
September 27, 2025 - Chinese artificial intelligence developer DeepSeek spent just $294,000 on training its R1 model, much less than reported for US rivals, it said in a paper that is likely to reignite debate over Beijing’s place in the AI race.
🌐
AI Papers Academy
aipapersacademy.com › home › deepseek-r1 paper explained – a new rl llms era in ai?
DeepSeek-R1 Paper Explained – A New RL LLMs Era in AI? - AI Papers Academy
July 3, 2025 - The paper, titled “DeepSeek-R1: Incentivizing Reasoning Capability in Large Language Models via Reinforcement Learning”, presents a state-of-the-art, open-source reasoning model and a detailed recipe for training such models using large-scale ...
🌐
Reuters
reuters.com › world › china › chinas-deepseek-says-its-hit-ai-model-cost-just-294000-train-2025-09-18
China's DeepSeek says its hit AI model cost just $294,000 to train | Reuters
September 22, 2025 - BEIJING, Sept 18 (Reuters) - Chinese ... reported for U.S. rivals, in a paper that is likely to reignite debate over Beijing's place in the race to develop artificial intelligence....
🌐
South China Morning Post
scmp.com › news › china › science
DeepSeek secrets unveiled: engineers reveal science behind China’s viral AI model | South China Morning Post
September 18, 2025 - ... “General reasoning represents a long-standing and formidable challenge in artificial intelligence,” the team said in a paper published in the peer-reviewed journal Nature on Wednesday.
Find elsewhere
🌐
Medium
artgor.medium.com › paper-review-deepseek-r1-incentivizing-reasoning-capability-in-llms-via-reinforcement-learning-edf4343dcf3a
Paper Review: DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning | by Andrew Lukyanenko | Medium
January 27, 2025 - The distilled models are based on Qwen and Llama architectures. In this paper, the authors demonstrate that the reasoning capabilities of models can be significantly improved through large-scale RL without SFT as a cold start.
🌐
YouTube
youtube.com › watch
DeepSeek R1 Explained to your grandma - YouTube
Describing the key insights from the DeepSeek R1 paper in a way even your grandma could understand. I focus on the key concepts of chain of thought reasoning...
Published   January 23, 2025
🌐
Nature
nature.com › news › article
Secrets of DeepSeek AI model revealed in landmark paper
September 17, 2025 - The statement came in documents released alongside a peer-reviewed version of the R1 model, published today in Nature. ... Read the related News & Views: ‘AI can learn to show its workings through trial and error’. Guo, D. et al. Nature 645, 633–638 (2025). ... DeepSeek-AI et al.
🌐
YouTube
youtube.com › watch
Paper: DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning - YouTube
Paper reading in the Discord group. The whole lecture was improvised. Join the group: https://discord.gg/JRKsaNbhCg · Link to paper: https://github.com/deepseek-ai...
Published   January 21, 2025
🌐
Scientific American
scientificamerican.com › article › secrets-of-chinese-ai-model-deepseek-revealed-in-landmark-paper
Secrets of Chinese AI Model DeepSeek Revealed in Landmark Paper | Scientific American
September 17, 2025 - R1 is designed to excel at ‘reasoning’ tasks such as mathematics and coding, and is a cheaper rival to tools developed by US technology firms. As an ‘open weight’ model, it is available for anyone to download and is the most popular such model on the AI community platform Hugging Face to date, having been downloaded 10.9 million times. The paper updates a preprint released in January, which describes how DeepSeek augmented a standard large language model (LLM) to tackle reasoning tasks.
🌐
Sean Goedecke
seangoedecke.com › deepseek-r1
What did DeepSeek figure out about reasoning with DeepSeek-R1?
January 26, 2025 - The Chinese AI lab DeepSeek recently released their new reasoning model R1, which is supposedly (a) better than the current best reasoning models (OpenAI’s o1 series), and (b) was trained on a GPU cluster a fraction the size of any of the big western AI labs. Unlike the big western AI labs, they’ve released a paper ...
🌐
Hugging Face
huggingface.co › deepseek-ai › DeepSeek-R1
deepseek-ai/DeepSeek-R1 · Hugging Face
1 month ago - DeepSeek-R1-Zero demonstrates capabilities such as self-verification, reflection, and generating long CoTs, marking a significant milestone for the research community. Notably, it is the first open research to validate that reasoning capabilities ...
🌐
Medium
medium.com › data-science-in-your-pocket › understanding-deepseek-r1-paper-beginners-guide-e86f83fda796
Understanding DeepSeek-R1 paper: Beginner’s guide | by Mehul Gupta | Data Science in Your Pocket | Medium
January 31, 2025 - The paper explores a new way to improve reasoning using pure reinforcement learning (RL) — meaning no supervised data (human-labeled examples). Instead, the model learns by itself through an RL framework called GRPO (we will discuss this in ...
🌐
Reddit
reddit.com › r/localllama › notes on deepseek r1: just how good it is compared to openai o1
r/LocalLLaMA on Reddit: Notes on Deepseek r1: Just how good it is compared to OpenAI o1
October 26, 2024 -

Finally, there is a model worthy of the hype it has been getting since Claude 3.6 Sonnet. Deepseek has released something hardly anyone expected: a reasoning model on par with OpenAI’s o1 within a month of the v3 release, with an MIT license and at 1/20th of o1’s cost.

This is easily the best release since GPT-4. It's wild; the general public seems excited about this, while the big AI labs are probably scrambling. It feels like things are about to speed up in the AI world. And it's all thanks to this new DeepSeek-R1 model and how they trained it. 

Some key details from the paper

  • Pure RL (GRPO) on v3-base to get r1-zero. (No Monte-Carlo Tree Search or Process Reward Modelling)

  • The model uses “Aha moments” as pivot tokens to reflect and reevaluate answers during CoT.

  • To overcome r1-zero’s readability issues, v3 was SFT’d on cold-start data.

  • Distillation works: small models like Qwen and Llama trained on r1-generated data show significant improvements (a minimal SFT sketch follows right below).
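
On that last bullet: distillation here just means supervised fine-tuning of a small student on r1-generated traces with ordinary next-token cross-entropy, not logit matching. A minimal, hypothetical sketch; the checkpoint name and the single hand-written trace are placeholders, not the paper's actual data (the paper uses roughly 800k samples):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Student model; the paper distills into Qwen- and Llama-family models.
# This checkpoint name is a placeholder, not the exact one from the paper.
tok = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-7B")
student = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-7B")

# One (prompt, r1-generated CoT + answer) pair, hand-written for illustration.
prompt = "Q: Is the sum of two even numbers always even?\n"
trace = "<think>2m + 2n = 2(m + n), which is divisible by 2.</think>\nA: Yes."

# Plain SFT: next-token cross-entropy on the teacher's trace. That is all
# "distillation" means in this setup.
ids = tok(prompt + trace, return_tensors="pt").input_ids
loss = student(ids, labels=ids).loss
loss.backward()  # one training step; optimizer and scheduler omitted
```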

Here’s the overall r1-zero pipeline (with a sketch of GRPO’s group scoring just after):

  • v3 base + RL (GRPO) → r1-zero
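
That arrow hides the one algorithmic trick worth knowing: GRPO samples a group of completions per prompt and scores each against its own group, so no PPO-style value network is needed. A minimal sketch of the advantage computation; the function and variable names are mine, not from any DeepSeek code:

```python
import numpy as np

def grpo_advantages(rewards: list[float], eps: float = 1e-8) -> np.ndarray:
    """Group-relative advantage: normalize each sample's reward by the
    mean and std of its own group, so no learned critic is required."""
    r = np.asarray(rewards, dtype=np.float64)
    return (r - r.mean()) / (r.std() + eps)

# Four sampled answers to one math prompt, scored by rule-based rewards
# (1 for a correct, well-formatted answer, else 0) - no neural reward model.
print(grpo_advantages([1.0, 0.0, 0.0, 1.0]))  # -> [ 1. -1. -1.  1.]
```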

r1 training pipeline (a stubbed end-to-end sketch follows the list):

  1. DeepSeek-V3 Base + SFT (Cold Start Data) → Checkpoint 1

  2. Checkpoint 1 + RL (GRPO + Language Consistency) → Checkpoint 2

  3. Checkpoint 2 used to Generate Data (Rejection Sampling)

  4. DeepSeek-V3 Base + SFT (Generated Data + Other Data) → Checkpoint 3

  5. Checkpoint 3 + RL (Reasoning + Preference Rewards) → DeepSeek-R1
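
Strung together, the five steps read roughly like the stubbed sketch below. Every helper is a placeholder standing in for a whole training stage (this is not the paper's code), and `language_consistency` is the extra reward added to stop r1-zero-style language mixing mid-CoT:

```python
# Hypothetical, heavily stubbed sketch of the r1 recipe above.
def sft(base, data):                  return f"SFT({base})"
def grpo(model, prompts, reward_fn):  return f"GRPO({model})"
def rejection_sample(model, prompts): return ["filtered reasoning traces"]

def correctness(sample):          return 1.0  # rule-based: right answer, right format
def language_consistency(sample): return 0.1  # bonus for keeping the CoT in one language
def preference(sample):           return 0.5  # human-preference reward score

prompts, cold_start, other_sft = ["..."], ["..."], ["..."]

ckpt1 = sft("DeepSeek-V3-Base", cold_start)                 # step 1: cold-start SFT
ckpt2 = grpo(ckpt1, prompts,                                # step 2: reasoning RL
             lambda s: correctness(s) + language_consistency(s))
data  = rejection_sample(ckpt2, prompts) + other_sft        # step 3: data generation
ckpt3 = sft("DeepSeek-V3-Base", data)                       # step 4: re-SFT from base
r1    = grpo(ckpt3, prompts,                                # step 5: all-scenario RL
             lambda s: correctness(s) + preference(s))
print(r1)
```

Note that step 4 restarts SFT from the base model rather than continuing from Checkpoint 2; only the generated data carries over.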

We know the benchmarks, but just how good is it?

Deepseek r1 vs OpenAI o1.

So, for this, I tested r1 and o1 side by side on complex reasoning, math, coding, and creative-writing problems. These are questions that only o1 had solved before, or that no model had solved at all.

Here’s what I found:

  • For reasoning, it is much better than any SOTA model that came before o1. It is better than o1-preview but a notch below o1. This also shows in the ARC-AGI bench.

  • Mathematics: it’s the same story; r1 is a killer, but o1 is better.

  • Coding: I didn’t get to play much, but on first look, it’s up there with o1, and the fact that it costs 20x less makes it the practical winner.

  • Writing: This is where R1 takes the lead. It gives the same vibes as early Opus. It’s free, less censored, has much more personality, is easy to steer, and is very creative compared to the rest, even o1-pro.

What interested me was how free the model sounded, and how its thought traces read like a human internal monologue. Perhaps this is because of less stringent RLHF than the US models get.

The fact that you can get r1 from v3 via pure RL was the most surprising.

For in-depth analysis, commentary, and remarks on the Deepseek r1, check out this blog post: Notes on Deepseek r1

What are your experiences with the new Deepseek r1? Did you find the model useful for your use cases?

🌐
DeepSeek
api-docs.deepseek.com › deepseek-r1 release 2025/01/20
DeepSeek-R1 Release | DeepSeek API Docs
🛠️ DeepSeek-R1: Technical Highlights · 📈 Large-scale RL in post-training · 🏆 Significant performance boost with minimal labeled data · 🔢 Math, code, and reasoning tasks on par with OpenAI-o1 · 📄 More details: https://github.com/deepseek-ai/DeepSeek-R1/blob/main/DeepSeek_R1.pdf