🌐
Languagemodels
newsletter.languagemodels.co › p › the-illustrated-deepseek-r1
The Illustrated DeepSeek-R1 - by Jay Alammar
January 27, 2025 - Just like most existing LLMs, DeepSeek-R1 generates one token at a time, except it excels at solving math and reasoning problems because it is able to spend more time processing a problem through the process of generating thinking tokens that explain its chain of thought.
🌐
Reddit
reddit.com › r/llmdevs › how was deepseek-r1 built; for dummies
r/LLMDevs on Reddit: How was DeepSeek-R1 built; For dummies
January 27, 2025 -

Over the weekend I wanted to learn how DeepSeek-R1 was trained and what was so revolutionary about it. So I ended up reading the paper and wrote down my thoughts. < the linked article is (hopefully) written in a way that's easy for everyone to understand -- no PhD required!

Here's a "quick" summary:

1/ DeepSeek-R1-Zero is trained with pure reinforcement learning (RL), without using labeled data. It's the first time someone has tried this and succeeded. (that we know of -- the o1 report didn't show much)

2/ Traditional RL frameworks (like PPO) have something like an 'LLM coach or critic' that tells the model whether its answer was good or bad, based on given examples (labeled data). DeepSeek uses GRPO, a pure-RL framework that skips the critic: it samples a group of answers per prompt, scores each one with predefined rules, and compares every answer against the group average.
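To make the group-comparison idea concrete, here's a tiny sketch of my own (not DeepSeek's code -- the reward values are made up): each answer's advantage is just its rule-based reward measured against the group's mean and spread, so no critic model is needed.

```python
from statistics import mean, pstdev

def group_advantages(rewards):
    """Score each answer relative to the group instead of using a critic."""
    mu = mean(rewards)
    sigma = pstdev(rewards) or 1.0  # avoid dividing by zero when all rewards match
    return [(r - mu) / sigma for r in rewards]

# Four sampled answers to the same prompt, scored 1.0 (good) or 0.0 (bad)
# by predefined rules:
print(group_advantages([1.0, 0.0, 1.0, 0.0]))  # [1.0, -1.0, 1.0, -1.0]
```

Answers above the group average get a positive advantage (reinforced), answers below get a negative one -- that's the whole trick that replaces the critic.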

3/ But how can you evaluate performance if you don't have labeled data to test against? With this framework, the rules aren't perfect—they’re just a best guess at what "good" looks like. The RL process tries to optimize for things like:

Does the answer make sense? (Coherence)

Is it in the right format? (Completeness)

Does it match the general style we expect? (Fluency)

For example, for mathematical tasks, the DeepSeek-R1-Zero model could be rewarded for producing outputs that align with mathematical principles or logical consistency.

It makes sense.. and it works... to some extent!
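Here's a toy version of what such a rule-based reward might look like (purely my own illustration -- the tags, rules, and weights are assumptions, not DeepSeek's actual reward function):

```python
import re

def rule_reward(output: str, expected: str) -> float:
    """Score an answer with predefined rules instead of a learned critic."""
    reward = 0.0
    # Format rule: the reasoning should appear inside <think>...</think> tags.
    if re.search(r"<think>.+</think>", output, re.DOTALL):
        reward += 0.5
    # Accuracy rule: the final boxed answer must match the known result.
    match = re.search(r"\\boxed\{([^}]*)\}", output)
    if match and match.group(1).strip() == expected:
        reward += 1.0
    return reward

out = "<think>2+2 is 4</think> The answer is \\boxed{4}"
print(rule_reward(out, "4"))  # 1.5: right format and right answer
```

Notice the rules are checkable without any labeled preference data -- for math and code you can verify the final answer mechanically, which is exactly why this works best on those domains.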

4/ This model (R1-Zero) had issues with poor readability and language mixing -- something you'd expect from pure-RL training. So the authors designed a multi-stage training process, combining several training methods:

5/ The DeepSeek-R1 model goes through a series of training stages, each with a different purpose:

(i) cold-start data lays a structured foundation, fixing issues like poor readability
(ii) pure RL develops reasoning almost on auto-pilot
(iii) rejection sampling + SFT filters for top-tier training data that improves accuracy, and
(iv) a final RL stage adds another level of generalization.
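To make the order of those four stages concrete, here's a toy sketch of the pipeline. Every function is a made-up placeholder (this is not DeepSeek's real code), shown only so the sequence of methods is easy to follow:

```python
def sft(model, data):
    """Supervised fine-tuning stage (placeholder)."""
    return {**model, "stages": model["stages"] + [f"SFT({data})"]}

def rl(model, reward):
    """Reinforcement-learning stage (placeholder)."""
    return {**model, "stages": model["stages"] + [f"RL({reward})"]}

def rejection_sample(model):
    """Keep only the best generations to fine-tune on (placeholder)."""
    return "top-tier samples"

def train_r1():
    model = {"name": "DeepSeek-V3-Base", "stages": []}
    model = sft(model, "cold-start data")        # (i) readability foundation
    model = rl(model, "rule-based rewards")      # (ii) pure-RL reasoning
    model = sft(model, rejection_sample(model))  # (iii) rejection sampling + SFT
    model = rl(model, "general preferences")     # (iv) final RL for generalization
    return model

print(train_r1()["stages"])
```

The key design choice is interleaving: SFT stages fix style and readability, RL stages grow reasoning, and each one starts from the previous stage's checkpoint.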

And with that, DeepSeek-R1 performs as well as or better than the o1 models.

Lmk if you have any questions (i might be able to answer them).

Discussions

How was DeepSeek-R1 built; For dummies
Just wrote about it, it's absolutely great, and the less is more will definitely redefine AI as we know it More on reddit.com
🌐 r/LLMDevs
59
876
January 27, 2025
Notes on Deepseek r1: Just how good it is compared to o1
Many of the problems o1 has can just be attributed to the fact that they refuse to let it think for long enough More on reddit.com
🌐 r/singularity
40
155
August 3, 2024
What are people using Deepseek R1 for?
From what I see around here, the only thing people use it for is to get it to admit Tiananmen Square happened. More on reddit.com
🌐 r/LocalLLaMA
15
0
December 23, 2024
Deepseek R1 Explained by a Retired Microsoft Engineer [10:06]
You know when you sit down for a meal in front of the computer and you just need something new to watch for a bit while you eat? If you search /r/videos or other places, you'll find mostly short videos. But while you're eating, you don't want to be constantly fumbling around with the mouse, ... More on reddit.com
🌐 r/mealtimevideos
2
72
June 4, 2024
People also ask

How to access DeepSeek-R1
DeepSeek’s chatbot (which can be powered by the R1 model) is free to use on the company’s website and is available for download on the Apple App Store. R1 is also available for use on Hugging Face and DeepSeek’s API.
🌐
builtin.com
builtin.com › artificial-intelligence › deepseek-r1
What Is DeepSeek-R1? | Built In
Is DeepSeek-R1 open source?
Yes, DeepSeek is open source in that its model weights and training methods are freely available for the public to examine, use and build upon. However, its source code and any specifics about its underlying data are not available to the public.
🌐
builtin.com
builtin.com › artificial-intelligence › deepseek-r1
What Is DeepSeek-R1? | Built In
How many parameters does DeepSeek-R1 have?
DeepSeek-R1 has 671 billion parameters in total. But DeepSeek also released six “distilled” versions of R1, ranging in size from 1.5 billion parameters to 70 billion parameters. While the smallest can run on a laptop with consumer GPUs, the full R1 requires more substantial hardware.
🌐
builtin.com
builtin.com › artificial-intelligence › deepseek-r1
What Is DeepSeek-R1? | Built In
🌐
Hugging Face
huggingface.co › blog › NormalUhr › deepseek-r1-explained
From Zero to Reasoning Hero: How DeepSeek-R1 Leverages Reinforcement Learning to Master Complex Reasoning
February 4, 2025 - DeepSeek-R1-Zero: A model that learns complex reasoning behaviors purely through reinforcement learning without any supervised fine-tuning, showing emergent abilities like extended chain-of-thought, reflection, and self-correction.
🌐
Built In
builtin.com › artificial-intelligence › deepseek-r1
What Is DeepSeek-R1? | Built In
DeepSeek-R1, or R1, is an open-source language model made by Chinese AI startup DeepSeek that can perform the same text-based tasks as other advanced models, but at a lower cost. It also powers the DeepSeek chatbot, a direct competitor to ChatGPT.
Published October 6, 2025
🌐
Vellum
vellum.ai › blog › the-training-of-deepseek-r1-and-ways-to-use-it
Breaking down the DeepSeek-R1 training process—no PhD required
January 27, 2025 - Step 1: They fine-tuned a base model (DeepSeek-V3-Base) with thousands of cold-start data points to lay a solid foundation. FYI, thousands of cold-start data points is a tiny fraction compared to the millions or even billions of labeled data points typically required for supervised learning at scale. Step 2: Applied pure RL (similar to R1-Zero) to enhance reasoning skills.
Find elsewhere
🌐
Niklas Heidloff
heidloff.net › article › deepseek-r1
Key Concepts of DeepSeek-R1 | Niklas Heidloff
February 17, 2025 - DeepSeek-V3-Base is the base model for DeepSeek-R1. It is a strong Mixture-of-Experts (MoE) language model with 671B total parameters with 37b activated for each token. It has been pretrained on 14.8 trillion diverse and high-quality tokens.
🌐
Thelmbook
thelmbook.com › articles
DeepSeek R1 and R1-Zero Explained
🌐
Substack
iaee.substack.com › p › deepseek-r1-intuitively-and-exhaustively
DeepSeek-R1 — Intuitively and Exhaustively Explained
February 3, 2025 - Brought to you by the subscribers of Intuitively and Exhaustively Explained. In this article we’ll discuss DeepSeek-R1, the first open-source model that exhibits comparable performance to closed source LLMs, like those produced by Google, OpenAI, and Anthropic.
🌐
Medium
medium.com › data-science-in-your-pocket › understanding-deepseek-r1-paper-beginners-guide-e86f83fda796
Understanding DeepSeek-R1 paper: Beginner’s guide | by Mehul Gupta | Data Science in Your Pocket | Medium
January 31, 2025 - Understanding DeepSeek-R1 paper: Beginner’s guide DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning So, by now you’ve heard of DeepSeek making huge waves in the …
🌐
AI Papers Academy
aipapersacademy.com › home › deepseek-r1 paper explained – a new rl llms era in ai?
DeepSeek-R1 Paper Explained – A New RL LLMs Era in AI? - AI Papers Academy
July 3, 2025 - The paper, titled “DeepSeek-R1: ... Learning”, presents a state-of-the-art, open-source reasoning model and a detailed recipe for training such models using large-scale reinforcement learning techniques....
🌐
Writesonic
writesonic.com › home › ai agents › what is deepseek r1? a complete guide to the ai model
What is DeepSeek R1? A Complete Guide to the AI Model
August 13, 2025 - DeepSeek R1 is an advanced artificial intelligence model developed by DeepSeek, designed to perform a wide range of language tasks including text generation, question answering, and code completion.
🌐
Wikipedia
en.wikipedia.org › wiki › DeepSeek
DeepSeek - Wikipedia
1 day ago - Released under the MIT License, DeepSeek-R1 provides responses comparable to other contemporary large language models, such as OpenAI's GPT-4 and o1. Its training cost was reported to be significantly lower than other LLMs.
🌐
TechTarget
techtarget.com › whatis › feature › DeepSeek-explained-Everything-you-need-to-know
DeepSeek explained: Everything you need to know
DeepSeek-R1-0528. Released in May 2025, the R1-0528 model is an updated version of the original R1 model. The model now supports system prompts, JSON output and function calling, making it more suitable for agentic AI use cases.
🌐
FastBots
fastbots.ai › blog › deepseek-r1-explained-features-benefits-and-use-cases
DeepSeek R1 Explained: Features, Benefits, and Use Cases
DeepSeek R1 offers advanced reasoning, problem-solving, and real-time decision-making. It facilitates AI models to process information step-by-step, enhancing accuracy and reliability in outputs.
🌐
Inferless
inferless.com › learn › the-ultimate-guide-to-deepseek-models
DeepSeek AI: Advancing Open-Source LLMs with MoE & Reinforcement Learning | DeepSeek-R1 & V3 Explained
DeepSeek-R1: DeepSeek-R1 is their latest first-generation reasoning model, which matches OpenAI's o1 in benchmarks. They also have DeepSeek-R1-Zero trained solely through large-scale reinforcement learning without supervised fine-tuning.
🌐
Towards AI
pub.towardsai.net › deepseek-r1-model-architecture-853fefac7050
DeepSeek-R1: Model Architecture. This article provides an in-depth… | by Shakti Wadekar | Towards AI
March 13, 2025 - Decoupled RoPE Strategy: To integrate positional information, DeepSeek-V2 (and consequently DeepSeek-V3 and DeepSeek-R1) employs a decoupled RoPE approach. This involves creating additional query (Q) and key (K) vectors specifically designed to carry positional information. Concatenation: These RoPE-enhanced Q and K vectors are concatenated with the up-projected Q and K vectors. This is a bit of a tricky part in MLA. I will try to explain it the way I understood it from DeepSeek’s technical reports.
🌐
Fireworks AI
fireworks.ai › blog › deepseek-r1-deepdive
DeepSeek-R1 Overview: Features, Capabilities, Parameters
Whether it’s solving high-level mathematics, generating sophisticated code, or breaking down complex scientific questions, DeepSeek R1’s RL-based architecture allows it to self-discover and refine reasoning strategies over time.
🌐
Turing
turing.com › resources › understanding-deepseek-r1
DeepSeek R1 Explained: A Cost-Efficient Reasoning Focused LLM
3 weeks ago - DeepSeek-R1 achieves performance comparable to OpenAI’s o1-1217, but at a fraction of the cost. ... Mathematical & logical problem-solving: Effective at structured, step-by-step reasoning–especially for math, logic, puzzles, complex tasks, and chain-of-thought thinking with high accuracy. Code review & debugging: Performs well as a senior code reviewer. It can identify bugs, suggest improvements, and explain errors in a way that mirrors an experienced developer, making it useful for tech audits, code reviews, and debugging workflows.
🌐
arXiv
arxiv.org › abs › 2501.12948
[2501.12948] DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
January 22, 2025 - Through RL, DeepSeek-R1-Zero naturally emerges with numerous powerful and intriguing reasoning behaviors. However, it encounters challenges such as poor readability, and language mixing. To address these issues and further enhance reasoning performance, we introduce DeepSeek-R1, which incorporates multi-stage training and cold-start data before RL.