Just wrote about it -- it's absolutely great, and the "less is more" approach will definitely redefine AI as we know it. (Answer from Rolandojuve on reddit.com)
Reddit
reddit.com › r/llmdevs › how was deepseek-r1 built; for dummies
r/LLMDevs on Reddit: How was DeepSeek-R1 built; For dummies
January 27, 2025

Over the weekend I wanted to learn how DeepSeek-R1 was trained and what was so revolutionary about it. So I ended up reading the paper and wrote down my thoughts. The linked article is (hopefully) written so that everyone can understand it -- no PhD required!

Here's a "quick" summary:

1/ DeepSeek-R1-Zero is trained with pure reinforcement learning (RL), without using labeled data. It's the first time someone tried that and succeeded. (That we know of -- the o1 report didn't show much.)

2/ Traditional RL frameworks (like PPO) rely on something like an 'LLM coach or critic' that tells the model whether an answer was good or bad -- based on given examples (labeled data). DeepSeek uses GRPO, a pure-RL framework that skips the critic: it samples a group of answers per prompt, scores each one against predefined rules, and compares every answer to the group average.
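The group-scoring idea can be sketched in a few lines. This is a simplified illustration of the group-relative normalization, not DeepSeek's actual implementation:

```python
def group_relative_advantages(rewards):
    """Normalize each answer's reward against its sampling group (GRPO-style).

    No learned critic: the group itself serves as the baseline.
    """
    n = len(rewards)
    mean = sum(rewards) / n
    std = (sum((r - mean) ** 2 for r in rewards) / n) ** 0.5
    if std == 0.0:  # uniform group: no learning signal
        return [0.0] * n
    return [(r - mean) / std for r in rewards]

# Four sampled answers to one prompt, scored by rule-based rewards
print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))  # → [1.0, -1.0, -1.0, 1.0]
```

Answers above the group average get a positive advantage and are reinforced; answers below get a negative one -- no labeled data needed.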

3/ But how can you evaluate performance without labeled data to test against? With this framework, the rules aren't perfect -- they're just a best guess at what "good" looks like. The RL process tries to optimize on things like:

Does the answer make sense? (Coherence)

Is it in the right format? (Completeness)

Does it match the general style we expect? (Fluency)

For example, on mathematical tasks, DeepSeek-R1-Zero could be rewarded for producing outputs that align with mathematical principles or logical consistency.

It makes sense... and it works... to some extent!
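Those rules can be pictured as a tiny reward function. The two checks below are made up for illustration -- the paper's actual reward rules are more involved:

```python
import re

def rule_based_reward(output: str, expected_answer: str) -> float:
    """Toy rule-based reward: no labeled critic, just predefined checks."""
    reward = 0.0
    # Format check: did the model wrap its reasoning in <think> tags?
    if re.search(r"<think>.*?</think>", output, re.DOTALL):
        reward += 0.5
    # Accuracy check: for math, the final answer can be matched mechanically
    if output.strip().endswith(expected_answer):
        reward += 1.0
    return reward

print(rule_based_reward("<think>2 + 2 = 4</think> The answer is 4", "4"))  # → 1.5
```

Because both checks are mechanical, millions of answers can be scored without any human labeling.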

4/ This model (R1-Zero) had issues with poor readability and language mixing -- something you'd expect from pure RL. So the authors designed a multi-stage training process that combines several training methods:

5/ The resulting DeepSeek-R1 model goes through a sequence of training stages, each serving a different purpose:

(i) the cold start data lays a structured foundation fixing issues like poor readability
(ii) pure-RL develops reasoning almost on auto-pilot
(iii) rejection sampling + SFT works with top-tier training data that improves accuracy, and
(iv) a final RL stage ensures an additional level of generalization.
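The four stages above can be sketched as a hypothetical pipeline. Function names and bodies are placeholders for illustration, not DeepSeek's code:

```python
def cold_start_sft(model, cold_start_data):
    return model + ["SFT on cold-start data"]         # (i) readable foundation

def pure_rl(model):
    return model + ["GRPO reasoning RL"]              # (ii) reasoning on auto-pilot

def rejection_sampling_sft(model, curated_data):
    return model + ["SFT on rejection-sampled data"]  # (iii) accuracy boost

def final_rl(model):
    return model + ["final alignment RL"]             # (iv) generalization

def train_r1(base_model):
    m = cold_start_sft(base_model, cold_start_data=[])
    m = pure_rl(m)
    m = rejection_sampling_sft(m, curated_data=[])
    return final_rl(m)

print(train_r1(["DeepSeek-V3-Base"]))
```

The key design choice is the interleaving: supervised stages fix style and accuracy, while the RL stages in between let reasoning ability emerge on its own.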

And with that, DeepSeek-R1 performs as well as or better than OpenAI's o1 models.

Lmk if you have any questions (i might be able to answer them).

Built In
builtin.com › artificial-intelligence › deepseek-r1
What Is DeepSeek-R1? | Built In
DeepSeek-R1, or R1, is an open-source language model made by Chinese AI startup DeepSeek that can perform the same text-based tasks as other advanced models, but at a lower cost. It also powers the DeepSeek chatbot, a direct competitor to ChatGPT.
Published October 6, 2025
People also ask

How to access DeepSeek-R1
DeepSeek’s chatbot (which can be powered by the R1 model) is free to use on the company’s website and is available for download on the Apple App Store. R1 is also available for use on Hugging Face and DeepSeek’s API.
Is DeepSeek-R1 open source?
Yes, DeepSeek is open source in that its model weights and training methods are freely available for the public to examine, use and build upon. However, its source code and any specifics about its underlying data are not available to the public.
How many parameters does DeepSeek-R1 have?
DeepSeek-R1 has 671 billion parameters in total. But DeepSeek also released six “distilled” versions of R1, ranging in size from 1.5 billion parameters to 70 billion parameters. While the smallest can run on a laptop with consumer GPUs, the full R1 requires more substantial hardware.
Hugging Face
huggingface.co › blog › NormalUhr › deepseek-r1-explained
From Zero to Reasoning Hero: How DeepSeek-R1 Leverages Reinforcement Learning to Master Complex Reasoning
DeepSeek-R1-Zero: A model that learns complex reasoning behaviors purely through reinforcement learning without any supervised fine-tuning, showing emergent abilities like extended chain-of-thought, reflection, and self-correction.
Vellum
vellum.ai › blog › the-training-of-deepseek-r1-and-ways-to-use-it
Breaking down the DeepSeek-R1 training process—no PhD required
January 27, 2025 - Step 1: They fine-tuned a base model (DeepSeek-V3-Base) with thousands of cold-start data points to lay a solid foundation. FYI, thousands of cold-start data points is a tiny fraction compared to the millions or even billions of labeled data points typically required for supervised learning at scale. Step 2: Applied pure RL (similar to R1-Zero) to enhance reasoning skills.
Hugging Face
huggingface.co › deepseek-ai › DeepSeek-R1
deepseek-ai/DeepSeek-R1 · Hugging Face
DeepSeek-R1-Zero demonstrates capabilities such as self-verification, reflection, and generating long CoTs, marking a significant milestone for the research community. Notably, it is the first open research to validate that reasoning capabilities ...
Fireworks AI
fireworks.ai › blog › deepseek-r1-deepdive
DeepSeek-R1 Overview: Features, Capabilities, Parameters
Whether it’s solving high-level mathematics, generating sophisticated code, or breaking down complex scientific questions, DeepSeek R1’s RL-based architecture allows it to self-discover and refine reasoning strategies over time.
DataCamp
datacamp.com › blog › deepseek-r1
DeepSeek-R1: Features, o1 Comparison, Distilled Models & More | DataCamp
June 4, 2025 - With DeepSeek-R1, you can follow its logic, making it easier to understand and, if necessary, challenge its output. This capability gives reasoning models an edge in fields where outcomes need to be explainable, like research or complex decision-making.
Niklas Heidloff
heidloff.net › article › deepseek-r1
Key Concepts of DeepSeek-R1 | Niklas Heidloff
February 17, 2025 - DeepSeek-V3-Base is the base model for DeepSeek-R1. It is a strong Mixture-of-Experts (MoE) language model with 671B total parameters with 37b activated for each token. It has been pretrained on 14.8 trillion diverse and high-quality tokens.
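A quick back-of-the-envelope check on those MoE numbers: with 37B of 671B parameters activated per token, only a small fraction of the model runs for any given token.

```python
# Figures from the snippet above: 671B total parameters, 37B active per token
total_params = 671e9
active_params = 37e9
fraction = active_params / total_params
print(f"{fraction:.1%} of parameters active per token")  # → 5.5% of parameters active per token
```

That sparsity is why a 671B-parameter model can serve each token at roughly the compute cost of a ~37B dense model.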
GitHub
github.com › deepseek-ai › DeepSeek-R1
GitHub - deepseek-ai/DeepSeek-R1
DeepSeek-R1-Zero, a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step, demonstrated remarkable performance on reasoning.
Starred by 91.6K users
Forked by 11.8K users
Languagemodels
newsletter.languagemodels.co › p › the-illustrated-deepseek-r1
The Illustrated DeepSeek-R1 - by Jay Alammar
January 27, 2025 - Just like most existing LLMs, DeepSeek-R1 generates one token at a time, except it excels at solving math and reasoning problems because it is able to spend more time processing a problem through the process of generating thinking tokens that explain its chain of thought.
DeepSeek
api-docs.deepseek.com › deepseek-r1 release 2025/01/20
DeepSeek-R1 Release | DeepSeek API Docs
🛠️ DeepSeek-R1: Technical Highlights · 📈 Large-scale RL in post-training · 🏆 Significant performance boost with minimal labeled data · 🔢 Math, code, and reasoning tasks on par with OpenAI-o1 · 📄 More details: https://github.com/deepseek-ai/DeepSeek-R1/blob/main/DeepSeek_R1.pdf
Sean Goedecke
seangoedecke.com › deepseek-r1
What did DeepSeek figure out about reasoning with DeepSeek-R1?
There’s no need to generate a huge body of chain-of-thought data ahead of time, and there’s no need to run an expensive answer-checking model. Instead, the model generates its own chains-of-thought as it goes. There are other points made in the DeepSeek-R1 paper, but I think this is by far the most important.
Medium
medium.com › @sahin.samia › deepseek-r1-explained-pioneering-the-next-era-of-reasoning-driven-ai-3eeb5ac4d4a0
DeepSeek-R1 explained : Pioneering the Next Era of Reasoning-Driven AI | by Sahin Ahmed, Data Scientist | Medium
March 4, 2025 - DeepSeek-R1 explained : Pioneering the Next Era of Reasoning-Driven AI Introduction The ability of Large Language Models (LLMs) to reason effectively is a defining measure of their intelligence. From …
arXiv
arxiv.org › abs › 2501.12948
[2501.12948] DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
January 22, 2025 - Through RL, DeepSeek-R1-Zero naturally emerges with numerous powerful and intriguing reasoning behaviors. However, it encounters challenges such as poor readability, and language mixing. To address these issues and further enhance reasoning performance, we introduce DeepSeek-R1, which incorporates multi-stage training and cold-start data before RL.
Writesonic
writesonic.com › home › ai agents › what is deepseek r1? a complete guide to the ai model
What is DeepSeek R1? A Complete Guide to the AI Model
August 13, 2025 - DeepSeek R1 is an advanced artificial intelligence model developed by DeepSeek, designed to perform a wide range of language tasks including text generation, question answering, and code completion.
BentoML
bentoml.com › blog › the-complete-guide-to-deepseek-models-from-v3-to-r1-and-beyond
The Complete Guide to DeepSeek Models: V3, R1, V3.1, V3.2 and Beyond
That means it doesn’t just give you an answer; it explains how it got there. Before responding, R1 generates a step-by-step chain of thought, making it especially useful for: ... According to the DeepSeek-R1 paper re-published in Nature and its supplementary information, R1’s training cost ...
Ollama
ollama.com › library › deepseek-r1
deepseek-r1
DeepSeek-R1 is a family of open reasoning models with performance approaching that of leading models, such as OpenAI o3 and Gemini 2.5 Pro.
The Indian Express
indianexpress.com › news › technology › artificial intelligence
DeepSeek R1 hands-on: 5 things we tried, including developing a game | Technology News - The Indian Express
February 8, 2025 - I prompted Deepseek to write the code for a Tetris game. The R1 model responded saying, “Creating a Tetris game from scratch involves several steps, including setting up the game grid, handling user input, managing the falling tetrominoes, and checking for completed lines.
Together AI
docs.together.ai › docs › deepseek-r1
DeepSeek R1 Quickstart - Together.ai Docs
Reasoning models like DeepSeek-R1 have been trained to think step-by-step before responding with an answer. As a result they excel at complex reasoning tasks such as coding, mathematics, planning, puzzles, and agent workflows. Given a question in the form of an input prompt DeepSeek-R1 outputs both its chain of thought/reasoning process in the form of thinking tokens between <think> tags and the answer.
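Splitting the thinking tokens from the answer is a simple string operation. A minimal sketch, assuming the response contains a single <think>…</think> block:

```python
def split_reasoning(response: str):
    """Separate DeepSeek-R1's thinking tokens from its final answer."""
    start, end = response.find("<think>"), response.find("</think>")
    if start == -1 or end == -1:
        return "", response.strip()  # no thinking block found
    thinking = response[start + len("<think>"):end].strip()
    answer = response[end + len("</think>"):].strip()
    return thinking, answer

thinking, answer = split_reasoning("<think>3 * 7 = 21</think>The answer is 21.")
print(answer)  # → The answer is 21.
```

Applications typically show or log the thinking part separately and surface only the answer to the user.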