🌐
arXiv
arxiv.org › abs › 2501.12948
[2501.12948] DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
January 22, 2025 - DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning, by DeepSeek-AI and 199 other authors
🌐
GitHub
github.com › deepseek-ai › DeepSeek-R1 › blob › main › DeepSeek_R1.pdf
DeepSeek-R1/DeepSeek_R1.pdf at main · deepseek-ai/DeepSeek-R1
Author   deepseek-ai
🌐
arXiv
arxiv.org › pdf › 2501.12948 pdf
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via
In this paper, we take the first step toward improving language model reasoning capabilities using pure reinforcement learning (RL). Our goal is to explore the potential of LLMs to develop reasoning capabilities without any supervised data, focusing on their self-evolution through a pure RL process. Specifically, we use DeepSeek-V3-Base as the base model and employ
🌐
DeepSeek
api-docs.deepseek.com › deepseek-r1 release 2025/01/20
DeepSeek-R1 Release | DeepSeek API Docs
📄 More details: https://github.com/deepseek-ai/DeepSeek-R1/blob/main/DeepSeek_R1.pdf · 🌐 API Access & Pricing · ⚙️ Use DeepSeek-R1 by setting model=deepseek-reasoner · 💰 $0.14 / million input tokens (cache hit) 💰 $0.55 / million input tokens (cache miss) 💰 $2.19 / million output tokens ·
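As a rough illustration of how the per-token prices listed above combine, here is a minimal sketch that estimates the cost of a request. The three prices are copied from the release note; the function name `estimate_cost_usd` and its parameters are hypothetical, and actual billing may differ from this simple arithmetic.

```python
from decimal import Decimal

# Prices in USD per million tokens, as quoted in the release note above.
PRICE_INPUT_CACHE_HIT = Decimal("0.14")
PRICE_INPUT_CACHE_MISS = Decimal("0.55")
PRICE_OUTPUT = Decimal("2.19")
MILLION = Decimal(1_000_000)


def estimate_cost_usd(cached_input_tokens: int,
                      uncached_input_tokens: int,
                      output_tokens: int) -> Decimal:
    """Hypothetical helper: sum the three price components for one request."""
    return (
        PRICE_INPUT_CACHE_HIT * cached_input_tokens
        + PRICE_INPUT_CACHE_MISS * uncached_input_tokens
        + PRICE_OUTPUT * output_tokens
    ) / MILLION


if __name__ == "__main__":
    # Example: 2,000 uncached prompt tokens and 1,000 generated tokens.
    print(estimate_cost_usd(0, 2_000, 1_000))
```

`Decimal` is used instead of `float` so that small per-request amounts add up without binary rounding noise.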
🌐
Hugging Face
huggingface.co › papers › 2501.12948
Paper page - DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
To support the research community, we open-source DeepSeek-R1-Zero, DeepSeek-R1, and six dense models (1.5B, 7B, 8B, 14B, 32B, 70B) distilled from DeepSeek-R1 based on Qwen and Llama.
🌐
Reddit
reddit.com › r/singularity › a summary of deepseek-r1's paper by deepseek-r1
r/singularity on Reddit: A summary of DeepSeek-R1's paper by DeepSeek-R1
November 29, 2024 -
  • Aha moments emerged naturally in RL: Self-correction behaviors like "Wait, let’s reevaluate..." arose without SFT.

  • Cold-start SFT fixed readability: ~1k structured examples resolved language mixing.

  • GRPO cut RL costs by 30%: Group-wise reward normalization outperformed PPO.

  • RL increased CoT length autonomously: Reasoning steps grew from 100→1k tokens without penalties.

  • Distillation beat direct RL in small models: SFT on R1 data outperformed RL-trained base models.

  • Process rewards failed; outcome rewards worked better: Rule-based final-answer checks stabilized training.

  • XML tags reduced hallucinations 15%: Structured <think>/<answer> improved reward clarity.

  • Language mixing fixed via consistency rewards: Penalized code-switching in multilingual outputs.
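Two ideas from the bullets above, rule-based outcome rewards over `<think>`/`<answer>` output and group-wise reward normalization, can be sketched in a few lines. This is an illustrative sketch only, not the paper's implementation: the regex, the 0.5 format weight, the exact-string answer check, the use of population standard deviation, and the function names are all assumptions made for the example.

```python
import re
from statistics import mean, pstdev

# Assumed output format: reasoning in <think>, final answer in <answer>.
_FORMAT = re.compile(
    r"\s*<think>(.*?)</think>\s*<answer>(.*?)</answer>\s*", re.DOTALL
)


def rule_based_reward(completion: str, reference: str) -> float:
    """Hypothetical outcome reward: format check plus final-answer check."""
    match = _FORMAT.fullmatch(completion)
    if match is None:
        return 0.0  # malformed output earns nothing
    format_reward = 0.5  # assumed weight for following the tag format
    answer = match.group(2).strip()
    accuracy_reward = 1.0 if answer == reference.strip() else 0.0
    return format_reward + accuracy_reward


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of samples for the same prompt."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    group = [
        "<think>2 + 2 = 4</think><answer>4</answer>",
        "<think>2 + 2 = 5</think><answer>5</answer>",
        "The answer is 4",
    ]
    rewards = [rule_based_reward(c, "4") for c in group]
    print(rewards, group_relative_advantages(rewards))
```

Because each advantage is measured against the mean of its own group, no separately learned value model is needed to provide a baseline in this sketch.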

I find it funny that I've seen multiple AI YouTubers explain papers and they just go to another AI to help them in the video, but hey, it does a good job.

https://github.com/deepseek-ai/DeepSeek-R1/blob/main/DeepSeek_R1.pdf

🌐
AI Papers Academy
aipapersacademy.com › home › deepseek-r1 paper explained – a new rl llms era in ai?
DeepSeek-R1 Paper Explained – A New RL LLMs Era in AI? - AI Papers Academy
July 3, 2025 - The paper, titled “DeepSeek-R1: ... Learning”, presents a state-of-the-art, open-source reasoning model and a detailed recipe for training such models using large-scale reinforcement learning techniques....
🌐
ResearchGate
researchgate.net › publication › 388484582_Technical_Report_Analyzing_DeepSeek-R1's_Impact_on_AI_Development
(PDF) Technical Report: Analyzing DeepSeek-R1's Impact on AI Development
January 20, 2025 - The paper "DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning" introduces DeepSeek-R1-Zero and DeepSeek-R1, two...
🌐
GitHub
github.com › deepseek-ai › DeepSeek-R1
GitHub - deepseek-ai/DeepSeek-R1
DeepSeek-R1 achieves performance comparable to OpenAI-o1 across math, code, and reasoning tasks. To support the research community, we have open-sourced DeepSeek-R1-Zero, DeepSeek-R1, and six dense models distilled from DeepSeek-R1 based on ...
Starred by 91.6K users
Forked by 11.8K users
🌐
ResearchGate
researchgate.net › publication › 389004776_DeepSeek_R1_What_Sets_It_Apart
(PDF) DeepSeek R1: What Sets It Apart?
February 14, 2025 - DeepSeek R1: The AI Powerhouse Redefining Efficiency and Innovation. Breaking the mold of proprietary AI, DeepSeek R1 emerges as a game-changer...
🌐
Tenorshare
tenorshare.com › home › tenorshare pdnob › deepseek r1 pdf explained: features, api, and more
DeepSeek R1 PDF Guide: Download, Installation, and Setup
February 6, 2025 - The DeepSeek R1 PDF provides in-depth insights into its design, training methodology, and performance benchmarks. Now, we’ll guide you on how to access these documents and highlight the key areas to focus on when reviewing them.
🌐
SSRN
papers.ssrn.com › sol3 › Delivery.cfm › 5107732.pdf
The Mathematics of DeepSeek-R1: Theoretical Foundations and ...
March 24, 2025 - Noguer I Alonso, Miquel, The Mathematics of DeepSeek-R1: Theoretical Foundations and Comparative Analysis (January 22, 2025). Available at SSRN: https://ssrn.com/abstract=5107732 or http://dx.doi.org/10.2139/ssrn.5107732 · New York United States ...
🌐
arXiv
arxiv.org › abs › 2502.02523
[2502.02523] Brief analysis of DeepSeek R1 and its implications for Generative AI
February 7, 2025 - Brief analysis of DeepSeek R1 and its implications for Generative AI, by Sarah Mercer and 2 other authors
🌐
ResearchGate
researchgate.net › publication › 388856323_Highlighting_DeepSeek-R1_Architecture_Features_and_Future_Implications
(PDF) Highlighting DeepSeek-R1: Architecture, Features and Future Implications
February 11, 2025 - DeepSeek-R1 emphasizes unique training processes devoid of supervised fine-tuning and utilizes rule-based reinforcement learning by means of group relative policy optimization. The paper will identify the major features of DeepSeek-R1 and their ...
🌐
YouTube
youtube.com › watch
Paper: DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning - YouTube
Paper reading in the Discord group. All the lecture was improvised. Join the group: https://discord.gg/JRKsaNbhCg Link to paper: https://github.com/deepseek-ai
Published   January 21, 2025
🌐
ResearchGate
researchgate.net › publication › 390043387_Brief_analysis_of_DeepSeek_R1_and_its_implications_for_Generative_AI
(PDF) Brief analysis of DeepSeek R1 and its implications for Generative AI
March 5, 2025 - In late January 2025, DeepSeek released their new reasoning model (DeepSeek R1), which was developed at a fraction of the cost yet remains...
🌐
Medium
medium.com › data-science-in-your-pocket › understanding-deepseek-r1-paper-beginners-guide-e86f83fda796
Understanding DeepSeek-R1 paper: Beginner’s guide | by Mehul Gupta | Data Science in Your Pocket | Medium
January 31, 2025 - Understanding DeepSeek-R1 paper: Beginner’s guide DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning So, by now you’ve heard of DeepSeek making huge waves in the …
🌐
X
x.com › HatforceSec › status › 1881377291080778063
Arthur Gervais on X: "Here's the full paper: https://t.co/k9GCDbH9PY" / X
DeepSeek-R1/DeepSeek_R1.pdf at main · deepseek-ai/DeepSeek-R1 · From github.com · 11:24 AM · Jan 20, 2025 · 170 Views