🌐
arXiv
arxiv.org › pdf › 2501.12948
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
In this paper, we take the first step toward improving language model reasoning capabilities using pure reinforcement learning (RL). Our goal is to explore the potential of LLMs to develop reasoning capabilities without any supervised data, focusing on their self-evolution through a pure RL process. Specifically, we use DeepSeek-V3-Base as the base model and employ GRPO as the RL framework.
🌐
GitHub
github.com › deepseek-ai › DeepSeek-R1 › blob › main › DeepSeek_R1.pdf
DeepSeek-R1/DeepSeek_R1.pdf at main · deepseek-ai/DeepSeek-R1
deepseek-ai / DeepSeek-R1 · Star 91.6k · Fork 11.8k
Author: deepseek-ai
🌐
arXiv
arxiv.org › abs › 2501.12948
[2501.12948] DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
January 22, 2025 - To support the research community, we open-source DeepSeek-R1-Zero, DeepSeek-R1, and six dense models (1.5B, 7B, 8B, 14B, 32B, 70B) distilled from DeepSeek-R1 based on Qwen and Llama. From: Wenfeng Liang. Submitted [v1] Wed, 22 Jan 2025 15:19:35 UTC (928 KB). By DeepSeek-AI and 199 other authors.
🌐
Reddit
reddit.com › r/singularity › a summary of deepseek-r1's paper by deepseek-r1
r/singularity on Reddit: A summary of DeepSeek-R1's paper by DeepSeek-R1
  • Aha moments emerged naturally in RL: Self-correction behaviors like "Wait, let’s reevaluate..." arose without SFT.

  • Cold-start SFT fixed readability: ~1k structured examples resolved language mixing.

  • GRPO cut RL costs by 30%: Group-wise reward normalization outperformed PPO (a minimal sketch follows this list).

  • RL increased CoT length autonomously: Reasoning steps grew from 100→1k tokens without penalties.

  • Distillation beat direct RL in small models: SFT on R1 data outperformed RL-trained base models.

  • Process rewards failed; outcome rewards worked better: Rule-based final-answer checks stabilized training (a toy checker is sketched at the end of this post).

  • XML tags reduced hallucinations 15%: Structured <think>/<answer> improved reward clarity.

  • Language mixing fixed via consistency rewards: Penalized code-switching in multilingual outputs.
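
A minimal sketch of the group-wise normalization the GRPO bullet refers to, using the group-relative advantage formula from the paper (the full GRPO objective also includes a clipped policy ratio and a KL penalty, omitted here; the 8-sample group is illustrative):

import numpy as np

def grpo_advantages(rewards: np.ndarray) -> np.ndarray:
    # Score each of the G completions sampled for one prompt against its own
    # group's mean and std; no learned value (critic) network is needed,
    # which is where the savings over PPO come from.
    return (rewards - rewards.mean()) / (rewards.std() + 1e-8)

# Example: 8 completions for one prompt, scored 1.0/0.0 by an outcome reward.
rewards = np.array([1.0, 0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0])
print(grpo_advantages(rewards))  # correct completions get positive advantage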

I find it funny that I've seen multiple AI YouTubers explain papers by just going to another AI for help in the video, but hey, it does a good job.

https://github.com/deepseek-ai/DeepSeek-R1/blob/main/DeepSeek_R1.pdf
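
The outcome-reward and <think>/<answer> bullets fit together; below is a toy checker combining them. The exact-match rule and reward values are illustrative assumptions; the paper's accuracy rewards are task-specific verifiers (e.g. math answer checks, compiler tests).

import re

def rule_based_reward(completion: str, gold_answer: str) -> float:
    # Format check: the completion must follow the <think>/<answer> template.
    m = re.fullmatch(r"\s*<think>(.+?)</think>\s*<answer>(.+?)</answer>\s*",
                     completion, flags=re.DOTALL)
    if m is None:
        return 0.0  # malformed output earns nothing
    # Outcome check: grade only the final answer, never the intermediate
    # reasoning (this is what distinguishes it from a process reward).
    return 1.0 if m.group(2).strip() == gold_answer.strip() else 0.1

print(rule_based_reward("<think>2 + 2 = 4</think><answer>4</answer>", "4"))  # 1.0
print(rule_based_reward("the answer is 4", "4"))  # 0.0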

🌐
DeepSeek
api-docs.deepseek.com › deepseek-r1 release 2025/01/20
DeepSeek-R1 Release | DeepSeek API Docs
📄 More details: https://github.com/deepseek-ai/DeepSeek-R1/blob/main/DeepSeek_R1.pdf
🌐 API Access & Pricing
⚙️ Use DeepSeek-R1 by setting model=deepseek-reasoner
💰 $0.14 / million input tokens (cache hit) · $0.55 / million input tokens (cache miss) · $2.19 / million output tokens
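A minimal sketch of the model=deepseek-reasoner setting above, assuming DeepSeek's OpenAI-compatible endpoint at https://api.deepseek.com; the API key, prompt, and token counts are placeholders.

from openai import OpenAI

client = OpenAI(api_key="YOUR_DEEPSEEK_API_KEY",
                base_url="https://api.deepseek.com")

resp = client.chat.completions.create(
    model="deepseek-reasoner",  # selects DeepSeek-R1 per the release notes
    messages=[{"role": "user", "content": "What is 17 * 23?"}],
)
print(resp.choices[0].message.content)

# Back-of-envelope cost at the listed rates, assuming the chain-of-thought
# is billed as output: 2,000 uncached input tokens + 10,000 output tokens.
cost = 2_000 / 1e6 * 0.55 + 10_000 / 1e6 * 2.19
print(f"${cost:.4f}")  # $0.0230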
🌐
Hugging Face
huggingface.co › papers › 2501.12948
Paper page - DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
To support the research community, we open-source DeepSeek-R1-Zero, DeepSeek-R1, and six dense models (1.5B, 7B, 8B, 14B, 32B, 70B) distilled from DeepSeek-R1 based on Qwen and Llama.
🌐
GitHub
github.com › deepseek-ai › DeepSeek-R1
GitHub - deepseek-ai/DeepSeek-R1
DeepSeek-R1 achieves performance comparable to OpenAI-o1 across math, code, and reasoning tasks. To support the research community, we have open-sourced DeepSeek-R1-Zero, DeepSeek-R1, and six dense models distilled from DeepSeek-R1 based on Qwen and Llama.
Starred by 91.6K users
Forked by 11.8K users
🌐
ResearchGate
researchgate.net › publication › 388484582_Technical_Report_Analyzing_DeepSeek-R1's_Impact_on_AI_Development
(PDF) Technical Report: Analyzing DeepSeek-R1's Impact on AI Development
January 20, 2025 - The paper "DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning" introduces DeepSeek-R1-Zero and DeepSeek-R1, two...
🌐
Tenorshare
tenorshare.com › home › tenorshare pdnob › deepseek r1 pdf explained: features, api, and more
DeepSeek R1 PDF Guide: Download, Installation, and Setup
February 6, 2025 - DeepSeek R1 is an advanced AI model designed to handle complex reasoning, code generation, and enterprise applications. In this guide, we’ll explore everything you need to know about DeepSeek R1, including how to access its official PDFs, understand its core features, and download the model.
🌐
ResearchGate
researchgate.net › publication › 388231214_DeepSeek_Revolutionizing_AI_with_Open-Source_Reasoning_Models_-Advancing_Innovation_Accessibility_and_Competition_with_OpenAI_and_Gemini_20
(PDF) DeepSeek: Revolutionizing AI with Open-Source Reasoning Models -Advancing Innovation, Accessibility, and Competition with OpenAI and Gemini 2.0
January 22, 2025 - DeepSeek's AI models have emerged as a transformative force in artificial intelligence, offering open-source alternatives to proprietary systems...
🌐
ResearchGate
researchgate.net › publication › 389004776_DeepSeek_R1_What_Sets_It_Apart
(PDF) DeepSeek R1: What Sets It Apart?
February 14, 2025 - DeepSeek R1: The AI Powerhouse Redefining Efficiency and Innovation. Breaking the mold of proprietary AI, DeepSeek R1 emerges as a game-changer...
🌐
AI Papers Academy
aipapersacademy.com › home › deepseek-r1 paper explained – a new rl llms era in ai?
DeepSeek-R1 Paper Explained – A New RL LLMs Era in AI? - AI Papers Academy
July 3, 2025 - We conclude this review by highlighting the remarkable results of the freely available DeepSeek-R1 compared to OpenAI’s o1 model. The above figure from the paper shows how DeepSeek-R1 is not only comparable to but also surpasses o1 in certain benchmarks.
🌐
SSRN
papers.ssrn.com › sol3 › Delivery.cfm › 5107732.pdf
The Mathematics of DeepSeek-R1: Theoretical Foundations and ...
March 24, 2025 - Noguer I Alonso, Miquel, The Mathematics of DeepSeek-R1: Theoretical Foundations and Comparative Analysis (January 22, 2025).
🌐
Nature
nature.com › articles › s41586-025-09422-z
DeepSeek-R1 incentivizes reasoning in LLMs through reinforcement learning | Nature
September 17, 2025 - DeepSeek-R1 incentivizes reasoning in LLMs through reinforcement learning. Nature 645, 633–638 (2025). https://doi.org/10.1038/s41586-025-09422-z
🌐
ResearchGate
researchgate.net › publication › 390043387_Brief_analysis_of_DeepSeek_R1_and_its_implications_for_Generative_AI
(PDF) Brief analysis of DeepSeek R1 and its implications for Generative AI
March 5, 2025 - In late January 2025, DeepSeek released their new reasoning model (DeepSeek R1), which was developed at a fraction of the cost yet remains...
🌐
Hugging Face
huggingface.co › deepseek-ai › DeepSeek-R1
deepseek-ai/DeepSeek-R1 · Hugging Face
DeepSeek-R1 achieves performance comparable to OpenAI-o1 across math, code, and reasoning tasks. To support the research community, we have open-sourced DeepSeek-R1-Zero, DeepSeek-R1, and six dense models distilled from DeepSeek-R1 based on Qwen and Llama.
🌐
ResearchGate
researchgate.net › publication › 388856323_Highlighting_DeepSeek-R1_Architecture_Features_and_Future_Implications
(PDF) Highlighting DeepSeek-R1: Architecture, Features and Future Implications
February 11, 2025 - The aim of this paper is to highlight the main features of the DeepSeek-R1 innovations that could shape the future of LLMs by bridging the gap between basic research and applied developments.
🌐
Ollama
ollama.com › library › deepseek-r1
deepseek-r1
DeepSeek-R1 has received a minor version upgrade to DeepSeek-R1-0528 for the 8 billion parameter distilled model and the full 671 billion parameter model. In this update, DeepSeek R1 has significantly improved its reasoning and inference capabilities. The model has demonstrated outstanding performance across various benchmark evaluations, including mathematics, programming, and general logic.
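A small sketch of running the 8B distilled tag locally with the ollama Python client, assuming the model has been pulled (ollama pull deepseek-r1:8b) and the local server is running:

import ollama

response = ollama.chat(
    model="deepseek-r1:8b",  # the 8B distilled variant from this library page
    messages=[{"role": "user", "content": "Why is the sky blue?"}],
)
print(response["message"]["content"])  # content includes the reasoning trace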
🌐
Booz Allen
boozallen.com › content › dam › home › docs › ai › a-technical-primer-on-deepseek.pdf
A Technical Primer on DeepSeek
We transform missions with tomorrow’s technologies to advance the country’s most critical civil, defense, and national security priorities.