🌐
arXiv
arxiv.org › abs › 2501.12948
[2501.12948] DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
January 22, 2025 - To support the research community, we open-source DeepSeek-R1-Zero, DeepSeek-R1, and six dense models (1.5B, 7B, 8B, 14B, 32B, 70B) distilled from DeepSeek-R1 based on Qwen and Llama. From: Wenfeng Liang. [v1] Wed, 22 Jan 2025 15:19:35 UTC (928 KB). By DeepSeek-AI and 199 other authors.
🌐
Hugging Face
huggingface.co › deepseek-ai › DeepSeek-R1-Zero
deepseek-ai/DeepSeek-R1-Zero · Hugging Face
DeepSeek-R1 achieves performance comparable to OpenAI-o1 across math, code, and reasoning tasks. To support the research community, we have open-sourced DeepSeek-R1-Zero, DeepSeek-R1, and six dense models distilled from DeepSeek-R1 based on Llama and Qwen.
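The six distilled checkpoints ship as ordinary Hugging Face repos, so a quick local try-out is a few lines of transformers code. A minimal sketch, assuming the deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B repo id (swap in any other distill size):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")

# The distilled models emit <think>...</think> reasoning before the final answer.
messages = [{"role": "user", "content": "What is 17 * 24?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
output = model.generate(input_ids, max_new_tokens=512)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```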
🌐
GitHub
github.com › deepseek-ai › DeepSeek-R1
GitHub - deepseek-ai/DeepSeek-R1
DeepSeek-R1 achieves performance comparable to OpenAI-o1 across math, code, and reasoning tasks. To support the research community, we have open-sourced DeepSeek-R1-Zero, DeepSeek-R1, and six dense models distilled from DeepSeek-R1 based on Llama and Qwen.
Starred by 91.6K users
Forked by 11.8K users
🌐
OpenReview
openreview.net › pdf
Understanding R1-Zero-Like Training: A Critical Perspective (Zichen Liu et al.)
DeepSeek-V3 technical report. arXiv preprint arXiv:2412.19437, 2024. · Liu, Z., Chen, C., Du, C., Lee, W. S., and Lin, M. OAT: A research-friendly framework for LLM online alignment. ... Liu, Z., Chen, C., Li, W., Pang, T., Du, C., and Lin, M. There may not be aha moment in R1-Zero-like training.
🌐
GitHub
github.com › deepseek-ai › DeepSeek-R1 › blob › main › DeepSeek_R1.pdf
DeepSeek-R1/DeepSeek_R1.pdf at main · deepseek-ai/DeepSeek-R1
deepseek-ai / DeepSeek-R1 · Public · Fork 11.8k · Star 91.6k
Author: deepseek-ai
🌐
Hugging Face
huggingface.co › unsloth › DeepSeek-R1-Zero-GGUF
unsloth/DeepSeek-R1-Zero-GGUF · Hugging Face
DeepSeek-R1 achieves performance comparable to OpenAI-o1 across math, code, and reasoning tasks. To support the research community, we have open-sourced DeepSeek-R1-Zero, DeepSeek-R1, and six dense models distilled from DeepSeek-R1 based on Llama and Qwen.
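For the GGUF quantizations above, llama-cpp-python can pull a file directly from the Hub. A sketch, with the quant filename pattern as an assumption (the full R1-Zero model is very large even quantized, so pick a quant your hardware can hold):

```python
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="unsloth/DeepSeek-R1-Zero-GGUF",  # repo from the result above
    filename="*Q2_K*",                        # assumed quant; adjust to your hardware
    n_ctx=8192,
)
out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Prove that sqrt(2) is irrational."}]
)
print(out["choices"][0]["message"]["content"])
```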
🌐
DeepSeek
api-docs.deepseek.com › deepseek-r1 release 2025/01/20
DeepSeek-R1 Release | DeepSeek API Docs
📄 More details: https://github.com/deepseek-ai/DeepSeek-R1/blob/main/DeepSeek_R1.pdf · 🌐 API Access & Pricing · ⚙️ Use DeepSeek-R1 by setting model=deepseek-reasoner · 💰 $0.14 / million input tokens (cache hit) · 💰 $0.55 / million input tokens (cache miss) · 💰 $2.19 / million output tokens
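The deepseek-reasoner endpoint is OpenAI-compatible per the API docs, so the standard openai client works once pointed at DeepSeek's base URL. A minimal sketch (DEEPSEEK_API_KEY is your own key; reasoning_content is the DeepSeek-specific field documented for the chain of thought):

```python
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],
    base_url="https://api.deepseek.com",
)
resp = client.chat.completions.create(
    model="deepseek-reasoner",  # selects DeepSeek-R1, per the release notes
    messages=[{"role": "user", "content": "How many primes are below 100?"}],
)
print(resp.choices[0].message.reasoning_content)  # chain of thought (DeepSeek-specific)
print(resp.choices[0].message.content)            # final answer
```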
🌐
ResearchGate
researchgate.net › publication › 388484582_Technical_Report_Analyzing_DeepSeek-R1's_Impact_on_AI_Development
(PDF) Technical Report: Analyzing DeepSeek-R1's Impact on AI Development
January 20, 2025 - The paper "DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning" introduces DeepSeek-R1-Zero and DeepSeek-R1, two...
🌐
Ollama
ollama.com › library › deepseek-r1
deepseek-r1
DeepSeek-R1 has received a minor version upgrade to DeepSeek-R1-0528, covering both the 8-billion-parameter distilled model and the full 671-billion-parameter model. In this update, DeepSeek-R1 has significantly improved its reasoning and inference capabilities, demonstrating outstanding performance across benchmark evaluations including mathematics, programming, and general logic.
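Trying the library tag locally takes one pull plus a few lines with the ollama Python client. A sketch, assuming `ollama pull deepseek-r1` has been run and the Ollama server is up:

```python
import ollama

response = ollama.chat(
    model="deepseek-r1",  # tag from the library page above
    messages=[{"role": "user", "content": "Is 1009 prime? Think it through."}],
)
print(response["message"]["content"])
```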
🌐
ResearchGate
researchgate.net › publication › 390043387_Brief_analysis_of_DeepSeek_R1_and_its_implications_for_Generative_AI
(PDF) Brief analysis of DeepSeek R1 and its implications for Generative AI
March 5, 2025 - They intend to replicate the R1-distill models by extracting a high-quality reasoning corpus from DeepSeek-R1, reproduce the pure reinforcement learning pipeline used to create the R1-Zero model, and demonstrate the…
🌐
Reddit
reddit.com › r/localllama › deepseek r1 / r1 zero
r/LocalLLaMA on Reddit: Deepseek R1 / R1 Zero
January 20, 2025 - I got a bunch of such requests but declined all of them because I do not want to help them train a model to replace myself and achieve a short AGI timeline. But it is less relevant now because R1-Zero told the world you can just use outcome-based RL and skip the expensive human annotation. ... The DeepSeek R1 paper is out.
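The "outcome-based RL" point is concrete in the paper: R1-Zero's reward is rule-based (an accuracy check plus a format check on the <think> tags) rather than a reward model trained on human preferences. A toy sketch of such a reward, with the weights purely illustrative:

```python
import re

def outcome_reward(completion: str, gold_answer: str) -> float:
    """Score a rollout from verifiable outcomes alone; no human annotation."""
    reward = 0.0
    # Format reward: reasoning must be wrapped in <think>...</think>.
    if re.search(r"<think>.*?</think>", completion, flags=re.DOTALL):
        reward += 0.5
    # Accuracy reward: the final answer outside the think block must match.
    final = re.sub(r"<think>.*?</think>", "", completion, flags=re.DOTALL).strip()
    if final == gold_answer.strip():
        reward += 1.0
    return reward

print(outcome_reward("<think>7 * 6 = 42</think>42", "42"))  # 1.5
```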
🌐
AI Papers Academy
aipapersacademy.com › home › deepseek-r1 paper explained – a new rl llms era in ai?
DeepSeek-R1 Paper Explained – A New RL LLMs Era in AI? - AI Papers Academy
July 3, 2025 - Cold Start (Phase 1): Starting with the pre-trained model DeepSeek-V3-Base, the model undergoes supervised fine-tuning on a small dataset of results collected from DeepSeek-R1-Zero. These results were validated as high-quality and readable. This dataset contains thousands of samples, making it relatively small.
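Phase 1 is plain supervised fine-tuning on those curated samples. A toy sketch of the objective, with a small stand-in model and a one-example illustrative dataset (DeepSeek-V3-Base itself is far larger):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2.5-0.5B"  # stand-in; not the actual base model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

cold_start = [  # curated, readable reasoning traces (illustrative)
    {"prompt": "What is 12 + 30?", "response": "<think>12 + 30 = 42</think>42"},
]

model.train()
for sample in cold_start:
    batch = tokenizer(sample["prompt"] + sample["response"], return_tensors="pt")
    # Standard causal-LM loss; transformers shifts the labels internally.
    loss = model(**batch, labels=batch["input_ids"]).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```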
🌐
Hugging Face
huggingface.co › blog › NormalUhr › deepseek-r1-explained
From Zero to Reasoning Hero: How DeepSeek-R1 Leverages Reinforcement Learning to Master Complex Reasoning
DeepSeek-R1: Building on R1-Zero, this version incorporates a small amount of high-quality "cold-start" data alongside iterative reinforcement learning and supervised fine-tuning to produce more coherent, user-friendly outputs while maintaining state-of-the-art reasoning performance.
🌐
Nature
nature.com › articles › article
DeepSeek-R1 incentivizes reasoning in LLMs through reinforcement learning | Nature
September 17, 2025 - We believe that these instruction-tuned versions will also greatly contribute to the research community by providing a valuable resource for understanding the mechanisms underlying long CoT reasoning models and for promoting the development of more powerful reasoning models. We release DeepSeek-R1-Zero, DeepSeek-R1, data samples and distilled models to the public as described in the ‘Code availability’ section.
🌐
Ghasemzadeh
ghasemzadeh.com › event › 2025-1-abdullah-deepseek-r1 › deepseek_mamun.pdf
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
DeepSeek R1 vs DeepSeek V3 vs DeepSeek ... and reinforcement learning to boost performance. DeepSeek R1-Zero: • Similar architecture but prioritizes zero-shot capabilities without supervised fine-tuning....
🌐
ResearchGate
researchgate.net › publication › 389004776_DeepSeek_R1_What_Sets_It_Apart
(PDF) DeepSeek R1: What Sets It Apart?
February 14, 2025 - DeepSeek-R1-Zero struggles with challenges like poor readability and language mixing.