arXiv
arxiv.org › pdf › 2501.12948 pdf
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
Table 1 | Template for DeepSeek-R1-Zero.
arXiv
arxiv.org › abs › 2501.12948
[2501.12948] DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
January 22, 2025 - To support the research community, we open-source DeepSeek-R1-Zero, DeepSeek-R1, and six dense models (1.5B, 7B, 8B, 14B, 32B, 70B) distilled from DeepSeek-R1 based on Qwen and Llama. From: Wenfeng Liang. [v1] Wed, 22 Jan 2025 15:19:35 UTC (928 KB). By DeepSeek-AI and 199 other authors.
Videos
13:11 · ArrrZero: Why DeepSeek R1 is less important than R1-Zero - YouTube
03:31 · DeepSeek-R1 vs DeepSeek-R1-Zero - YouTube
27:26 · DeepSeek R1 TRAINING SECRETS You Need to Know! (With Code) - YouTube
25:36 · DeepSeek R1 Theory Overview | GRPO + RL + SFT - YouTube
08:56 · How To Run DeepSeek R1 Right Now Offline In Your Machine - YouTube
02:46 · DeepSeek R1 Zero Explained | Reinforcement Learning in AI - YouTube
GitHub
github.com › deepseek-ai › DeepSeek-R1
GitHub - deepseek-ai/DeepSeek-R1
DeepSeek-R1 achieves performance comparable to OpenAI-o1 across math, code, and reasoning tasks. To support the research community, we have open-sourced DeepSeek-R1-Zero, DeepSeek-R1, and six dense models distilled from DeepSeek-R1 based on Llama and Qwen.
Starred by 91.6K users
Forked by 11.8K users
OpenReview
openreview.net › pdf pdf
Understanding R1-Zero-Like Training: A Critical Perspective. Zichen Liu et al.
DeepSeek-V3 technical report. arXiv preprint arXiv:2412.19437, 2024. Liu, Z., Chen, C., Du, C., Lee, W. S., and Lin, M. Oat: A research-friendly framework for LLM online alignment. ... Liu, Z., Chen, C., Li, W., Pang, T., Du, C., and Lin, M. There may not be Aha moment in R1-Zero-like training.
GitHub
github.com › deepseek-ai › DeepSeek-R1 › blob › main › DeepSeek_R1.pdf
DeepSeek-R1/DeepSeek_R1.pdf at main · deepseek-ai/DeepSeek-R1
deepseek-ai / DeepSeek-R1 (public repository)
Author deepseek-ai
arXiv
arxiv.org › pdf › 2503.20783 pdf
Understanding R1-Zero-Like Training: A Critical Perspective
DeepSeek-R1-Zero (R1-Zero).
University of Toronto
cs.toronto.edu › ~cmaddis › courses › csc2541_w25 › presentations › ivanov_farhat_deepseek.pdf pdf
DeepSeek-R1 Nikita Ivanov & Elias Abou Farhat
Zhang, R., Xu, Q., ... & Li, S. S. (2025). DeepSeek-R1:
Ollama
ollama.com › library › deepseek-r1
deepseek-r1
DeepSeek-R1 has received a minor version upgrade to DeepSeek-R1-0528 for the 8 billion parameter distilled model and the full 671 billion parameter model. In this update, DeepSeek R1 has significantly improved its reasoning and inference capabilities. The model has demonstrated outstanding performance across various benchmark evaluations, including mathematics, programming, and general logic.
Reddit
reddit.com › r/localllama › deepseek r1 / r1 zero
r/LocalLLaMA on Reddit: Deepseek R1 / R1 Zero
January 20, 2025 - I got a bunch of such requests but declined all of them because I do not want to help them train a model to replace myself and achieve a short AGI timeline. But it is less relevant now because R1 Zero told the world you can just use outcome-based RL and skip the expensive human annotation. ... The DeepSeek R1 paper is out.
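The thread's point, that outcome-based RL can replace expensive human preference annotation, can be sketched as a verifiable reward function: grade each sampled completion purely by whether its final answer matches a known reference. The paper does describe rule-based accuracy and format rewards for R1-Zero, but the `outcome_reward` helper and the `\boxed{}` answer convention below are illustrative assumptions, not the authors' exact implementation.

```python
import re

def outcome_reward(completion: str, ground_truth: str) -> float:
    """Rule-based outcome reward (illustrative sketch): 1.0 if the
    final \\boxed{} answer matches the reference, else 0.0.
    No human preference labels are involved."""
    match = re.search(r"\\boxed\{([^}]*)\}", completion)
    if match is None:
        return 0.0  # no parseable final answer
    return 1.0 if match.group(1).strip() == ground_truth.strip() else 0.0

# Grade two sampled completions against a known answer.
good = "The sum is 3 + 4 = 7, so the answer is \\boxed{7}."
bad = "I think the answer is \\boxed{8}."
print(outcome_reward(good, "7"))  # 1.0
print(outcome_reward(bad, "7"))   # 0.0
```

Because the reward checks only the outcome, it scales to any problem with a verifiable answer (math, unit-tested code), which is what makes the "skip human annotation" observation work.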
AI Papers Academy
aipapersacademy.com › home › deepseek-r1 paper explained – a new rl llms era in ai?
DeepSeek-R1 Paper Explained – A New RL LLMs Era in AI? - AI Papers Academy
July 3, 2025 - Cold Start (Phase 1): Starting with the pre-trained model DeepSeek-V3-Base, the model undergoes supervised fine-tuning on a small dataset of results collected from DeepSeek-R1-Zero. These results were validated as high-quality and readable. This dataset contains thousands of samples, making it relatively small.
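The cold-start phase described above amounts to curating a small, clean SFT set from raw R1-Zero generations. A minimal sketch of what that filtering might look like, where the sample fields, the `<think>` tag check (modeled on the R1-Zero template), and the readability heuristics are all assumptions standing in for the human curation the paper describes:

```python
def build_cold_start_set(samples, max_size=5000):
    """Filter raw R1-Zero generations down to a small, readable SFT set.
    The heuristics here are hypothetical stand-ins for human curation."""
    def is_readable(text: str) -> bool:
        # Assumed checks: a complete reasoning block and a bounded length.
        return ("<think>" in text and "</think>" in text
                and len(text) < 20000)

    kept = [s for s in samples
            if s["verified_correct"] and is_readable(s["text"])]
    return kept[:max_size]

raw = [
    {"text": "<think>2+2=4</think> The answer is 4.", "verified_correct": True},
    {"text": "garbled output with no tags", "verified_correct": True},
    {"text": "<think>wrong</think> 5", "verified_correct": False},
]
print(len(build_cold_start_set(raw)))  # 1
```

Only the first sample survives: it is both verified correct and structurally readable, matching the snippet's point that the cold-start set is small (thousands of samples) but high quality.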
Nature
nature.com › articles › article
DeepSeek-R1 incentivizes reasoning in LLMs through reinforcement learning | Nature
September 17, 2025 - We believe that these instruction-tuned versions will also greatly contribute to the research community by providing a valuable resource for understanding the mechanisms underlying long CoT reasoning models and for promoting the development of more powerful reasoning models. We release DeepSeek-R1-Zero, DeepSeek-R1, data samples and distilled models to the public as described in the ‘Code availability’ section.
Ghasemzadeh
ghasemzadeh.com › event › 2025-1-abdullah-deepseek-r1 › deepseek_mamun.pdf pdf
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
DeepSeek R1 vs DeepSeek V3 vs DeepSeek ... and reinforcement learning to boost performance. DeepSeek R1-Zero: similar architecture but prioritizes zero-shot capabilities without fine-tuning...