Hugging Face
huggingface.co › deepseek-ai › DeepSeek-R1-Zero
deepseek-ai/DeepSeek-R1-Zero · Hugging Face
DeepSeek-R1-Zero, a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step, demonstrated remarkable performance on reasoning.
GitHub
github.com › deepseek-ai › DeepSeek-R1
GitHub - deepseek-ai/DeepSeek-R1
DeepSeek-R1-Zero demonstrates capabilities such as self-verification, reflection, and generating long CoTs, marking a significant milestone for the research community. Notably, it is the first open research to validate that reasoning capabilities ...
91.6K stars · 11.8K forks
arXiv
arxiv.org › abs › 2501.12948
[2501.12948] DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
January 22, 2025 - DeepSeek-R1-Zero, a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step, demonstrates remarkable reasoning capabilities.
ARC Prize
arcprize.org › blog › r1-zero-r1-results-analysis
An Analysis of DeepSeek's R1-Zero and R1
Last week, DeepSeek published their new R1-Zero and R1 “reasoner” systems, which are competitive with OpenAI’s o1 system on ARC-AGI-1. R1-Zero, R1, and o1 (low compute) all score around 15-20%, in contrast to GPT-4o’s 5%, the pinnacle of years of pure LLM scaling.
Reddit
reddit.com › r/localllama › deepseek r1 / r1 zero
r/LocalLLaMA on Reddit: Deepseek R1 / R1 Zero
January 20, 2025 - In section 2.2. DeepSeek-R1-Zero: ... we explore the potential of LLMs to develop reasoning capabilities without any supervised data, focusing on their self-evolution through a pure reinforcement learning process....
arXiv
arxiv.org › pdf › 2501.12948
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via
We introduce our first-generation reasoning models, DeepSeek-R1-Zero and DeepSeek-R1.
Thelmbook
thelmbook.com › articles
DeepSeek R1 and R1-Zero Explained
Hugging Face
huggingface.co › unsloth › DeepSeek-R1-Zero-GGUF
unsloth/DeepSeek-R1-Zero-GGUF · Hugging Face
DeepSeek-R1-Zero, a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step, demonstrated remarkable performance on reasoning.
Gocodeo
gocodeo.com › post › deepseek-r1-and-deepseek-r1-zero
DeepSeek-R1 and DeepSeek-R1-Zero: Redefining AI Reasoning and Developer Productivity
Reinforcement Learning on the Base Model (DeepSeek-R1-Zero) DeepSeek-R1-Zero’s development hinged on Group Relative Policy Optimization (GRPO), a reinforcement learning (RL) framework designed for cost efficiency and effectiveness.
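The snippet above names Group Relative Policy Optimization (GRPO) as the RL framework behind R1-Zero. Its cost saving comes from dropping the learned critic: a group of completions is sampled per prompt, and each completion's reward is normalized against its own group's mean and standard deviation to get an advantage. The helper name and the example rewards below are illustrative, not from any DeepSeek code; this is a minimal sketch of the group-relative normalization idea only.

```python
import statistics

def group_relative_advantages(rewards):
    """Normalize each sampled completion's reward against its own group.

    GRPO-style advantage: (reward - group mean) / group std, computed
    per prompt over the group of completions sampled for that prompt.
    """
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards)
    if std == 0:
        # Identical rewards carry no learning signal for this group.
        return [0.0 for _ in rewards]
    return [(r - mean) / std for r in rewards]

# Example: four sampled answers to one prompt, scored 1.0 (correct) or
# 0.0 (incorrect) by a rule-based verifier (hypothetical scores).
advs = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Because the baseline is the group average rather than a value network's estimate, no second model the size of the policy needs to be trained or stored, which is the cost-efficiency claim in the snippet.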
Medium
medium.com › data-science-in-your-pocket › deepseek-r1-vs-deepseek-r1-zero-3ab8eeed8b62
DeepSeek-R1 vs DeepSeek-R1-Zero
January 20, 2025 - DeepSeek has just released DeepSeek-R1 and DeepSeek-R1-Zero, two new reasoning models outperforming OpenAI's o1 on various benchmarks.
BentoML
bentoml.com › blog › the-complete-guide-to-deepseek-models-from-v3-to-r1-and-beyond
The Complete Guide to DeepSeek Models: V3, R1, V3.1, V3.2 and Beyond
The result was R1, a model that not only keeps the reasoning power of R1-Zero but significantly improves accuracy, readability, and coherence. Unlike V3, which is optimized for general tasks, R1 is a true reasoning model. That means it doesn’t just give you an answer; it explains how it got there. Before responding, R1 generates a step-by-step chain of thought, making it especially useful for: ... According to the DeepSeek-R1 paper re-published in Nature and its supplementary information, R1’s training cost was the equivalent of just US$294K primarily on NVIDIA H800 chips.
Hacker News
news.ycombinator.com › item
An analysis of DeepSeek's R1-Zero and R1 | Hacker News
February 12, 2025 - While I think this is an interesting hypothesis, I'm skeptical. You might be lowering the cost of your training corpus by a few million dollars, but I highly doubt you are getting novel, high-quality data. We are currently in a world where the SOTA base model seems to be capped at around GPT-4o levels.
Prompt Hub
prompthub.us › blog › deepseek-r-1-model-overview-and-how-it-ranks-against-openais-o1
PromptHub Blog: DeepSeek R-1 Model Overview and How it Ranks Against OpenAI's o1
DeepSeek-R1 builds on the foundation ... and overall performance. DeepSeek-R1-Zero: Trained entirely with reinforcement learning (RL) and no supervised fine-tuning (SFT)....
OpenRouter
openrouter.ai › deepseek › deepseek-r1-zero
DeepSeek R1 Zero - API, Providers, Stats | OpenRouter
DeepSeek-R1-Zero is a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step. It's 671B parameters in size, with 37B active in an inference pass. Run DeepSeek R1 Zero with API
Unsloth
unsloth.ai › blog › deepseek-r1
Run Deepseek-R1 / R1 Zero
DeepSeek's latest R1 model is the most powerful open-source reasoning model, performing on par with OpenAI's o1 model. Learn how to run & fine-tune the model.
Hugging Face
huggingface.co › deepseek-ai › DeepSeek-R1
deepseek-ai/DeepSeek-R1 · Hugging Face
DeepSeek-R1-Zero demonstrates capabilities such as self-verification, reflection, and generating long CoTs, marking a significant milestone for the research community. Notably, it is the first open research to validate that reasoning capabilities ...
Hugging Face
huggingface.co › unsloth › DeepSeek-R1-Zero
unsloth/DeepSeek-R1-Zero · Hugging Face
DeepSeek-R1-Zero, a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step, demonstrated remarkable performance on reasoning.
DeepLearning.AI
deeplearning.ai › the-batch › deepseek-releases-r1-r1-zero-and-six-smaller-distilled-models
Data Points: DeepSeek releases R1, R1-Zero, and six smaller distilled models
January 20, 2025 - DeepSeek-R1 achieves performance comparable to OpenAI’s latest o1 model on reasoning tasks, including a 79.8 percent pass rate on AIME 2024 and 97.3 percent on MATH-500. The model, along with the reinforcement-learning-trained R1-Zero and smaller distilled versions, is now available under an MIT license, allowing open access for the community to use the model weights and outputs.
Epoch AI
epoch.ai › gradient-updates › what-went-into-training-deepseek-r1
What went into training DeepSeek-R1? | Epoch AI
January 31, 2025 - The RL loop that produces R1-Zero is the core of the reasoning training, but it’s not the only step before the final R1 model is trained. Building on this checkpoint, DeepSeek curates a cold-start dataset (partly including cleaned up R1-Zero outputs) to fine-tune the base v3 model before ...
Nature
nature.com › articles › article
DeepSeek-R1 incentivizes reasoning in LLMs through reinforcement learning | Nature
September 17, 2025 - This design choice originates from our hypothesis that human-defined reasoning patterns may limit model exploration, whereas unrestricted RL training can better incentivize the emergence of new reasoning capabilities in LLMs. Through this process, detailed in the next section, our model (referred to as DeepSeek-R1-Zero) naturally developed diverse and sophisticated reasoning behaviours.