Hugging Face
huggingface.co › deepseek-ai › DeepSeek-R1
deepseek-ai/DeepSeek-R1 · Hugging Face
DeepSeek-R1-Zero, a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step, demonstrated remarkable performance on reasoning.
Hugging Face
huggingface.co › deepseek-ai › DeepSeek-R1-Zero
deepseek-ai/DeepSeek-R1-Zero · Hugging Face
DeepSeek-R1-Zero, a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step, demonstrated remarkable performance on reasoning.
Hugging Face
huggingface.co › unsloth › DeepSeek-R1-Zero
unsloth/DeepSeek-R1-Zero · Hugging Face
DeepSeek-R1-Zero, a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step, demonstrated remarkable performance on reasoning.
GitHub
github.com › deepseek-ai › DeepSeek-R1
GitHub - deepseek-ai/DeepSeek-R1
DeepSeek-R1-Zero: 671B total parameters, 37B activated, 128K context (🤗 HuggingFace). DeepSeek-R1: 671B total parameters, 37B activated, 128K context (🤗 HuggingFace). DeepSeek-R1-Zero & DeepSeek-R1 are trained based on DeepSeek-V3-Base. For more details regarding the model architecture, please refer to the DeepSeek-V3 repository.
Starred by 91.6K users
Forked by 11.8K users
Hugging Face
huggingface.co › unsloth › DeepSeek-R1-Zero-GGUF
unsloth/DeepSeek-R1-Zero-GGUF · Hugging Face
DeepSeek-R1-Zero, a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step, demonstrated remarkable performance on reasoning.
Hugging Face
huggingface.co › unsloth › DeepSeek-R1-GGUF
unsloth/DeepSeek-R1-GGUF · Hugging Face
DeepSeek-R1-Zero, a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step, demonstrated remarkable performance on reasoning.
GitHub
github.com › huggingface › open-r1
GitHub - huggingface/open-r1: Fully open reproduction of DeepSeek-R1
Step 2: replicate the pure RL pipeline that DeepSeek used to create R1-Zero.
Starred by 25.7K users
Forked by 2.4K users
Languages   Python 89.4% | Shell 10.0% | Makefile 0.6%
Hugging Face
huggingface.co › blog › open-r1
Open-R1: a fully open reproduction of DeepSeek-R1
DeepSeek also introduced two models: DeepSeek-R1-Zero and DeepSeek-R1, each with a distinct training approach. DeepSeek-R1-Zero skipped supervised fine-tuning altogether and relied entirely on reinforcement learning (RL), using Group Relative Policy Optimization (GRPO) to make the process more ...
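The group-relative idea behind GRPO is simple enough to sketch: sample a group of completions per prompt, score each with a reward, and normalize each reward against its own group's mean and standard deviation instead of training a separate value critic. A minimal illustrative sketch of that advantage computation (not DeepSeek's or Open-R1's actual code):

```python
from statistics import mean, pstdev

def grpo_advantages(rewards):
    """Group-relative advantages in the style of the DeepSeek-R1 paper:
    each sampled completion's reward is normalized against the mean and
    std of its own group, removing the need for a learned value critic."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    if sigma == 0:
        # All completions scored identically: no learning signal.
        return [0.0 for _ in rewards]
    return [(r - mu) / sigma for r in rewards]

# Example: a group of 4 sampled answers scored by a rule-based reward
# (e.g. 1.0 for a correct final answer, 0.0 otherwise).
print(grpo_advantages([1.0, 0.0, 0.0, 1.0]))  # → [1.0, -1.0, -1.0, 1.0]
```

Correct answers end up with positive advantages and incorrect ones with negative advantages, which is what steers the policy update.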
Hugging Face
huggingface.co › deepseek-ai › DeepSeek-R1-Zero › tree › main
deepseek-ai/DeepSeek-R1-Zero at main
DeepSeek-R1-Zero repository, 689 GB, 5 contributors, 27 commits. Latest: msr2000, "Small fix" (7223428, 9 months ago). Files: figures/ ("Release DeepSeek-R1", 11 months ago); .gitattributes, 1.52 kB ("initial commit", 11 months ago); LICENSE, 1.06 kB ("Release DeepSeek-R1", 11 months ago).
Hugging Face
huggingface.co › unsloth › DeepSeek-R1
unsloth/DeepSeek-R1 · Hugging Face
DeepSeek-R1-Zero demonstrates capabilities such as self-verification, reflection, and generating long CoTs, marking a significant milestone for the research community. Notably, it is the first open research to validate that reasoning capabilities ...
Hugging Face
huggingface.co › deepseek-ai › DeepSeek-R1-Zero › discussions › 8
deepseek-ai/DeepSeek-R1-Zero · Thank you deepseek
The comment above is understated. I just glanced at their DeepSeek R1 paper. They have proved that LLMs can be post-trained to reason with CoT without any human-supervised data, with R1-Zero.
Hugging Face
huggingface.co › blog › NormalUhr › deepseek-r1-explained
From Zero to Reasoning Hero: How DeepSeek-R1 Leverages Reinforcement Learning to Master Complex Reasoning
DeepSeek-R1-Zero: A model that learns complex reasoning behaviors purely through reinforcement learning without any supervised fine-tuning, showing emergent abilities like extended chain-of-thought, reflection, and self-correction.
Reddit
reddit.com › r/localllama › deepseek r1 / r1 zero
r/LocalLLaMA on Reddit: Deepseek R1 / R1 Zero
January 20, 2025 - When asked for its name in chat.deepseek.com, it says DeepSeek R1. ... I use it from Open-WebUI via OpenRouter. ... There seem to be distilled versions as well: https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-70B
Hugging Face
huggingface.co › deepseek-ai › DeepSeek-R1-Zero › commit › cfa4541c2f577dafd72f51f2b14119bde0f3c812
Release DeepSeek-R1 · deepseek-ai/DeepSeek-R1-Zero at cfa4541
Commit cfa4541 ("Release DeepSeek-R1") by msr2000, committed on Jan 20. Tags: deepseek_v3, conversational, custom_code, text-generation-inference, fp8. arXiv: 2501.12948. License: MIT.
Unsloth
unsloth.ai › blog › deepseek-r1
Run Deepseek-R1 / R1 Zero
DeepSeek also distilled R1 into fine-tuned versions of the Llama 3 and Qwen 2.5 models, meaning you can now also fine-tune those models out of the box with Unsloth. See our collection for all versions of the R1 model series, including GGUFs, 4-bit and more: huggingface.co/collections/unsloth/deepseek-r1
Jan 27, 2025 update: We've released 1.58-bit Dynamic GGUFs for DeepSeek-R1, allowing you to run R1 even better with an 80% size reduction: 1.58-bit Dynamic R1
Feb 6, 2025 update: You can now train your own reasoning model like R1 using GRPO + Unsloth
Hugging Face
huggingface.co › deepseek-ai
deepseek-ai (DeepSeek)
Org profile for DeepSeek on Hugging Face, the AI community building the future.
Reddit
reddit.com › r/deeplearning › hugging face releases fully open source version of deepseek r1 called open-r1
r/deeplearning on Reddit: hugging face releases fully open source version of deepseek r1 called open-r1
October 31, 2024 -

For those afraid of using a Chinese AI, or who want to more easily build more powerful AIs based on DeepSeek's R1:

"The release of DeepSeek-R1 is an amazing boon for the community, but they didn’t release everything—although the model weights are open, the datasets and code used to train the model are not.

The goal of Open-R1 is to build these last missing pieces so that the whole research and industry community can build similar or better models using these recipes and datasets. And by doing this in the open, everybody in the community can contribute!

As shown in the figure below, here’s our plan of attack:

Step 1: Replicate the R1-Distill models by distilling a high-quality reasoning dataset from DeepSeek-R1.

Step 2: Replicate the pure RL pipeline that DeepSeek used to create R1-Zero. This will involve curating new, large-scale datasets for math, reasoning, and code.

Step 3: Show we can go from base model → SFT → RL via multi-stage training.

The synthetic datasets will allow everybody to fine-tune existing or new LLMs into reasoning models by simply fine-tuning on them. The training recipes involving RL will serve as a starting point for anybody to build similar models from scratch and will allow researchers to build even more advanced methods on top."

https://huggingface.co/blog/open-r1#what-is-deepseek-r1
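Step 1 of the plan quoted above amounts to building prompt/completion pairs in which the completion carries the teacher model's chain of thought, so a student can be supervised-fine-tuned on those reasoning traces. A minimal sketch of that formatting, using the `<think>…</think>` convention that R1-style models emit (field names here are illustrative, not Open-R1's actual schema):

```python
def to_sft_example(question: str, chain_of_thought: str, answer: str) -> dict:
    """Pack one distilled sample into a prompt/completion pair.

    The completion wraps the teacher's reasoning trace in <think> tags,
    mirroring the output format of DeepSeek-R1-style models, then appends
    the final answer.
    """
    completion = f"<think>\n{chain_of_thought}\n</think>\n{answer}"
    return {"prompt": question, "completion": completion}

sample = to_sft_example(
    question="What is 12 * 7?",
    chain_of_thought="12 * 7 = (10 + 2) * 7 = 70 + 14 = 84.",
    answer="84",
)
print(sample["completion"])
```

A dataset of such pairs is what lets "everybody fine-tune existing or new LLMs into reasoning models by simply fine-tuning on them," as the post puts it.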

Hugging Face
huggingface.co › deepseek-ai › DeepSeek-R1-Zero › tree › 945a40ef65921ea791b3d8232f7a8bb8c35a67f0 › figures
deepseek-ai/DeepSeek-R1-Zero at 945a40ef65921ea791b3d8232f7a8bb8c35a67f0
DeepSeek-R1-Zero / figures: 5 contributors, 1 commit. msr2000, "Release DeepSeek-R1" (cfa4541, 5 months ago). Contains benchmark.jpg (777 kB).
Hugging Face
huggingface.co › deepseek-ai › DeepSeek-R1-Zero › discussions › 20
deepseek-ai/DeepSeek-R1-Zero · locally running ideas
Discussion #20, opened Jan 30 by gopi87. Tags: deepseek_v3, conversational, custom_code, text-generation-inference, fp8. arXiv: 2501.12948. License: MIT.