Hugging Face
huggingface.co › deepseek-ai › DeepSeek-R1
deepseek-ai/DeepSeek-R1 · Hugging Face
DeepSeek-R1-Zero, a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step, demonstrated remarkable performance on reasoning.
Hugging Face
huggingface.co › deepseek-ai › DeepSeek-R1-Zero
deepseek-ai/DeepSeek-R1-Zero · Hugging Face
DeepSeek-R1-Zero, a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step, demonstrated remarkable performance on reasoning.
Hugging Face
huggingface.co › unsloth › DeepSeek-R1-Zero
unsloth/DeepSeek-R1-Zero · Hugging Face
DeepSeek-R1-Zero, a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step, demonstrated remarkable performance on reasoning.
GitHub
github.com › deepseek-ai › DeepSeek-R1
GitHub - deepseek-ai/DeepSeek-R1
DeepSeek-R1-Zero: 671B total parameters, 37B activated, 128K context (🤗 HuggingFace). DeepSeek-R1: 671B total parameters, 37B activated, 128K context (🤗 HuggingFace). DeepSeek-R1-Zero & DeepSeek-R1 are trained based on DeepSeek-V3-Base. For more details regarding the model architecture, please refer to the DeepSeek-V3 repository.
Starred by 91.6K users
Forked by 11.8K users
Hugging Face
huggingface.co › unsloth › DeepSeek-R1-Zero-GGUF
unsloth/DeepSeek-R1-Zero-GGUF · Hugging Face
DeepSeek-R1-Zero, a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step, demonstrated remarkable performance on reasoning.
Hugging Face
huggingface.co › unsloth › DeepSeek-R1-GGUF
unsloth/DeepSeek-R1-GGUF · Hugging Face
DeepSeek-R1-Zero, a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step, demonstrated remarkable performance on reasoning.
GitHub
github.com › huggingface › open-r1
GitHub - huggingface/open-r1: Fully open reproduction of DeepSeek-R1
Step 2: replicate the pure RL pipeline that DeepSeek used to create R1-Zero.
Starred by 25.7K users
Forked by 2.4K users
Languages   Python 89.4% | Shell 10.0% | Makefile 0.6%
Hugging Face
huggingface.co › blog › open-r1
Open-R1: a fully open reproduction of DeepSeek-R1
DeepSeek also introduced two models: DeepSeek-R1-Zero and DeepSeek-R1, each with a distinct training approach. DeepSeek-R1-Zero skipped supervised fine-tuning altogether and relied entirely on reinforcement learning (RL), using Group Relative Policy Optimization (GRPO) to make the process more ...
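The group-relative idea behind GRPO is simple enough to sketch: sample a group of completions per prompt, score each with a reward, and normalize each reward against its own group's mean and standard deviation instead of training a separate value critic. A minimal illustrative sketch of that advantage computation (not DeepSeek's or Open-R1's actual code):

```python
from statistics import mean, pstdev

def grpo_advantages(rewards):
    """Group-relative advantages in the style of the DeepSeek-R1 paper:
    each sampled completion's reward is normalized against the mean and
    std of its own group, removing the need for a learned value critic."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    if sigma == 0:
        # All completions scored identically: no learning signal.
        return [0.0 for _ in rewards]
    return [(r - mu) / sigma for r in rewards]

# Example: a group of 4 sampled answers scored by a rule-based reward
# (e.g. 1.0 for a correct final answer, 0.0 otherwise).
print(grpo_advantages([1.0, 0.0, 0.0, 1.0]))  # → [1.0, -1.0, -1.0, 1.0]
```

Correct answers end up with positive advantages and incorrect ones with negative advantages, which is what steers the policy update.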
Hugging Face
huggingface.co › deepseek-ai › DeepSeek-R1-Zero › tree › main
deepseek-ai/DeepSeek-R1-Zero at main
DeepSeek-R1-Zero repository, 689 GB, 5 contributors, 27 commits. Latest: msr2000, "Small fix" (7223428, 9 months ago). Files: figures/ ("Release DeepSeek-R1", 11 months ago); .gitattributes, 1.52 kB ("initial commit", 11 months ago); LICENSE, 1.06 kB ("Release DeepSeek-R1", 11 months ago).
Hugging Face
huggingface.co › unsloth › DeepSeek-R1
unsloth/DeepSeek-R1 · Hugging Face
DeepSeek-R1-Zero demonstrates capabilities such as self-verification, reflection, and generating long CoTs, marking a significant milestone for the research community. Notably, it is the first open research to validate that reasoning capabilities ...
Hugging Face
huggingface.co › deepseek-ai › DeepSeek-R1-Zero › discussions › 8
deepseek-ai/DeepSeek-R1-Zero · Thank you deepseek
The comment above is understated. I just glanced at their DeepSeek R1 paper. They have proved that LLMs can be post-trained to reason with CoT without any human-supervised data, with R1-Zero.
Hugging Face
huggingface.co › blog › NormalUhr › deepseek-r1-explained
From Zero to Reasoning Hero: How DeepSeek-R1 Leverages Reinforcement Learning to Master Complex Reasoning
DeepSeek-R1-Zero: A model that learns complex reasoning behaviors purely through reinforcement learning without any supervised fine-tuning, showing emergent abilities like extended chain-of-thought, reflection, and self-correction.
Reddit
reddit.com › r/localllama › deepseek r1 / r1 zero
r/LocalLLaMA on Reddit: Deepseek R1 / R1 Zero
January 20, 2025 - When asked for its name in chat.deepseek.com, it says DeepSeek R1. ... I use it from Open-WebUI via OpenRouter. ... There seem to be distilled versions as well: https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-70B
Hugging Face
huggingface.co › deepseek-ai › DeepSeek-R1-Zero › commit › cfa4541c2f577dafd72f51f2b14119bde0f3c812
Release DeepSeek-R1 · deepseek-ai/DeepSeek-R1-Zero at cfa4541
Commit cfa4541 ("Release DeepSeek-R1") by msr2000, committed on Jan 20. Tags: deepseek_v3, conversational, custom_code, text-generation-inference, fp8. arXiv: 2501.12948. License: MIT.
Unsloth
unsloth.ai › blog › deepseek-r1
Run Deepseek-R1 / R1 Zero
DeepSeek also distilled R1 into fine-tuned versions of the Llama 3 and Qwen 2.5 models, meaning you can now also fine-tune those models out of the box with Unsloth. See our collection for all versions of the R1 model series, including GGUFs, 4-bit and more: huggingface.co/collections/unsloth/deepseek-r1
Jan 27, 2025 update: We've released 1.58-bit Dynamic GGUFs for DeepSeek-R1, allowing you to run R1 even better with an 80% size reduction: 1.58-bit Dynamic R1
Feb 6, 2025 update: You can now train your own reasoning model like R1 using GRPO + Unsloth
Hugging Face
huggingface.co › deepseek-ai
deepseek-ai (DeepSeek)
Org profile for DeepSeek on Hugging Face, the AI community building the future.
Reddit
reddit.com › r/deeplearning › hugging face releases fully open source version of deepseek r1 called open-r1
r/deeplearning on Reddit: hugging face releases fully open source version of deepseek r1 called open-r1
October 31, 2024 -

For those afraid of using a Chinese AI, or who want to more easily build more powerful AIs based on DeepSeek's R1:

"The release of DeepSeek-R1 is an amazing boon for the community, but they didn’t release everything—although the model weights are open, the datasets and code used to train the model are not.

The goal of Open-R1 is to build these last missing pieces so that the whole research and industry community can build similar or better models using these recipes and datasets. And by doing this in the open, everybody in the community can contribute!

As shown in the figure below, here’s our plan of attack:

Step 1: Replicate the R1-Distill models by distilling a high-quality reasoning dataset from DeepSeek-R1.

Step 2: Replicate the pure RL pipeline that DeepSeek used to create R1-Zero. This will involve curating new, large-scale datasets for math, reasoning, and code.

Step 3: Show we can go from base model → SFT → RL via multi-stage training.

The synthetic datasets will allow everybody to fine-tune existing or new LLMs into reasoning models by simply fine-tuning on them. The training recipes involving RL will serve as a starting point for anybody to build similar models from scratch and will allow researchers to build even more advanced methods on top."

https://huggingface.co/blog/open-r1#what-is-deepseek-r1
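Step 1 of the plan quoted above amounts to building prompt/completion pairs in which the completion carries the teacher model's chain of thought, so a student can be supervised-fine-tuned on those reasoning traces. A minimal sketch of that formatting, using the `<think>…</think>` convention that R1-style models emit (field names here are illustrative, not Open-R1's actual schema):

```python
def to_sft_example(question: str, chain_of_thought: str, answer: str) -> dict:
    """Pack one distilled sample into a prompt/completion pair.

    The completion wraps the teacher's reasoning trace in <think> tags,
    mirroring the output format of DeepSeek-R1-style models, then appends
    the final answer.
    """
    completion = f"<think>\n{chain_of_thought}\n</think>\n{answer}"
    return {"prompt": question, "completion": completion}

sample = to_sft_example(
    question="What is 12 * 7?",
    chain_of_thought="12 * 7 = (10 + 2) * 7 = 70 + 14 = 84.",
    answer="84",
)
print(sample["completion"])
```

A dataset of such pairs is what lets "everybody fine-tune existing or new LLMs into reasoning models by simply fine-tuning on them," as the post puts it.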

Hugging Face
huggingface.co › deepseek-ai › DeepSeek-R1-Zero › tree › 945a40ef65921ea791b3d8232f7a8bb8c35a67f0 › figures
deepseek-ai/DeepSeek-R1-Zero at 945a40ef65921ea791b3d8232f7a8bb8c35a67f0
DeepSeek-R1-Zero / figures: 5 contributors, 1 commit. msr2000, "Release DeepSeek-R1" (cfa4541, 5 months ago). Contains benchmark.jpg (777 kB).
Hugging Face
huggingface.co › deepseek-ai › DeepSeek-R1-Zero › discussions › 20
deepseek-ai/DeepSeek-R1-Zero · locally running ideas
Discussion #20, opened Jan 30 by gopi87. Tags: deepseek_v3, conversational, custom_code, text-generation-inference, fp8. arXiv: 2501.12948. License: MIT.