Hugging Face
huggingface.co › deepseek-ai › DeepSeek-R1-Zero
deepseek-ai/DeepSeek-R1-Zero · Hugging Face
1 month ago - DeepSeek-R1-Zero, a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step, demonstrated remarkable performance on reasoning.
arXiv
arxiv.org › abs › 2501.12948
[2501.12948] DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
January 22, 2025 - DeepSeek-R1-Zero, a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step, demonstrates remarkable reasoning capabilities.
Deepseek R1 / R1 Zero
Open-sourcing an o1-level model is incredible; I already feared they might hide this beauty behind an API. More on reddit.com
DeepSeek R1-zero
If I am reading the model summary on this page correctly, this might be what you're looking for? https://huggingface.co/deepseek-ai/DeepSeek-R1-Zero: "reinforcement learning (RL) without supervised fine-tuning (SFT)" ... "This approach allows the model to explore chain-of-thought (CoT) for solving complex problems." More on reddit.com
Videos
13:11
ArrrZero: Why DeepSeek R1 is less important than R1-Zero - YouTube
25:36
DeepSeek R1 Theory Overview | GRPO + RL + SFT - YouTube
03:31
DeepSeek-R1 vs DeepSeek-R1-Zero - YouTube
04:55
DeepSeek R1 Explained by AI Expert: How R1-Zero Led to an AI ...
01:07:43
DeepSeek R1 Theory Tutorial – Architecture, GRPO, KL Divergence ...
00:43
Mike Knoop on the difference between R1 and R1 zero - YouTube
Thelmbook
thelmbook.com › articles
DeepSeek R1 and R1-Zero Explained
GitHub
github.com › deepseek-ai › DeepSeek-R1
GitHub - deepseek-ai/DeepSeek-R1
DeepSeek-R1-Zero, a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step, demonstrated remarkable performance on reasoning.
Starred by 91.6K users
Forked by 11.8K users
Epoch AI
epoch.ai › gradient-updates › what-went-into-training-deepseek-r1
What went into training DeepSeek-R1? | Epoch AI
January 31, 2025 - The RL loop that produces R1-Zero is the core of the reasoning training, but it’s not the only step before the final R1 model is trained. Building on this checkpoint, DeepSeek curates a cold-start dataset (partly including cleaned-up R1-Zero outputs) to fine-tune the base V3 model before ...
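The staging this snippet describes can be summarized in a short sketch. Everything below is an illustrative placeholder, not DeepSeek's actual code; only the ordering of stages comes from the source.

```python
# Hypothetical sketch of the R1 training stages described above.
# All identifiers are placeholders, not DeepSeek's real pipeline.

def rl_loop(checkpoint: str) -> str:
    """Large-scale RL against rule-based rewards."""
    return checkpoint + "+rl"

def curate_cold_start(r1_zero: str) -> list[str]:
    """Cold-start data, partly cleaned-up R1-Zero outputs."""
    return [f"long-CoT example derived from {r1_zero}"]

def fine_tune(checkpoint: str, data: list[str]) -> str:
    """Supervised fine-tuning on the curated dataset."""
    return checkpoint + "+sft"

base_v3 = "deepseek-v3-base"
r1_zero = rl_loop(base_v3)                    # pure RL, no SFT -> R1-Zero
cold_start = curate_cold_start(r1_zero)       # curated cold-start dataset
r1 = rl_loop(fine_tune(base_v3, cold_start))  # SFT on base V3, then more RL -> R1
```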
Nature
nature.com › articles › article
DeepSeek-R1 incentivizes reasoning in LLMs through reinforcement learning | Nature
September 17, 2025 - This design choice originates from our hypothesis that human-defined reasoning patterns may limit model exploration, whereas unrestricted RL training can better incentivize the emergence of new reasoning capabilities in LLMs. Through this process, detailed in the next section, our model (referred to as DeepSeek-R1-Zero) naturally developed diverse and sophisticated reasoning behaviours.
Wikipedia
en.wikipedia.org › wiki › DeepSeek
DeepSeek - Wikipedia
2 days ago - DeepSeek-R1-Zero was trained exclusively using GRPO RL without SFT. Unlike previous versions, it used no model-based reward. All reward functions were rule-based, "mainly" of two types (other types were not specified): accuracy rewards and format rewards. The accuracy reward checked whether ...
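The two reward types named here are simple enough to sketch directly. A minimal sketch in Python, assuming the <think>/<answer> tag template from the R1-Zero training prompt; the exact scoring values and the string-match answer check are assumptions (the real accuracy rewards use task-specific verifiers, e.g. math checkers and compiler tests for code):

```python
import re

def format_reward(completion: str) -> float:
    """Rule-based format reward: the reply must wrap its reasoning in
    <think>...</think> followed by an <answer>...</answer> block.
    (Tag template per the R1-Zero paper; exact scoring is an assumption.)"""
    pattern = r"^<think>.*?</think>\s*<answer>.*?</answer>$"
    return 1.0 if re.match(pattern, completion.strip(), re.DOTALL) else 0.0

def accuracy_reward(completion: str, reference: str) -> float:
    """Rule-based accuracy reward: extract the final answer and compare it
    with the reference (plain string match here, for illustration only)."""
    m = re.search(r"<answer>(.*?)</answer>", completion, re.DOTALL)
    if m is None:
        return 0.0
    return 1.0 if m.group(1).strip() == reference.strip() else 0.0

reply = "<think>2 + 2 gives 4.</think> <answer>4</answer>"
print(format_reward(reply), accuracy_reward(reply, "4"))  # 1.0 1.0
```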
Hacker News
news.ycombinator.com › item
An analysis of DeepSeek's R1-Zero and R1 | Hacker News
February 12, 2025 - While I think this is an interesting hypothesis, I'm skeptical. You might be lowering the cost of your training corpus by a few million dollars, but I highly doubt you are getting novel, high-quality data. We are currently in a world where the SOTA base model seems to be capped at around GPT-4o levels.
BentoML
bentoml.com › blog › the-complete-guide-to-deepseek-models-from-v3-to-r1-and-beyond
The Complete Guide to DeepSeek Models: V3, R1, V3.1, V3.2 and Beyond
The result was R1, a model that not only keeps the reasoning power of R1-Zero but significantly improves accuracy, readability, and coherence. Unlike V3, which is optimized for general tasks, R1 is a true reasoning model. That means it doesn’t just give you an answer; it explains how it got there. Before responding, R1 generates a step-by-step chain of thought, making it especially useful for: ... According to the DeepSeek-R1 paper re-published in Nature and its supplementary information, R1’s training cost was the equivalent of just US$294K, primarily on NVIDIA H800 chips.
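Because R1 emits its chain of thought before the final answer, client code usually has to split the two. A minimal sketch, assuming the <think>...</think> delimiter convention; the helper name is hypothetical:

```python
# Hypothetical helper for splitting an R1-style response into its
# chain-of-thought and final answer, assuming <think> tag delimiters.

def split_reasoning(response: str) -> tuple[str, str]:
    open_tag, close_tag = "<think>", "</think>"
    if close_tag in response:
        thought, _, answer = response.partition(close_tag)
        return thought.replace(open_tag, "").strip(), answer.strip()
    return "", response.strip()  # no reasoning block found

raw = "<think>The user wants 12 * 7. 12 * 7 = 84.</think>84"
cot, answer = split_reasoning(raw)
print(cot)     # The user wants 12 * 7. 12 * 7 = 84.
print(answer)  # 84
```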
DeepLearning.AI
deeplearning.ai › the-batch › deepseek-releases-r1-r1-zero-and-six-smaller-distilled-models
Data Points: DeepSeek releases R1, R1-Zero, and six smaller distilled models
January 20, 2025 - DeepSeek-R1 achieves performance comparable to OpenAI’s latest o1 model on reasoning tasks, including a 79.8 percent pass rate on AIME 2024 and 97.3 percent on MATH-500. The model, along with the reinforcement-learning-trained R1-Zero and smaller distilled versions, is now available under an MIT license, allowing open access for the community to use the model weights and outputs.
Sebastian Raschka
magazine.sebastianraschka.com › p › technical-deepseek
A Technical Tour of the DeepSeek Models from V3 to V3.2
3 weeks ago - While DeepSeek V3 wasn’t popular immediately upon release in December 2024, the DeepSeek R1 reasoning model (based on the identical architecture, using DeepSeek V3 as a base model) helped DeepSeek become one of the most popular open-weight models and a legit alternative to proprietary models ...
Towards AI
pub.towardsai.net › grpo-and-deepseek-r1-zero-9e81f15c6ba2
GRPO and DeepSeek-R1-Zero | by Shakti Wadekar | Towards AI
March 15, 2025 - DeepSeek-R1-Zero is a cutting-edge large language model (LLM) developed by DeepSeek, an AI research company. What makes it stand out is its unique training method. While most language models rely heavily on supervised fine-tuning (SFT), where they learn from labeled datasets provided by humans, ...
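The group-relative trick that gives GRPO its name is compact enough to show here. A minimal sketch of the advantage computation from the DeepSeek-R1 paper: sample a group of completions per prompt, score each with the reward, and normalize rewards within the group, with no value network (this sketch uses the sample standard deviation; the paper's exact normalization details may differ):

```python
# Minimal sketch of GRPO's group-relative advantage: each completion's
# advantage is its reward normalized against the group's mean and spread.

from statistics import mean, stdev

def group_advantages(rewards: list[float]) -> list[float]:
    mu = mean(rewards)
    sigma = stdev(rewards) if len(rewards) > 1 else 0.0
    if sigma == 0.0:
        return [0.0] * len(rewards)  # identical rewards carry no signal
    return [(r - mu) / sigma for r in rewards]

# e.g. four sampled answers to one prompt, scored by a rule-based reward:
print(group_advantages([1.0, 0.0, 0.0, 1.0]))
# [0.866..., -0.866..., -0.866..., 0.866...]
```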
Fptcloud
marketplace.fptcloud.com › en › ai-product › DeepSeek › DeepSeek-R1
DeepSeek-R1
DeepSeek-R1-Zero, a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step, demonstrated remarkable performance on reasoning. With RL, DeepSeek-R1-Zero naturally developed numerous powerful and interesting reasoning behaviors.