Towards AI
pub.towardsai.net › deepseek-r1-model-architecture-853fefac7050
DeepSeek-R1: Model Architecture. This article provides an in-depth… | by Shakti Wadekar | Towards AI
March 13, 2025 - DeepSeek-R1 employs Multi-Head Latent Attention (MLA) layers instead of standard multi-head attention across all transformer layers. The first three transformer layers differ from the rest: they use a standard Feed-Forward Network (FFN) layer. From layers 4 through 61, a Mixture-of-Experts (MoE) layer ...
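A minimal PyTorch-style sketch of that layer layout, not DeepSeek's actual code: the layer counts come from the snippet, while the width, expert count, and top-1 routing are toy assumptions (attention is omitted entirely).

```python
import torch
import torch.nn as nn

D_MODEL = 512  # toy width; the real model is far larger

class DenseFFN(nn.Module):
    """Standard feed-forward block, as used in layers 1-3."""
    def __init__(self, d=D_MODEL):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(d, 4 * d), nn.GELU(), nn.Linear(4 * d, d))

    def forward(self, x):
        return self.net(x)

class MoEFFN(nn.Module):
    """Toy MoE block with top-1 routing, standing in for layers 4-61."""
    def __init__(self, d=D_MODEL, n_experts=8):
        super().__init__()
        self.gate = nn.Linear(d, n_experts)
        self.experts = nn.ModuleList(DenseFFN(d) for _ in range(n_experts))

    def forward(self, x):
        choice = self.gate(x).argmax(dim=-1)      # (batch, seq): one expert per token
        out = torch.zeros_like(x)
        for i, expert in enumerate(self.experts):
            mask = choice == i
            if mask.any():
                out[mask] = expert(x[mask])       # run only this expert's tokens
        return out

# First three layers use a dense FFN, layers 4-61 use MoE, as the article describes.
ffn_stack = nn.ModuleList(DenseFFN() if i < 3 else MoEFFN() for i in range(61))
```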
HiddenLayer
hiddenlayer.com › home › research › innovation hub › analysing deepseek-r1’s architecture
Analysing DeepSeek-R1’s Architecture
March 25, 2025 - For the purposes of our analysis, our team converted the DeepSeek R1 model hosted on HuggingFace to the ONNX file format, enabling us to examine its computational graph. We used this, along with a review of associated technical papers and code, to identify shared characteristics and subgraphs observed within other models and piece together the defining features of its architecture.
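The general shape of that workflow, as a hedged sketch: `distilgpt2` is a small placeholder model, not what HiddenLayer used, and exporting the full R1 checkpoint would be a far more involved, sharded process.

```python
# pip install "optimum[exporters]" onnx
from collections import Counter

import onnx
from optimum.exporters.onnx import main_export

# Export a HuggingFace checkpoint to the ONNX format (small stand-in model here).
main_export("distilgpt2", output="onnx_model", task="text-generation")

# Inspect the computational graph: tally operator types to spot characteristic
# subgraphs (attention, normalization, gating, ...) shared with other models.
graph = onnx.load("onnx_model/model.onnx").graph
print(Counter(node.op_type for node in graph.node).most_common(15))
```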
Medium
medium.com › @namnguyenthe › deepseek-r1-architecture-and-training-explain-83319903a684
DeepSeek-R1: Architecture and training explained | by The Nam | Medium
January 25, 2025 - But does DeepSeek-R1 rely entirely on RL? The answer is both yes and no. The authors released two distinct models: DeepSeek-R1-Zero and DeepSeek-R1. The former used only RL in the post-training process. While it performed on par with OpenAI's o1 on certain reasoning benchmarks, it struggled with poor readability and occasional language mixing.
GitHub
github.com › deepseek-ai › DeepSeek-R1
GitHub - deepseek-ai/DeepSeek-R1
DeepSeek-R1-Zero & DeepSeek-R1 are trained based on DeepSeek-V3-Base. For more details regarding the model architecture, please refer to DeepSeek-V3 repository.
YouTube
youtube.com › watch
DeepSeek R1 Theory Tutorial – Architecture, GRPO, KL Divergence - YouTube
Learn about DeepSeek R1's innovative AI architecture from @deeplearningexplained. The course explores how R1 achieves exceptional reasoning through reinforc...
Published March 11, 2025
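For reference, the GRPO objective the tutorial walks through, as stated in the R1 paper: for each question q, a group of G outputs is sampled from the old policy, and advantages come from group-relative rewards rather than a value network.

```latex
\mathcal{J}_{\mathrm{GRPO}}(\theta)
  = \mathbb{E}\!\left[\frac{1}{G}\sum_{i=1}^{G}
    \left(\min\!\left(\frac{\pi_\theta(o_i \mid q)}{\pi_{\theta_{\mathrm{old}}}(o_i \mid q)} A_i,\;
    \operatorname{clip}\!\left(\frac{\pi_\theta(o_i \mid q)}{\pi_{\theta_{\mathrm{old}}}(o_i \mid q)},\,
    1-\varepsilon,\, 1+\varepsilon\right) A_i\right)
    - \beta\, \mathbb{D}_{\mathrm{KL}}\!\left(\pi_\theta \,\|\, \pi_{\mathrm{ref}}\right)\right)\right],
\qquad
A_i = \frac{r_i - \operatorname{mean}(\{r_1,\ldots,r_G\})}{\operatorname{std}(\{r_1,\ldots,r_G\})}
```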
HiddenLayer
hiddenlayer.com › home › research › innovation hub › deepseek-r1 architecture
DeepSeek-R1 Architecture
March 25, 2025 - Initial analysis revealed that DeepSeek-R1 shares its architecture with DeepSeekV3, which supports the information provided in the model’s accompanying write-up. The primary difference is that R1 was fine-tuned using Reinforcement Learning to improve reasoning and Chain-of-Thought output.
arXiv
arxiv.org › pdf › 2501.12948 pdf
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
Table 1 | Template for DeepSeek-R1-Zero. The prompt placeholder will be replaced with the specific reasoning ... The reward is the source of the training signal, which decides the optimization direction of RL. To train DeepSeek-R1-Zero, we adopt a rule-based reward system that mainly consists of two ...
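The paper's rule-based reward combines an accuracy check on the final answer with a format check on the reasoning template tags. A minimal sketch of that idea: the regexes, exact-string matching, and the plain sum are illustrative assumptions, not the paper's exact rules.

```python
import re

def format_reward(completion: str) -> float:
    """1.0 if the output follows the <think>...</think><answer>...</answer> template."""
    pattern = r"^<think>.*?</think>\s*<answer>.*?</answer>$"
    return 1.0 if re.match(pattern, completion.strip(), re.DOTALL) else 0.0

def accuracy_reward(completion: str, ground_truth: str) -> float:
    """1.0 if the tagged final answer matches the reference (string match for brevity)."""
    m = re.search(r"<answer>(.*?)</answer>", completion, re.DOTALL)
    return 1.0 if m and m.group(1).strip() == ground_truth.strip() else 0.0

def reward(completion: str, ground_truth: str) -> float:
    # The paper combines accuracy and format rewards; summing them is illustrative.
    return accuracy_reward(completion, ground_truth) + format_reward(completion)
```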
DeepWiki
deepwiki.com › deepseek-ai › DeepSeek-R1 › 2-model-architecture
Model Architecture | deepseek-ai/DeepSeek-R1 | DeepWiki
The core innovation in the DeepSeek-R1 models is the Mixture of Experts (MoE) architecture, which allows the models to have a massive total parameter count while maintaining computational efficiency during inference ...
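The efficiency claim in concrete terms, as shown below: with top-k routing, only k of the routed experts run per token, so active parameters are a small fraction of the total. The expert counts match DeepSeek-V3's published configuration; treating all parameters as routed-expert parameters is a simplification.

```python
# Toy arithmetic for "large total, small active" under top-k expert routing.
n_routed_experts = 256   # routed experts per MoE layer in DeepSeek-V3
top_k = 8                # experts activated per token in DeepSeek-V3

fraction_active = top_k / n_routed_experts
print(f"expert parameters active per token: {fraction_active:.1%}")  # 3.1%
```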
Fireworks AI
fireworks.ai › blog › deepseek-r1-deepdive
DeepSeek-R1 Overview: Features, Capabilities, Parameters
DeepSeek R1 excels at tasks demanding logical inference, chain-of-thought reasoning, and real-time decision-making. Whether it’s solving high-level mathematics, generating sophisticated code, or breaking down complex scientific questions, DeepSeek R1’s RL-based training allows it to self-discover and refine reasoning strategies over time.
Founderscreative
founderscreative.org › model-architecture-behind-deepseek-r1
Model Architecture Behind DeepSeek R1 – Founders Creative
We explored three primary architectural patterns that the DeepSeek team adapted and enhanced to develop the DeepSeek-R1 model: DeepSeekMoE, Multi-Head Latent Attention, and Multi-Token Prediction. We also reviewed various improvements made to the training framework ...
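To make the Multi-Head Latent Attention idea from that list concrete: keys and values are reconstructed from a small shared latent vector, so only that latent needs to be cached. A toy sketch of the compress/expand projections follows; the dimensions are illustrative assumptions, and the real model's decoupled RoPE path is omitted.

```python
import torch
import torch.nn as nn

d_model, d_latent, n_heads, d_head = 512, 64, 8, 64   # toy sizes

w_dkv = nn.Linear(d_model, d_latent, bias=False)          # down-projection (compress)
w_uk = nn.Linear(d_latent, n_heads * d_head, bias=False)  # up-projection to keys
w_uv = nn.Linear(d_latent, n_heads * d_head, bias=False)  # up-projection to values

h = torch.randn(2, 16, d_model)          # (batch, seq, hidden)
c_kv = w_dkv(h)                          # cache this: (2, 16, 64), not full K and V
k = w_uk(c_kv).view(2, 16, n_heads, d_head)
v = w_uv(c_kv).view(2, 16, n_heads, d_head)
print(c_kv.shape, k.shape, v.shape)
```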