Excuse me if this is a dumb question, I'm a complete amateur at this, but I'm curious: I know you can download different models of DeepSeek-R1 locally, ranging from 1.5GB up to 402GB in size, but which of these models does the online version of DeepSeek use? Thank you.
Yes, you can run DeepSeek-R1 locally on your device (20GB RAM min.)
Hardware requirements for running the full-size DeepSeek-R1 with Ollama?
I've recently seen some misconceptions that you can't run DeepSeek-R1 locally on your own device. Last weekend, we were busy making it possible for you to run the actual R1 (non-distilled) model with just an RTX 4090 (24GB VRAM), which gives at least 2-3 tokens/second.
Over the weekend, we at Unsloth (currently a team of just 2 brothers) studied R1's architecture, then selectively quantized layers to 1.58-bit, 2-bit, etc., which vastly outperforms basic uniformly-quantized versions while needing minimal compute.
We shrank R1, the 671B-parameter model, from 720GB to just 131GB (an 80% size reduction) while keeping it fully functional and great.
No, the dynamic GGUFs do not work directly with Ollama, but they do work with llama.cpp, which supports sharded GGUFs and disk mmap offloading. For Ollama, you will need to merge the GGUF shards manually using llama.cpp.
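For Ollama users, merging can be done with llama.cpp's llama-gguf-split tool. A rough sketch of what that looks like (the shard filenames are illustrative; use the ones you actually downloaded):

```
# Merge sharded GGUF files into a single file Ollama can load.
# Point --merge at the first shard; the tool locates the remaining shards itself.
./llama.cpp/llama-gguf-split --merge \
    DeepSeek-R1-UD-IQ1_S-00001-of-00003.gguf \
    DeepSeek-R1-UD-IQ1_S-merged.gguf
```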
Minimum requirements: a CPU with 20GB of RAM (it will be very slow) and 140GB of disk space (to download the model weights).
Optimal requirements: VRAM + RAM totalling 80GB+ (this will be somewhat OK).
No, you do not need hundreds of GB of RAM+VRAM, but if you have it, you can get 140 tokens/s for throughput and 14 tokens/s for single-user inference with 2x H100s. An example llama.cpp invocation is sketched below.
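For a concrete starting point, here's a rough sketch of the kind of llama.cpp run this describes; the model path, thread count, and layer count are illustrative, so tune --n-gpu-layers to whatever fits in your VRAM:

```
# --n-gpu-layers offloads that many layers to VRAM; the rest run from RAM
# (mmap'd from disk), which is why VRAM+RAM is what matters for speed.
./llama.cpp/llama-cli \
    --model DeepSeek-R1-GGUF/DeepSeek-R1-UD-IQ1_S-00001-of-00003.gguf \
    --cache-type-k q4_0 \
    --threads 16 \
    --n-gpu-layers 7 \
    --prompt '<|User|>What is 1+1?<|Assistant|>'
```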
Our open-source GitHub repo: github.com/unslothai/unsloth
Many people have tried running the dynamic GGUFs on their potato devices and it works very well (including on mine).
R1 GGUFs uploaded to Hugging Face: huggingface.co/unsloth/DeepSeek-R1-GGUF
To run your own R1 locally, we have instructions + details: unsloth.ai/blog/deepseekr1-dynamic
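To grab just one quant from that Hugging Face repo, a rough sketch using the Hugging Face CLI (the --include pattern is illustrative; match it to the quant you want, e.g. the 131GB dynamic 1.58-bit one):

```
# Download only the chosen quant's shards instead of the whole repo.
pip install -U "huggingface_hub[cli]"
huggingface-cli download unsloth/DeepSeek-R1-GGUF \
    --include "*UD-IQ1_S*" \
    --local-dir DeepSeek-R1-GGUF
```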
My machine runs the DeepSeek R1-14B model fine, but the 32B and 70B are too slow for practical use. I'm looking at building a machine capable of running the full 671B model fast enough that it's not too annoying as a coding assistant. What kind of hardware do I need?