Excuse me if this is a dumb question, I'm a complete amateur at this, but I'm curious: I know you can download different models of DeepSeek-R1 locally, ranging from 1.5GB up to 402GB in size, but which of these models does the online version of DeepSeek use? Thank you.
Yes, you can run DeepSeek-R1 locally on your device (20GB RAM min.)
Hardware requirements for running the full-size DeepSeek-R1 with Ollama?
I've recently seen some misconceptions that you can't run DeepSeek-R1 locally on your own device. Last weekend, we were busy making it possible for you to run the actual R1 (non-distilled) model with just an RTX 4090 (24GB VRAM), which gives at least 2-3 tokens/second.
Over the weekend, we at Unsloth (currently a team of just 2 brothers) studied R1's architecture, then selectively quantized layers to 1.58-bit, 2-bit, etc., which vastly outperforms basic uniformly-quantized versions while needing minimal compute.
We shrank R1, the 671B-parameter model, from 720GB to just 131GB (an 80% size reduction) while keeping it fully functional and great.
No, the dynamic GGUFs do not work directly with Ollama, but they do work with llama.cpp, which supports sharded GGUFs and disk mmap offloading. For Ollama, you will need to merge the GGUF shards manually using llama.cpp.
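For Ollama users, merging can be done with llama.cpp's llama-gguf-split tool. A rough sketch of what that looks like (the shard filenames are illustrative; use the ones you actually downloaded):

```
# Merge sharded GGUF files into a single file Ollama can load.
# Point --merge at the first shard; the tool locates the remaining shards itself.
./llama.cpp/llama-gguf-split --merge \
    DeepSeek-R1-UD-IQ1_S-00001-of-00003.gguf \
    DeepSeek-R1-UD-IQ1_S-merged.gguf
```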
Minimum requirements: a CPU with 20GB of RAM (it will be very slow) and 140GB of disk space (to download the model weights).
Optimal requirements: VRAM + RAM totalling 80GB+ (this will be somewhat OK).
No, you do not need hundreds of GB of RAM+VRAM, but if you have it, you can get 140 tokens/s for throughput and 14 tokens/s for single-user inference with 2x H100s. An example llama.cpp invocation is sketched below.
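For a concrete starting point, here's a rough sketch of the kind of llama.cpp run this describes; the model path, thread count, and layer count are illustrative, so tune --n-gpu-layers to whatever fits in your VRAM:

```
# --n-gpu-layers offloads that many layers to VRAM; the rest run from RAM
# (mmap'd from disk), which is why VRAM+RAM is what matters for speed.
./llama.cpp/llama-cli \
    --model DeepSeek-R1-GGUF/DeepSeek-R1-UD-IQ1_S-00001-of-00003.gguf \
    --cache-type-k q4_0 \
    --threads 16 \
    --n-gpu-layers 7 \
    --prompt '<|User|>What is 1+1?<|Assistant|>'
```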
Our open-source GitHub repo: github.com/unslothai/unsloth
Many people have tried running the dynamic GGUFs on their potato devices and it works very well (including on mine).
R1 GGUFs uploaded to Hugging Face: huggingface.co/unsloth/DeepSeek-R1-GGUF
To run your own R1 locally, we have instructions + details: unsloth.ai/blog/deepseekr1-dynamic
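To grab just one quant from that Hugging Face repo, a rough sketch using the Hugging Face CLI (the --include pattern is illustrative; match it to the quant you want, e.g. the 131GB dynamic 1.58-bit one):

```
# Download only the chosen quant's shards instead of the whole repo.
pip install -U "huggingface_hub[cli]"
huggingface-cli download unsloth/DeepSeek-R1-GGUF \
    --include "*UD-IQ1_S*" \
    --local-dir DeepSeek-R1-GGUF
```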
My machine runs the DeepSeek R1-14B model fine, but the 32B and 70B are too slow for practical use. I'm looking at building a machine capable of running the full 671B model fast enough that it's not too annoying as a coding assistant. What kind of hardware do I need?