Hugging Face
huggingface.co › deepseek-ai › DeepSeek-R1-0528-Qwen3-8B
deepseek-ai/DeepSeek-R1-0528-Qwen3-8B · Hugging Face
November 27, 2025 - This model achieves state-of-the-art (SOTA) performance among open-source models on the AIME 2024, surpassing Qwen3 8B by +10.0% and matching the performance of Qwen3-235B-thinking.
Reddit
reddit.com › r/localllama › deepseek’s new r1-0528-qwen3-8b is the most intelligent 8b parameter model yet, but not by much: alibaba’s own qwen3 8b is just one point behind
r/LocalLLaMA on Reddit: DeepSeek’s new R1-0528-Qwen3-8B is the most intelligent 8B parameter model yet, but not by much: Alibaba’s own Qwen3 8B is just one point behind
June 5, 2025 -
source: https://x.com/ArtificialAnlys/status/1930630854268850271
Amazing to have a local 8B model this smart on my machine!
what are your thoughts?
Top answer 1 of 5
65
Those benchmarks are a meme. ArtificialAnalysis uses benchmarks established by other research groups, which are often old and overtrained, so they aren't reliable. They carefully show or hide models on the default list to paint a picture of bigger models doing better, but when you enable Qwen3 8B and 32B with reasoning to be shown, this all falls apart. It's nice enough to brag about a model on LinkedIn, and they are somewhat useful - they seem to be independent, and the image and video arenas are great - but they're not capable of maintaining leak-proof expert benchmarks. Look at math reasoning:
DeepSeek R1 0528 (May '25) - 94
Qwen3 14B (Reasoning) - 86
Qwen3 8B (Reasoning) - 83
DeepSeek R1 (Jan '25) - 82
DeepSeek R1 0528 Qwen3 8B - 79
Claude 3.7 Sonnet (Thinking) - 72
Overall bench (Intelligence Index):
DeepSeek R1 (Jan '25) - 60
Qwen3 32B (Reasoning) - 59
Do you believe it makes sense for Qwen3 8B to score above DeepSeek R1, or for Claude 3.7 Sonnet to be outclassed by DeepSeek R1 0528 Qwen3 8B by a big margin? Another bench - LiveCodeBench:
Qwen3 14B (Reasoning) - 52
Claude 3.7 Sonnet (Thinking) - 47
Why are devs using Claude 3.7/4 in Windsurf/Cursor/Roo/Cline/Aider and not Qwen3 14B? Qwen3 14B is apparently a much better coder lmao. I can't call it benchmark contamination, but it's definitely overfit to benchmarks. For god's sake, when you let base Qwen 2.5 32B non-Instruct generate random tokens from a trash prompt, it will often produce MMLU-style question-and-answer pairs on its own. It's trained to do well on the benchmarks they test on.
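The inversions the commenter is pointing at are easy to see by just sorting the quoted numbers. The scores below are copied verbatim from the comment above (Artificial Analysis figures as relayed by the commenter), not independently re-measured:

```python
# Math-reasoning scores as quoted in the comment above;
# these are the commenter's numbers, not independently verified.
math_reasoning = {
    "DeepSeek R1 0528 (May '25)": 94,
    "Qwen3 14B (Reasoning)": 86,
    "Qwen3 8B (Reasoning)": 83,
    "DeepSeek R1 (Jan '25)": 82,
    "DeepSeek R1 0528 Qwen3 8B": 79,
    "Claude 3.7 Sonnet (Thinking)": 72,
}

# Sort descending: an 8B model lands above the original R1, and the
# 8B distill lands well above Claude 3.7 Sonnet (Thinking) -- the two
# orderings the commenter finds implausible.
ranked = sorted(math_reasoning.items(), key=lambda kv: kv[1], reverse=True)
for name, score in ranked:
    print(f"{score:3d}  {name}")

gap = math_reasoning["DeepSeek R1 0528 Qwen3 8B"] - math_reasoning["Claude 3.7 Sonnet (Thinking)"]
print(f"8B distill over Claude 3.7 Sonnet (Thinking): +{gap} points")
```

Whether a 7-point lead for an 8B distill over Claude 3.7 Sonnet reflects capability or benchmark overfit is exactly the question the thread is arguing about.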
2 of 5
13
I really don't trust Artificial Analysis rankings these days, since they just aggregate other people's old benchmarks - they still use SciCode or whatever, even though it's completely saturated and all models score 99% on it.
DeepSeek-R1-0528-Qwen3-8B
The work that Deepseek has done is great, but it's obvious that an 8B model cannot score that high on these tests organically (at least for now). It has clearly been trained on AIME and other competition problems, so these benchmarks alone don't represent any real-world usage. E.g., I saw someone say that Gemini 2.5 Flash is on par with or better than this 8B model because of how both scored on a certain test. I wish they were right, but these benchmarks should not be taken at face value.
Anyone have any experience with Deepseek-R1-0528-Qwen3-8B?
Works just fine out of the box in LM Studio.
Videos
r/LocalLLaMA on Reddit: DeepSeek-R1-0528-Qwen3-8B on iPhone 16 Pro
17:15
DeepSeek R1 0528 Qwen3 8B - Small Upgraded Student Model - Install ...
10:25
Run DeepSeek-R1-0528-Qwen3-8B Locally with Gaia (Easy Tutorial!)
41:46
DeepSeek R1 0528 : 8B vs 671B (Live Test) - YouTube
r/LocalLLaMA on Reddit: deepseek r1 0528 qwen 8b on android MNN chat
12:50
New DeepSeek R1 is Really, Really Good Coder - YouTube
Ollama
ollama.com › sam860 › deepseek-r1-0528-qwen3:8b
sam860/deepseek-r1-0528-qwen3:8b
DeepSeek-R1-0528-Qwen3-8B represents a significant upgrade to the DeepSeek R1 model series, built on the Qwen3 architecture. This version (0528) delivers enhanced reasoning and inference capabilities through algorithmic optimization and increased ...
Hugging Face
huggingface.co › deepseek-ai › DeepSeek-R1-0528-Qwen3-8B › discussions › 11
deepseek-ai/DeepSeek-R1-0528-Qwen3-8B · Tried it, but not good as expected.
May 30, 2025 - @SytanSD I had similar issues with the source Qwen3 8B model. It failed to answer simple questions that much smaller models like Llama 3.2 3B reliably got right, such as what's the third rock from the sun (Earth). So I suspect the primary issue is that DeepSeek used Qwen3, which is so egregiously overfit to the standard LLM tests that it's riddled with pockets of profound ignorance, making it frustratingly unreliable across a spectrum of real-world tasks.
Apidog
apidog.com › blog › deepseek-r1-0528-qwen-8b-local-ollama-lm-studio
Running DeepSeek R1 0528 Qwen 8B Locally: Complete Guide with Ollama and LM Studio
August 17, 2025 - Setting up DeepSeek R1 0528 in LM Studio involves navigating to the model catalog and searching for "DeepSeek R1 0528" or "Deepseek-r1-0528-qwen3-8b." The catalog displays various quantization options, allowing users to select the version that best matches their hardware capabilities.
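A rough rule of thumb for picking among those quantization options: an N-parameter model needs about N × (bits per weight / 8) bytes for the weights alone, before KV cache and runtime overhead. A minimal sketch - the 8.19B parameter count and the bits-per-weight figures for the GGUF quant levels are approximations (real GGUF files mix quantization per tensor), so treat these as ballpark numbers, not exact file sizes:

```python
PARAMS = 8.19e9  # approx. parameter count of DeepSeek-R1-0528-Qwen3-8B

def weight_gib(bits_per_weight: float) -> float:
    """Weight-only memory footprint in GiB; ignores KV cache and runtime overhead."""
    return PARAMS * bits_per_weight / 8 / 2**30

# Approximate effective bits-per-weight for common llama.cpp quant levels.
for name, bits in [("FP16", 16.0), ("Q8_0", 8.5), ("Q5_K_M", 5.69), ("Q4_K_M", 4.85)]:
    print(f"{name:7s} ~{weight_gib(bits):5.1f} GiB")
```

This is why the Q4/Q5 builds fit comfortably on an 8 GB GPU or a laptop, while FP16 does not.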
Ollama
ollama.com › library › deepseek-r1:8b
deepseek-r1:8b
DeepSeek-R1-0528-Qwen3-8B: ollama run deepseek-r1 · DeepSeek-R1 (full model): ollama run deepseek-r1:671b · Note: to update the model from an older version, run ollama pull deepseek-r1
Reddit
reddit.com › r/localllama › deepseek-r1-0528-qwen3-8b is much better than expected.
r/LocalLLaMA on Reddit: Deepseek-r1-0528-qwen3-8b is much better than expected.
May 30, 2025 -
In the past, I tried creating agents with models smaller than 32B, but they often gave completely off-the-mark answers to commands or failed to generate the specified JSON structures correctly. However, this model has exceeded my expectations. I used to think of small models like the 8B ones as just tech demos, but it seems the situation is starting to change little by little.
First image – Structured question request
Second image – Answer
Tested: LM Studio, Q8, Temp 0.6, Top_k 0.95
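For reference on the sampler settings quoted above: temperature scales the logits before sampling, and the "0.95" is almost certainly a top-p (nucleus) value rather than top-k, since top-k takes an integer. A minimal sketch of what those two knobs do, over made-up toy logits rather than the model's actual sampler:

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature < 1 sharpens the distribution; > 1 flattens it."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def top_p_filter(probs, p=0.95):
    """Keep the smallest set of tokens whose cumulative probability reaches p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cum = [], 0.0
    for i in order:
        kept.append(i)
        cum += probs[i]
        if cum >= p:
            break
    return kept

logits = [2.0, 1.0, 0.5, -1.0, -3.0]   # toy values, not real model logits
probs = softmax(logits, temperature=0.6)
print([round(x, 3) for x in probs])
print("tokens surviving top-p=0.95:", top_p_filter(probs, 0.95))
```

At temperature 0.6 the distribution is sharpened toward the top token, and nucleus filtering then drops the long tail before sampling - the low-randomness regime DeepSeek recommends for reasoning models.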
Top answer 1 of 4
68
Agreed, the CoT is cleaner and it solved problems the OG 8B couldn't. I hope they can do this for the 30/32/235B too.
2 of 4
46
I asked it to make a web interface for my book creator tool. I gave it just the documents I created describing the project, and it made a working HTML interface on the first go. Not 100% perfect, but pretty bloody good for an 8B model. Dark mode works too, though it's not perfect and some colours are similar enough that you can't see the text - easily fixed, though.
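The OP's point about smaller models failing to "generate the specified JSON structures correctly" is usually handled by validating the model's reply before an agent acts on it. A minimal sketch - the `action`/`arguments` key names here are invented for illustration, not any particular agent framework's schema:

```python
import json

def parse_agent_reply(text, required_keys=("action", "arguments")):
    """Parse a model reply as JSON and check it has the keys the agent expects.

    Returns the dict on success, or None if the output is malformed JSON
    or is missing a required key - the two failure modes small models hit.
    """
    try:
        obj = json.loads(text)
    except json.JSONDecodeError:
        return None
    if not isinstance(obj, dict) or not all(k in obj for k in required_keys):
        return None
    return obj

good = parse_agent_reply('{"action": "search", "arguments": {"query": "qwen3"}}')
bad = parse_agent_reply("Sure! Here is the JSON you asked for: {action: search}")
print(good)
print(bad)
```

A reply that fails the check can simply be retried, which is cheap with an 8B model running locally.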
Artificial Analysis
artificialanalysis.ai › models › comparisons › deepseek-r1-vs-qwen3-8b-instruct
DeepSeek R1 0528 (May '25) vs Qwen3 8B (Non-reasoning): Model Comparison
Comparison between DeepSeek R1 0528 (May '25) and Qwen3 8B (Non-reasoning) across intelligence, price, speed, context window and more.
Read the Docs
inference.readthedocs.io › en › v1.8.0 › models › builtin › llm › deepseek-r1-0528-qwen3.html
deepseek-r1-0528-qwen3 — Xinference
Model ID: QuantTrio/DeepSeek-R1-0528-Qwen3-8B-{quantization} Model Hubs: Hugging Face, ModelScope ·
Routstr
routstr.com › models › deepseek › deepseek-r1-0528-qwen3-8b
Deepseek R1 0528 Qwen3 8B
May 29, 2025 - The future of AI access is permissionless, private, and decentralized