Hugging Face
huggingface.co › deepseek-ai › DeepSeek-R1-0528-Qwen3-8B
deepseek-ai/DeepSeek-R1-0528-Qwen3-8B · Hugging Face
November 27, 2025 - This model achieves state-of-the-art (SOTA) performance among open-source models on the AIME 2024, surpassing Qwen3 8B by +10.0% and matching the performance of Qwen3-235B-thinking.
Reddit
reddit.com › r/localllama › deepseek’s new r1-0528-qwen3-8b is the most intelligent 8b parameter model yet, but not by much: alibaba’s own qwen3 8b is just one point behind
r/LocalLLaMA on Reddit: DeepSeek’s new R1-0528-Qwen3-8B is the most intelligent 8B parameter model yet, but not by much: Alibaba’s own Qwen3 8B is just one point behind
June 5, 2025 -
source: https://x.com/ArtificialAnlys/status/1930630854268850271
Amazing to have a local 8B model this smart on my machine!
What are your thoughts?
Top answer 1 of 5 · 65 points
Those benchmarks are a meme. ArtificialAnalysis uses benchmarks established by other research groups, which are often old and overtrained, so they aren't reliable. They carefully show or hide models on the default list to paint a picture of bigger models doing better, but when you enable Qwen3 8B and 32B with reasoning to be shown, this all falls apart. It's nice enough to brag about a model on LinkedIn, and they are somewhat useful - they seem to be independent, and the image and video arenas are great - but they're not capable of maintaining leak-proof expert benchmarks. Look at math reasoning:
DeepSeek R1 0528 (May '25) - 94
Qwen3 14B (Reasoning) - 86
Qwen3 8B (Reasoning) - 83
DeepSeek R1 (Jan '25) - 82
DeepSeek R1 0528 Qwen3 8B - 79
Claude 3.7 Sonnet (Thinking) - 72
Overall bench (Intelligence Index):
DeepSeek R1 (Jan '25) - 60
Qwen3 32B (Reasoning) - 59
Do you believe it makes sense for Qwen3 8B to score above DeepSeek R1, or for Claude 3.7 Sonnet to be outclassed by DeepSeek R1 0528 Qwen3 8B by a big margin? Another bench - LiveCodeBench:
Qwen3 14B (Reasoning) - 52
Claude 3.7 Sonnet (Thinking) - 47
Why are devs using Claude 3.7/4 in Windsurf/Cursor/Roo/Cline/Aider and not Qwen3 14B? Qwen3 14B is apparently a much better coder lmao. I can't call it benchmark contamination, but it's definitely overfitting to benchmarks. For god's sake, when you let base Qwen 2.5 32B non-Instruct generate random tokens from a trash prompt, it will often generate MMLU-style question-and-answer pairs on its own. It's trained to do well on the benchmarks they test on.
Answer 2 of 5 · 13 points
I really don't trust Artificial Analysis rankings these days, since they just aggregate other people's old benchmarks. They still use SciCode or whatever, meanwhile it's literally beyond saturated; all models score 99% on it.
DeepSeek-R1-0528-Qwen3-8B
The work that DeepSeek has done is great, but it's obvious that an 8B model cannot score that high on these tests organically (at least for now). It has already been trained on the AIME and other competitions, so these benchmarks alone don't represent any real-world usage. E.g., I saw someone say that Gemini 2.5 Flash is on par with or better than this 8B model because of how both scored on a certain test. I wish they were right, but these benchmarks should not be taken at face value. More on reddit.com
Anyone have any experience with Deepseek-R1-0528-Qwen3-8B?
Works just fine out of the box in LM Studio. More on reddit.com
Videos
r/LocalLLaMA on Reddit: DeepSeek-R1-0528-Qwen3-8B on iPhone 16 Pro
17:15
DeepSeek R1 0528 Qwen3 8B - Small Upgraded Student Model - Install ...
10:25
Run DeepSeek-R1-0528-Qwen3-8B Locally with Gaia (Easy Tutorial!)
41:46
DeepSeek R1 0528 : 8B vs 671B (Live Test) - YouTube
r/LocalLLaMA on Reddit: deepseek r1 0528 qwen 8b on android MNN chat
12:50
New DeepSeek R1 is Really, Really Good Coder - YouTube
GitHub
github.com › deepseek-ai › DeepSeek-R1 › issues › 685
Performance issues of DeepSeek-R1-0528-Qwen3-8B in AIME2024 · Issue #685 · deepseek-ai/DeepSeek-R1
June 13, 2025 - We tested the performance of DeepSeek-R1-0528-Qwen3-8B on AIME 2024. After a hyperparameter search for inference, we found that the highest score is around 83, which is still about 3 points lower than the reported score of 86.0.
Published Jun 13, 2025
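The gap the issue reports (≈83 vs. the published 86.0) typically comes down to inference hyperparameters and how pass@1 is averaged over repeated samples. Below is a minimal sketch of that evaluation protocol, assuming an OpenAI-compatible local server; the endpoint, model name, and the extract_answer() helper are illustrative, and the sampling settings follow the values commonly recommended for R1-family models (temperature 0.6, top-p 0.95).

```python
# Hedged sketch of an AIME-style pass@1 evaluation: sample each problem k times
# and average, since single-sample scores on only 30 questions are very noisy.
# The endpoint, model name, and extract_answer() are assumptions for illustration.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="none")

def extract_answer(text: str) -> str:
    # Hypothetical helper: take the last non-empty line as the final answer.
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    return lines[-1] if lines else ""

def aime_pass_at_1(problems, k=16):
    """problems: list of (question, gold_answer) pairs. Returns a 0-100 score."""
    total = 0.0
    for question, gold in problems:
        hits = 0
        for _ in range(k):
            resp = client.chat.completions.create(
                model="deepseek-ai/DeepSeek-R1-0528-Qwen3-8B",
                messages=[{"role": "user", "content": question}],
                temperature=0.6,   # commonly recommended for R1-family models
                top_p=0.95,
                max_tokens=32768,  # reasoning traces are long; tight limits truncate answers
            )
            if extract_answer(resp.choices[0].message.content) == gold:
                hits += 1
        total += hits / k
    return 100 * total / len(problems)
```

Small changes to temperature, top-p, or the token budget can easily move a score by a few points at this scale, which is consistent with the 83-vs-86 discrepancy discussed in the issue.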
Hugging Face
huggingface.co › deepseek-ai › DeepSeek-R1-0528-Qwen3-8B › discussions › 11
deepseek-ai/DeepSeek-R1-0528-Qwen3-8B · Tried it, but not good as expected.
May 30, 2025 - @SytanSD I had similar issues with the source Qwen3 8B model. It failed to answer simple questions that much smaller models like Llama 3.2 3B reliably got right, such as what's the third rock from the sun (Earth). So I suspect the primary issue is that DeepSeek used Qwen3, which is so egregiously overfit to the standard LLM tests that it's riddled with pockets of profound ignorance, making it frustratingly unreliable across a spectrum of real-world tasks.
Apidog
apidog.com › blog › deepseek-r1-0528-qwen-8b-local-ollama-lm-studio
Running DeepSeek R1 0528 Qwen 8B Locally: Complete Guide with Ollama and LM Studio
August 17, 2025 - Setting up DeepSeek R1 0528 in LM Studio involves navigating to the model catalog and searching for "DeepSeek R1 0528" or "Deepseek-r1-0528-qwen3-8b." The catalog displays various quantization options, allowing users to select the version that best matches their hardware capabilities.
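The guide's Ollama path can be sanity-checked from Python once the weights are pulled. A minimal sketch, assuming the official ollama client package and the tag deepseek-r1:8b (registry tags for the 0528 Qwen3 distill vary, so confirm with `ollama list`):

```python
# Minimal sketch: chatting with a locally pulled DeepSeek R1 0528 Qwen3 8B
# through the ollama Python client (pip install ollama). The tag
# "deepseek-r1:8b" is an assumption; check your local registry for the exact name.
import ollama

response = ollama.chat(
    model="deepseek-r1:8b",
    messages=[{"role": "user", "content": "Briefly: why is the sky blue?"}],
)
# R1-style distills emit their reasoning trace before the final answer,
# so the content typically includes a <think>...</think> block.
print(response["message"]["content"])
```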
Artificial Analysis
artificialanalysis.ai › models › comparisons › deepseek-r1-vs-qwen3-8b-instruct
DeepSeek R1 0528 (May '25) vs Qwen3 8B (Non-reasoning): Model Comparison
Comparison between DeepSeek R1 0528 (May '25) and Qwen3 8B (Non-reasoning) across intelligence, price, speed, context window and more.
Read the Docs
inference.readthedocs.io › en › v1.8.0 › models › builtin › llm › deepseek-r1-0528-qwen3.html
deepseek-r1-0528-qwen3 — Xinference
Model ID: QuantTrio/DeepSeek-R1-0528-Qwen3-8B-{quantization} · Model Hubs: Hugging Face, ModelScope
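For context, a hedged sketch of launching this built-in model through Xinference's Python client; the endpoint, engine, format, and quantization values are illustrative assumptions, and launch parameters differ somewhat across Xinference versions:

```python
# Hedged sketch: launching the built-in deepseek-r1-0528-qwen3 model against a
# Xinference server assumed to be running on localhost:9997. The engine/format/
# quantization combination is illustrative; use the values this model page lists.
from xinference.client import Client

client = Client("http://localhost:9997")
model_uid = client.launch_model(
    model_name="deepseek-r1-0528-qwen3",
    model_type="LLM",
    model_engine="vllm",        # assumption: vLLM backend for the quantized weights
    model_format="awq",         # illustrative; GPTQ variants also exist
    model_size_in_billions=8,
    quantization="Int4",
)
model = client.get_model(model_uid)
# Recent Xinference versions expose an OpenAI-style chat interface:
print(model.chat(messages=[{"role": "user", "content": "Hello!"}]))
```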
Routstr
routstr.com › models › deepseek › deepseek-r1-0528-qwen3-8b
Deepseek R1 0528 Qwen3 8B
May 29, 2025 - The future of AI access is permissionless, private, and decentralized