🌐
Hugging Face
huggingface.co › deepseek-ai › DeepSeek-R1
deepseek-ai/DeepSeek-R1 · Hugging Face
1 month ago - For all our models, the maximum generation length is set to 32,768 tokens. For benchmarks requiring sampling, we use a temperature of 0.6, a top-p value of 0.95, and generate 64 responses per query to estimate pass@1.
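A minimal Python sketch of that evaluation recipe is given below; the generate and is_correct helpers are hypothetical placeholders (not part of DeepSeek's released tooling), and only the quoted settings (temperature 0.6, top-p 0.95, 32,768-token limit, 64 samples per query) come from the snippet above.

    # Hypothetical sketch of the pass@1 protocol quoted above.
    # `generate` and `is_correct` are assumed placeholder callables.
    def estimate_pass_at_1(queries, generate, is_correct, k=64):
        per_query = []
        for query in queries:
            # Draw k independent samples with the quoted sampling settings.
            samples = [
                generate(query, temperature=0.6, top_p=0.95, max_tokens=32768)
                for _ in range(k)
            ]
            # pass@1 for one query: fraction of the k samples that are correct.
            per_query.append(sum(is_correct(query, s) for s in samples) / k)
        # Benchmark-level pass@1: mean over all queries.
        return sum(per_query) / len(per_query)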
🌐
Reddit
reddit.com › r/localllama › deepseek-r1-0528 official benchmarks released!!!
r/LocalLLaMA on Reddit: DeepSeek-R1-0528 Official Benchmarks Released!!!
May 29, 2025 - AIME 2024, surpassing Qwen3 8B by +10.0% and matching the performance of Qwen3-235B-thinking. We believe that the chain-of-thought from DeepSeek-R1-0528 will hold significant importance for both academic research on reasoning models and industrial ...
Discussions

DeepSeek-R1-0528 Official Benchmark
And they called it a 'minor improvement'
🌐 r/LocalLLaMA
April 6, 2025
Deepseek R1 (Ollama) Hardware benchmark for LocalLLM : LocalLLaMA
Deepseek R1 was released and looks like one of the best models for local LLM. I tested it on some GPUs to see how many tps it can achieve. Tests...
🌐 r/LocalLLaMA
Deepseek R1 is the only one that nails this new viral benchmark
Zero context here, what is this benchmark about?
🌐 r/LocalLLaMA
August 11, 2024
🌐
Medium
medium.com › @leucopsis › deepseeks-new-r1-0528-performance-analysis-and-benchmark-comparisons-6440eac858d6
DeepSeek’s New R1–0528: Performance Analysis and Benchmark Comparisons | by Barnacle Goose | Medium
May 30, 2025 - Improvements are also dramatic in coding benchmarks. On the Codeforces programming challenge, R1–0528’s rating is about 1930, up from ~1530 before, a ~400-point leap that reflects far better code generation and problem-solving.
🌐
Prompt Hub
prompthub.us › blog › deepseek-r-1-model-overview-and-how-it-ranks-against-openais-o1
PromptHub Blog: DeepSeek R-1 Model Overview and How it Ranks Against OpenAI's o1
... Temperature: 0.6. Top-p value: 0.95. Pass@1 estimation: Generated 64 responses per query. ... DeepSeek R1 outperformed o1, Claude 3.5 Sonnet and other models in the majority of reasoning benchmarks
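Those settings mirror the evaluation recipe quoted from the Hugging Face card above; with 64 sampled responses per query, pass@1 is typically computed as the mean per-query correctness:

    \( \text{pass@1} = \frac{1}{k} \sum_{i=1}^{k} p_i, \quad k = 64, \)

where \( p_i \) is 1 if the i-th sampled response is correct and 0 otherwise, averaged over all queries in the benchmark.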
🌐
Level1Techs
forum.level1techs.com › high-performance computing › machine learning, llms, & ai
DeepSeek Deep Dive R1 at Home! - #422 by ubergarm - Machine Learning, LLMs, & AI - Level1Techs Forums
July 30, 2025 - Ahh thanks, that one is easy enough then even if TTFT is not so useful without knowledge of the prompt/context size. So I finally have a moment this morning after drinking some coffee to look at the fine print of that intel sglang benchmark results. To keep it simple let’s look only at the DeepSeek-R1-671B INT8 quant report: MODEL DATA TYPE SOCKETS llama.cpp TTFT (ms) llama.cpp TPOT (ms) SGLang TTFT (ms) SGLang TPOT (ms) Speedup TTFT Speedup TPOT DeepSeek-R1-671B INT8 2 24546.76 172.0...
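The column names in that quoted table suggest the two "Speedup" fields are simply ratios of llama.cpp latency to SGLang latency for TTFT (time to first token) and TPOT (time per output token). A small sketch under that assumption; the SGLang figures below are invented placeholders, since the snippet truncates before them.

    # Assumed interpretation of the forum table's "Speedup" columns:
    # speedup = llama.cpp latency / SGLang latency (>1 means SGLang is faster).
    def speedup(llamacpp_ms: float, sglang_ms: float) -> float:
        return llamacpp_ms / sglang_ms

    # llama.cpp values are taken from the quoted snippet; the SGLang values
    # are placeholders, because the snippet cuts off before them.
    print(f"TTFT speedup: {speedup(24546.76, 5000.0):.2f}x")
    print(f"TPOT speedup: {speedup(172.0, 90.0):.2f}x")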
🌐
arXiv
arxiv.org › html › 2501.12948v1
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
October 13, 2025 - For engineering-related tasks, ... achieves outstanding results, significantly outperforming DeepSeek-V3 with scores of 90.8% on MMLU, 84.0% on MMLU-Pro, and 71.5% on GPQA Diamond....
🌐
DataCamp
datacamp.com › blog › deepseek-r1
DeepSeek-R1: Features, o1 Comparison, Distilled Models & More | DataCamp
June 4, 2025 - DeepSeek-R1 performs strongly with a score of 49.2%, slightly ahead of OpenAI o1-1217’s 48.9%. This result positions DeepSeek-R1 as a strong contender in specialized reasoning tasks like software verification.
🌐
Fireworks AI
fireworks.ai › blog › deepseek-r1-deepdive
DeepSeek-R1 Overview: Features, Capabilities, Parameters
Reasoning Tasks: Shows performance on par with OpenAI’s o1 model across complex reasoning benchmarks. Image source: DeepSeek R1 Research Paper (Modified)
🌐
BentoML
bentoml.com › blog › the-complete-guide-to-deepseek-models-from-v3-to-r1-and-beyond
The Complete Guide to DeepSeek Models: V3, R1, V3.1, V3.2 and Beyond
Image Source: DeepSeek-R1 Supplementary Information · Performance-wise, R1 rivals or even surpasses OpenAI o1 (also a reasoning model, but one that does not fully disclose its thinking tokens as R1 does) in math, coding, and reasoning benchmarks.
🌐
Analytics Vidhya
analyticsvidhya.com › home › deepseek r1 vs openai o1: which one is faster, cheaper and smarter?
DeepSeek R1 vs OpenAI o1: Which One is Faster, Cheaper and Smarter?
April 4, 2025 - Benchmark Excellence: R1 matches OpenAI o1 in key tasks, with some areas of clear outperformance. While DeepSeek R1 builds upon the collective work of open-source research, its efficiency and performance demonstrate how creativity and strategic ...
🌐
TIME
time.com › tech › best inventions 2025 › deepseek r1: the best inventions of 2025
DeepSeek R1: The Best Inventions of 2025 | TIME
October 9, 2025 - The abrupt appearance of DeepSeek’s R1 advanced reasoning model at the start of the year was akin to the “shot heard ‘round the world” in AI circles. Major tech companies had spent recent years pouring billions into generative AI projects, products, and infrastructure. Meanwhile, Chinese startup DeepSeek created in just months a model as good as OpenAI’s then-most advanced product on industry-standard benchmarks...
🌐
Understanding AI
understandingai.org › p › the-best-chinese-open-weight-models
The best Chinese open-weight models — and the strongest US rivals
1 week ago - These models have impressive benchmark numbers: Artificial Analysis rates V3.2 as the second best open model on their index, while V3.2 Speciale tops all models — open or closed — in the MathArena benchmark for final answer competitions.
🌐
Artificial Analysis
artificialanalysis.ai › models › deepseek-r1
DeepSeek R1 0528 - Intelligence, Performance & Price Analysis
Analysis of DeepSeek's DeepSeek R1 0528 (May '25) and comparison to other AI models across key metrics including quality, price, performance (tokens per second & time to first token), context window & more.
🌐
GitHub
github.com › deepseek-ai › DeepSeek-R1
GitHub - deepseek-ai/DeepSeek-R1
To support the research community, we have open-sourced DeepSeek-R1-Zero, DeepSeek-R1, and six dense models distilled from DeepSeek-R1 based on Llama and Qwen. DeepSeek-R1-Distill-Qwen-32B outperforms OpenAI-o1-mini across various benchmarks, ...
Starred by 91.6K users
Forked by 11.8K users
🌐
Science News
sciencenews.org › article › ai-model-deepseek-answers-training
A look under the hood of DeepSeek’s AI models doesn't provide all the answers
2 weeks ago - On the benchmark math and code problems, DeepSeek-R1-Zero performed better than the humans selected for the benchmark study, but the model still had issues. Being trained on both English and Chinese data, for example, led to outputs that mixed ...
🌐
TextCortex
textcortex.com › home › blog posts › deepseek r1 review: performance in benchmarks & evals
DeepSeek R1 Review: Performance in Benchmarks & Evals
January 26, 2025 - According to the same benchmark, ... and coding performance, it has a score of 96.3 in the Codeforces benchmark, 71.5 in the GPQA Diamond benchmark, and 97.3 in the MATH-500 benchmark....
🌐
GeekyAnts
geekyants.com › blog › deepseek-r1-vs-openais-o1-the-open-source-disruptor-raising-the-bar
DeepSeek-R1 vs. OpenAI’s o1: The Open-Source Disruptor Raising the Bar - GeekyAnts
January 26, 2025 - What’s jaw-dropping is that DeepSeek-R1 not only talks the talk with transparency, but it also walks the walk in terms of performance. DeepSeek-R1 surpasses OpenAI’s o1 in critical benchmarks—including the math-heavy AIME, the MATH-500 dataset, and coding challenges on Codeforces.
🌐
Eqbench
eqbench.com › creative_writing.html
EQ-Bench Creative Writing v3 Leaderboard
Emotional Intelligence Benchmarks for LLMs