🌐
Hugging Face
huggingface.co › deepseek-ai › DeepSeek-R1
deepseek-ai/DeepSeek-R1 · Hugging Face
For all our models, the maximum generation length is set to 32,768 tokens. For benchmarks requiring sampling, we use a temperature of 0.6, a top-p value of 0.95, and generate 64 responses per query to estimate pass@1.
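A minimal sketch of how such a pass@1 estimate can be computed, assuming an OpenAI-compatible chat endpoint and a caller-supplied answer checker (both placeholders, not from the model card); the sampling settings mirror the ones quoted above.

```python
# Sketch: estimate pass@1 by sampling k responses per query and averaging correctness.
# The endpoint, model name, and is_correct() checker are assumptions for illustration;
# sampling settings follow the quoted card: temperature 0.6, top_p 0.95, k = 64.
from openai import OpenAI

client = OpenAI(base_url="https://api.deepseek.com", api_key="YOUR_KEY")  # placeholder

def estimate_pass_at_1(prompt: str, is_correct, k: int = 64,
                       model: str = "deepseek-reasoner") -> float:
    correct = 0
    for _ in range(k):
        resp = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
            temperature=0.6,
            top_p=0.95,
            max_tokens=32768,  # matches the 32,768-token generation limit above
        )
        if is_correct(resp.choices[0].message.content):
            correct += 1
    return correct / k  # pass@1 = mean per-sample accuracy over k samples
```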
🌐
GitHub
github.com › deepseek-ai › DeepSeek-R1
GitHub - deepseek-ai/DeepSeek-R1
To support the research community, we have open-sourced DeepSeek-R1-Zero, DeepSeek-R1, and six dense models distilled from DeepSeek-R1 based on Llama and Qwen. DeepSeek-R1-Distill-Qwen-32B outperforms OpenAI-o1-mini across various benchmarks, ...
Starred by 91.6K users
Forked by 11.8K users
🌐
arXiv
arxiv.org › html › 2501.12948v1
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
October 13, 2025 - For engineering-related tasks, ... DeepSeek-R1 achieves outstanding results, significantly outperforming DeepSeek-V3, with scores of 90.8% on MMLU, 84.0% on MMLU-Pro, and 71.5% on GPQA Diamond ...
🌐
Prompt Hub
prompthub.us › blog › deepseek-r-1-model-overview-and-how-it-ranks-against-openais-o1
PromptHub Blog: DeepSeek R-1 Model Overview and How it Ranks Against OpenAI's o1
... Temperature: 0.6. Top-p value: 0.95. Pass@1 estimation: Generated 64 responses per query. ... DeepSeek R1 outperformed o1, Claude 3.5 Sonnet, and other models in the majority of reasoning benchmarks.
🌐
Medium
medium.com › @leucopsis › deepseeks-new-r1-0528-performance-analysis-and-benchmark-comparisons-6440eac858d6
DeepSeek’s New R1–0528: Performance Analysis and Benchmark Comparisons | by Barnacle Goose | Medium
May 30, 2025 - Improvements are also dramatic in coding benchmarks. On the Codeforces programming challenge, R1–0528’s rating is about 1930, up from roughly 1530, a ~400-point leap that reflects far better code generation and problem-solving.
🌐
DataCamp
datacamp.com › blog › deepseek-r1
DeepSeek-R1: Features, o1 Comparison, Distilled Models & More | DataCamp
June 4, 2025 - On SWE-bench Verified, DeepSeek-R1 performs strongly with a score of 49.2%, slightly ahead of OpenAI o1-1217’s 48.9%. This result positions DeepSeek-R1 as a strong contender in specialized reasoning tasks like software verification.
🌐
TextCortex
textcortex.com › home › blog posts › deepseek r1 review: performance in benchmarks & evals
DeepSeek R1 Review: Performance in Benchmarks & Evals
According to the same benchmark, ... and coding performance, it has a score of 96.3 (percentile) on the Codeforces benchmark, 71.5 on the GPQA Diamond benchmark, and 97.3 on the MATH-500 benchmark ...
🌐
Artificial Analysis
artificialanalysis.ai › models › deepseek-r1
DeepSeek R1 0528 - Intelligence, Performance & Price Analysis
Analysis of DeepSeek's DeepSeek R1 0528 (May '25) and comparison to other AI models across key metrics including quality, price, performance (tokens per second & time to first token), context window & more.
🌐
Fireworks AI
fireworks.ai › blog › deepseek-r1-deepdive
DeepSeek-R1 Overview: Features, Capabilities, Parameters
Reasoning Tasks: Shows performance on par with OpenAI’s o1 model across complex reasoning benchmarks. Image source: DeepSeek R1 Research Paper (Modified)
🌐
GitHub
github.com › paradite › deepseek-r1-speed-benchmark
GitHub - paradite/deepseek-r1-speed-benchmark: Code for benchmarking the speed of DeepSeek R1 from different providers' APIs.
DeepSeek: Speed: 24.34 tokens/s, Total: 424 tokens, Prompt: 12 tokens, Completion: 412 tokens, Time: 16.93s, Latency: 1.11s, Length: 1904 chars
Fireworks: Speed: 21.41 tokens/s, Total: 373 tokens, Prompt: 10 tokens, Completion: 363 tokens, ...
Author: paradite
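A minimal sketch of how throughput and first-token latency figures like those above can be measured, assuming an OpenAI-compatible streaming endpoint; the base URL, model name, and the rough character-based token count are assumptions, not the repo's actual code.

```python
# Sketch: time a streaming chat completion to estimate latency (time to first token)
# and throughput (completion tokens per second). Endpoint, model name, and token
# accounting are illustrative assumptions, not taken from the benchmark repo.
import time
from openai import OpenAI

client = OpenAI(base_url="https://api.deepseek.com", api_key="YOUR_KEY")  # placeholder

def measure_speed(prompt: str, model: str = "deepseek-reasoner") -> dict:
    start = time.perf_counter()
    first_token_at = None
    pieces = []
    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    for chunk in stream:
        if chunk.choices and chunk.choices[0].delta.content:
            if first_token_at is None:
                first_token_at = time.perf_counter()
            pieces.append(chunk.choices[0].delta.content)
    total_time = time.perf_counter() - start
    text = "".join(pieces)
    approx_tokens = max(1, len(text) // 4)  # rough ~4 chars/token heuristic
    return {
        "latency_s": (first_token_at or start) - start,  # time to first streamed token
        "time_s": total_time,
        "approx_completion_tokens": approx_tokens,
        "tokens_per_s": approx_tokens / total_time,
        "length_chars": len(text),
    }
```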
🌐
Vals AI
vals.ai › models › fireworks_deepseek-r1
DeepSeek R1
Accuracy: 69.64% · Latency: 77.95s · Cost (In/Out): 1.35 / 5.4 · Context Window: 164k · Max Output Tokens: 164k · CaseLaw (v2): 0.0% (ranked 57/57)
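A small sketch of how a per-request cost can be derived from the In/Out figures above, assuming they are USD per million input and output tokens (the snippet does not state the units, so treat this as illustrative only).

```python
# Sketch: per-request cost from per-million-token prices.
# Assumes the "Cost (In/Out) 1.35 / 5.4" figures are USD per 1M input/output tokens;
# the units are not given in the snippet, so this is an assumption.
IN_PRICE_PER_M = 1.35   # assumed USD per 1M input tokens
OUT_PRICE_PER_M = 5.40  # assumed USD per 1M output tokens

def request_cost(prompt_tokens: int, completion_tokens: int) -> float:
    return (
        prompt_tokens / 1_000_000 * IN_PRICE_PER_M
        + completion_tokens / 1_000_000 * OUT_PRICE_PER_M
    )

# Example: a 12-token prompt with a 412-token completion
# (token counts borrowed from the speed benchmark above).
print(f"${request_cost(12, 412):.6f}")  # ≈ $0.002241
```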
🌐
Reddit
reddit.com › r/localllama › deepseek-r1-0528 official benchmarks released!!!
r/LocalLLaMA on Reddit: DeepSeek-R1-0528 Official Benchmarks Released!!!
May 29, 2025 - ... AIME 2024, surpassing Qwen3 8B by +10.0% and matching the performance of Qwen3-235B-thinking. We believe that the chain-of-thought from DeepSeek-R1-0528 will hold significant importance for both academic research on reasoning models and industrial ...
🌐
Hugging Face
huggingface.co › deepseek-ai › DeepSeek-R1-0528
deepseek-ai/DeepSeek-R1-0528 · Hugging Face
This model achieves state-of-the-art (SOTA) performance among open-source models on the AIME 2024, surpassing Qwen3 8B by +10.0% and matching the performance of Qwen3-235B-thinking.
🌐
ARC Prize
arcprize.org › blog › r1-zero-r1-results-analysis
An Analysis of DeepSeek's R1-Zero and R1
Last week, DeepSeek published their new R1-Zero and R1 “reasoner” systems, which are competitive with OpenAI’s o1 system on ARC-AGI-1. R1-Zero, R1, and o1 (low compute) all score around 15-20% – in contrast to GPT-4o’s 5%, the pinnacle ...
🌐
Ollama
ollama.com › library › deepseek-r1:8b
deepseek-r1:8b
In this update, DeepSeek R1 has significantly improved its reasoning and inference capabilities. The model has demonstrated outstanding performance across various benchmark evaluations, including mathematics, programming, and general logic.
🌐
DeepSeek
api-docs.deepseek.com › deepseek-r1-lite release 2024/11/20
🚀 DeepSeek-R1-Lite-Preview is now live: unleashing supercharged reasoning power! | DeepSeek API Docs
Longer Reasoning, Better Performance. DeepSeek-R1-Lite-Preview shows steady score improvements on AIME as thought length increases.
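One way such an accuracy-versus-thought-length curve can be produced is to bucket sampled answers by the length of their reasoning trace and score each bucket. A hedged sketch follows; the bucket edges and field names are assumptions, not DeepSeek's published methodology.

```python
# Sketch: accuracy as a function of reasoning ("thought") length.
# Each record is assumed to carry a reasoning token count and a correctness flag;
# bucket size and field names are illustrative, not DeepSeek's actual setup.
from collections import defaultdict

def accuracy_by_thought_length(records, bucket_size=2048):
    """records: iterable of dicts like {"reasoning_tokens": int, "correct": bool}."""
    hits, totals = defaultdict(int), defaultdict(int)
    for r in records:
        bucket = r["reasoning_tokens"] // bucket_size
        totals[bucket] += 1
        hits[bucket] += int(r["correct"])
    return {
        f"{b * bucket_size}-{(b + 1) * bucket_size}": hits[b] / totals[b]
        for b in sorted(totals)
    }

# Example with toy data: longer traces tend to score higher, as the post describes.
toy = [{"reasoning_tokens": 1500, "correct": False},
       {"reasoning_tokens": 6000, "correct": True},
       {"reasoning_tokens": 9000, "correct": True}]
print(accuracy_by_thought_length(toy))
```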
🌐
Ollama
ollama.com › library › deepseek-r1
deepseek-r1
In this update, DeepSeek R1 has significantly improved its reasoning and inference capabilities. The model has demonstrated outstanding performance across various benchmark evaluations, including mathematics, programming, and general logic.
🌐
Analytics Vidhya
analyticsvidhya.com › home › deepseek r1 vs openai o1: which one is faster, cheaper and smarter?
DeepSeek R1 vs OpenAI o1: Which One is Faster, Cheaper and Smarter?
April 4, 2025 - Benchmark Excellence: R1 matches OpenAI o1 in key tasks, with some areas of clear outperformance. While DeepSeek R1 builds upon the collective work of open-source research, its efficiency and performance demonstrate how creativity and strategic ...
🌐
PromptLayer
blog.promptlayer.com › openai-o3-vs-deepseek-r1-an-analysis-of-reasoning-models
OpenAI o3 vs DeepSeek r1: Which Reasoning Model is Best?
January 28, 2025 - SWE-bench Verified Benchmark: o3 scored 71.7%, surpassing DeepSeek R1 (49.2%) and OpenAI's o1 (48.9%).