Hugging Face
huggingface.co › deepseek-ai › DeepSeek-V3.1
deepseek-ai/DeepSeek-V3.1 · Hugging Face
3 weeks ago - Search agents are evaluated with our internal search framework, which uses a commercial search API + webpage filter + 128K context window. Search agent results of R1-0528 are evaluated with a pre-defined workflow. SWE-bench is evaluated with our internal code agent framework. HLE is evaluated with the text-only subset.

    import transformers

    tokenizer = transformers.AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-V3.1")
    messages = [
        {"role": "system", "content": "You are a helpful assistant"},
        {"role": "user", "content": "Who are you?"},
        {"role": "assistant", "content": "<think>Hmm</think>I am DeepSeek"},
    ]
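The snippet above stops at building the message list; a minimal sketch of rendering it into a prompt string follows. The thinking keyword is taken from this checkpoint's chat template as described on the model card; it is not a generic transformers argument, so treat its exact semantics as an assumption.

    import transformers

    tokenizer = transformers.AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-V3.1")
    messages = [
        {"role": "user", "content": "Who are you?"},
    ]

    # Render the conversation into a raw prompt. Extra kwargs such as
    # `thinking` are forwarded by transformers to the chat template.
    prompt = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True,
        thinking=True,  # True selects reasoning mode, False non-thinking mode
    )
    print(prompt)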
DeepSeek v3.1
Qwen: Deepseek must have concluded that hybrid models are worse. Deepseek: Qwen must have concluded that hybrid models are better. More on reddit.com
So.. What's the consensus on Deepseek-V3.1 for RP?
Agreed. 3.1 feels “technically correct” but flat, like it stripped all the flavor out of the RP. R1 wasn’t perfect, but it had way more life and descriptive flair. More on reddit.com
DeepSeek V3.1 (Thinking) aggregated benchmarks (vs. gpt-oss-120b)
That proves that benchmarks are barely useful now. More on reddit.com
deepseek-ai/DeepSeek-V3.1 · Hugging Face
OK, so here are my quick takes on DeepSeek V3.1. Improving agentic capability seems to be the focus of this update. More specifically:

1. 29.8% on HLE with search and Python, compared to 24.8% for R1-0528, 35.2% for GPT-5 Thinking, 24.3% for o3, 38.6% for Grok 4, and 26.9% for Gemini Deep Research. Caveats apply: DeepSeek models are evaluated exclusively on the text-only subset, although I believe this subset is not easier for SotA models, and Grok 4 is (possibly) evaluated without a webpage filter, so data contamination is possible.
2. 66.0% on SWE-bench Verified without Thinking, compared to 44.6% for R1-0528, 74.9% for GPT-5 Thinking, 69.1% for o3, 74.5% for Claude 4.1 Opus, and 65.8% for Kimi K2. Again, a caveat applies: OpenAI models are evaluated on a subset of 477 problems, not the full set of 500.
3. 31.3% on Terminal-Bench with the Terminus 1 framework, compared to 30.2% for o3, 30.0% for GPT-5, and 25.3% for Gemini 2.5 Pro.
4. A slight bump in other coding and math capabilities (AIME, LiveCodeBench, Codeforces, Aider), but most users would not be able to tell the difference, as R1-0528 already destroys 98% of human programmers on competitive programming.
5. A slight reduction on GPQA and HLE (offline, no tools), and maybe in your own use case. I do not find V3.1 Thinking to be better than R1-0528 as a chat LLM, for example.

A few concluding thoughts:

Right now I am actually more worried about how the open-source ecosystem will deploy DeepSeek V3.1 in an agentic environment than anything else. For agentic LLMs, prompts and agent frameworks make a huge difference in user experience. Gemini, Anthropic, and OpenAI all have branded search and code agents (e.g. Deep Research, Claude Code), but DeepSeek has none. So it remains to be seen how well V3.1 can work with prompts and tools from Claude Code, for example. Maybe DeepSeek will open-source their internal search and coding framework at a future date to ensure the best user experience.

I also noticed that a lot of serverless LLM inference providers cheap out on their deployment. They may serve with lowered precision, pruned experts, or poor sampling parameters, so the provider you use will definitely impact your user experience (see the sketch below).

It also starts to make sense why they merged R1 with V3 and made the 128K context window the default on the API. Agentic coding usually does not benefit much from a long CoT but consumes a ton of tokens, so a single model is a good way to reduce deployment TCO.

This is probably as far as they can push the V3 base - you can already see some regression on things like GPQA and offline HLE. Hope to see V4 soon. More on reddit.com
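On the inference-provider point above: one way to sidestep host-to-host variance is to call DeepSeek's first-party endpoint, which is OpenAI-compatible, and pin the sampling parameters yourself rather than trusting provider defaults. A minimal sketch, assuming the openai Python SDK and the base URL from DeepSeek's API docs; the temperature value is illustrative, not an official recommendation.

    import os
    from openai import OpenAI

    # DeepSeek's first-party API speaks the OpenAI chat-completions protocol.
    client = OpenAI(
        api_key=os.environ["DEEPSEEK_API_KEY"],
        base_url="https://api.deepseek.com",
    )

    response = client.chat.completions.create(
        model="deepseek-chat",
        messages=[
            {"role": "system", "content": "You are a helpful assistant"},
            {"role": "user", "content": "Who are you?"},
        ],
        temperature=0.6,  # pinned explicitly instead of relying on host defaults
        max_tokens=1024,
    )
    print(response.choices[0].message.content)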
Videos
Reddit
reddit.com › r/localllama › deepseek v3.1
r/LocalLLaMA on Reddit: DeepSeek v3.1
August 19, 2025 -
It’s happening!
The DeepSeek online model has been updated to V3.1, with context length extended to 128K. You are welcome to test it on the official website and app. API calling remains the same.
Top answer 1 of 5
121
Qwen: Deepseek must have concluded that hybrid models are worse. Deepseek: Qwen must have concluded that hybrid models are better.
2 of 5
69
More observations:
1. The model is very, very verbose.
2. The “r1” label on the think button is gone, indicating this is a mixed reasoning model! Well, we’ll know for sure once the official blog post is out.
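If the think-button observation is right and V3.1 is a hybrid reasoning model behind the existing endpoints, the mode switch on the first-party API is just the model name. A sketch under that assumption, using the two model aliases and the reasoning_content field documented in DeepSeek's API docs:

    import os
    from openai import OpenAI

    client = OpenAI(
        api_key=os.environ["DEEPSEEK_API_KEY"],
        base_url="https://api.deepseek.com",
    )
    prompt = [{"role": "user", "content": "Which is larger, 9.11 or 9.9?"}]

    # Non-thinking mode: the model answers directly.
    fast = client.chat.completions.create(model="deepseek-chat", messages=prompt)
    print(fast.choices[0].message.content)

    # Thinking mode: the chain of thought is returned separately as
    # `reasoning_content`, with the final answer in `content`.
    slow = client.chat.completions.create(model="deepseek-reasoner", messages=prompt)
    print(slow.choices[0].message.reasoning_content)
    print(slow.choices[0].message.content)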
Together AI
together.ai › models › deepseek-v3-1
DeepSeek-V3.1 API | Together AI
    # Reconstructed from a truncated snippet: the client setup, the query, and
    # any earlier documents were cut off in the search result, so the values
    # marked below are placeholders.
    from together import Together

    client = Together()  # reads TOGETHER_API_KEY from the environment

    query = "What is a guanaco?"  # placeholder; the original query was truncated
    documents = [
        "Guanacos are one of two wild South American camelids; the other species is the vicuña, which lives at higher elevations.",
    ]

    response = client.rerank.create(
        model="deepseek-ai/DeepSeek-V3.1",
        query=query,
        documents=documents,
        top_n=2,
    )

    for result in response.results:
        print(f"Relevance Score: {result.relevance_score}")
DeepSeek
api-docs.deepseek.com › deepseek v3.1 update 2025/09/22
DeepSeek-V3.1-Terminus | DeepSeek API Docs
September 22, 2025 - The latest update builds on V3.1’s strengths while addressing key user feedback. ... 📊 DeepSeek-V3.1-Terminus delivers more stable & reliable outputs across benchmarks compared to the previous version.
YouTube
youtube.com › watch
DeepSeek V3.1: Bigger Than You Think! - YouTube
DeepSeek V3.1 is a unified hybrid reasoning open-weight model that powers agentic workflows—FP8 training, strong post-training for tool/function calling (non...
Published August 22, 2025
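Since the video description highlights tool/function calling, here is a minimal sketch of passing an OpenAI-style tool schema through DeepSeek's first-party chat endpoint. The get_weather tool and its schema are hypothetical, for illustration only:

    import json
    import os
    from openai import OpenAI

    client = OpenAI(
        api_key=os.environ["DEEPSEEK_API_KEY"],
        base_url="https://api.deepseek.com",
    )

    # Hypothetical function spec; any OpenAI-style tool definition works here.
    tools = [{
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }]

    response = client.chat.completions.create(
        model="deepseek-chat",
        messages=[{"role": "user", "content": "What's the weather in Hangzhou?"}],
        tools=tools,
    )

    # When the model elects to call the tool, arguments arrive as a JSON string.
    call = response.choices[0].message.tool_calls[0]
    print(call.function.name, json.loads(call.function.arguments))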
Google Cloud Platform
console.cloud.google.com › vertex-ai › publishers › deepseek-ai › model-garden › deepseek-v3-1
DeepSeek-V3.1 – Vertex AI
Hugging Face
huggingface.co › deepseek-ai › DeepSeek-V3.1-Base
deepseek-ai/DeepSeek-V3.1-Base · Hugging Face