gpt-5.1 vs claude 4.5 sonnet

reddit.com › r/claudeai

r/ClaudeAI on Reddit: I tested GPT-5.1 Codex against Sonnet 4.5, and it's about time Anthropic bros take pricing seriously.

November 15, 2025 -

I've used Claude Sonnet models the most among LLMs, for the simple reason that they are so good at prompt-following and an absolute beast at tool execution. That also partly explains why Anthropic earns most of its revenue from the API (code agents, to be precise). They have an insane first-mover advantage and developer love to die for.

But GPT-5.1 Codex has been insanely good. One of the first things I do when a promising new model drops is run small tests to decide which models to stick with until the next significant release. It also lets us dogfood our product while building these tests.

I ran a quick competition among Claude 4.5 Sonnet, GPT-5, GPT-5.1 Codex, and Kimi K2 Thinking.

Test 1 involved building a system that learns baseline error rates, uses z-scores and moving averages, catches rate-of-change spikes, and handles 100k+ logs/minute with under 10 ms latency.
Test 2 involved fixing race conditions when multiple processors detect the same anomaly. The fix had to handle up to 3 s of clock skew and processor crashes, and prevent duplicate alerts when processors fire within 5 seconds of each other.
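For readers unfamiliar with the Test 1 techniques, here is a minimal sketch of that kind of detector. It is not any model's output, and the class name, parameters, and thresholds are hypothetical. It assumes log lines are already aggregated into fixed windows of (errors, total) counts; whether the 100k logs/minute and 10 ms targets are met would depend on that aggregation step, which is not shown. The baseline is learned with an exponentially weighted moving average (EWMA) and variance, and a window is flagged if its z-score or its rise over the previous window exceeds a threshold.

```python
import math
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ErrorRateDetector:
    """Hypothetical detector for per-window error rates.

    alpha: smoothing factor for the EWMA baseline (higher adapts faster).
    z_threshold: |z| above which a window is a level anomaly.
    roc_threshold: rise in error rate versus the previous window that
        counts as a rate-of-change spike.
    warmup: windows used to learn the baseline before any alerting.
    """

    alpha: float = 0.1
    z_threshold: float = 3.0
    roc_threshold: float = 0.05
    warmup: int = 10
    mean: float = 0.0
    var: float = 0.0
    prev: Optional[float] = None
    seen: int = 0

    def observe(self, errors: int, total: int) -> List[str]:
        """Feed one window's counts; return the reasons it looks anomalous."""
        rate = errors / total if total else 0.0
        reasons: List[str] = []
        if self.seen >= self.warmup:
            std = math.sqrt(self.var)
            if std > 0 and abs(rate - self.mean) / std > self.z_threshold:
                reasons.append("zscore")
            if self.prev is not None and rate - self.prev > self.roc_threshold:
                reasons.append("rate_of_change")
        # Incremental EWMA update of the baseline mean and variance: O(1) time
        # and memory per window, independent of how many logs the window held.
        if self.seen == 0:
            self.mean = rate
        else:
            diff = rate - self.mean
            incr = self.alpha * diff
            self.mean += incr
            self.var = (1 - self.alpha) * (self.var + diff * incr)
        self.prev = rate
        self.seen += 1
        return reasons
```

Because every window updates the baseline, a sustained shift is eventually absorbed and stops alerting. Skipping updates on flagged windows is the usual alternative, at the cost of alerting indefinitely after a genuine level change.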

The setup ran each model with its own CLI agent inside Cursor:

Claude Code with Sonnet 4.5
GPT 5 and 5.1 Codex with Codex CLI
Kimi K2 Thinking with Kimi CLI

Here's what I found out:

Test 1 - Advanced Anomaly Detection: Both GPT-5 and GPT-5.1 Codex shipped working code. Claude and Kimi both had critical bugs that would crash in production. GPT-5.1 improved on GPT-5's architecture and was faster (11m vs 18m).
Test 2 - Distributed Alert Deduplication: Codexes won again with actual integration. Claude had solid architecture, but didn't wire it up. Kimi had good ideas, but broken duplicate-detection logic.
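The Test 2 requirements can be illustrated with a claim table keyed by an anomaly fingerprint. This is a hypothetical sketch, not what any of the tested models produced. The in-memory dict and lock stand in for a shared store with atomic conditional writes, and the fingerprint format, `LEASE_S` value, and method names are assumptions. The dedup window is widened by the 3 s skew bound. An unconfirmed claim whose lease has lapsed can be taken over, which covers a processor crashing between claiming and sending.

```python
import threading
from dataclasses import dataclass
from typing import Dict

DEDUP_WINDOW_S = 5.0    # from the test: detections within 5 s are the same incident
MAX_CLOCK_SKEW_S = 3.0  # from the test: processor clocks may disagree by up to 3 s
LEASE_S = 2.0           # hypothetical: time an owner has to confirm it sent the alert


@dataclass
class Claim:
    owner: str
    detected_at: float  # detection time on the claiming processor's clock
    lease_until: float  # unconfirmed claims are presumed dead after this (plus skew)
    sent: bool = False


class AlertDeduplicator:
    """In-memory stand-in for a shared store with atomic check-and-set."""

    def __init__(self) -> None:
        self._claims: Dict[str, Claim] = {}
        self._lock = threading.Lock()  # models the store's atomic conditional write

    def try_claim(self, fingerprint: str, owner: str, detected_at: float) -> bool:
        """Return True if `owner` should send the alert for this anomaly."""
        # Widen the window by the worst-case skew so detections stamped by
        # disagreeing clocks still land in the same incident.
        window = DEDUP_WINDOW_S + MAX_CLOCK_SKEW_S
        with self._lock:
            c = self._claims.get(fingerprint)
            if c is not None:
                same_incident = abs(detected_at - c.detected_at) < window
                # The previous owner counts as alive if it confirmed the send or
                # its lease (padded by the skew bound) has not yet run out.
                owner_alive = c.sent or detected_at <= c.lease_until + MAX_CLOCK_SKEW_S
                if same_incident and owner_alive:
                    return False  # duplicate: someone else owns this alert
            # New incident, or a takeover from an owner presumed crashed.
            self._claims[fingerprint] = Claim(owner, detected_at, detected_at + LEASE_S)
            return True

    def confirm_sent(self, fingerprint: str, owner: str) -> None:
        """Mark the alert delivered so the claim can no longer be taken over."""
        with self._lock:
            c = self._claims.get(fingerprint)
            if c is not None and c.owner == owner:
                c.sent = True
```

In use, a processor calls `try_claim`, sends the alert only on `True`, then calls `confirm_sent`. The short lease is a trade-off: a sender that is alive but slower than the lease plus skew could be taken over and produce a duplicate. A real deployment would need to size it against actual send latency.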

Codex cost me $0.95 total (GPT-5) vs Claude's $1.68. That's 43% cheaper for code that actually works. GPT-5.1 was even more efficient at $0.76 total ($0.39 for test 1, $0.37 for test 2).
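The percentage above is the saving relative to Claude's cost, which can be checked with a few lines of arithmetic. The constant and function names are made up for this sketch; the dollar figures are the ones reported in the post. The GPT-5.1 versus Claude percentage is derived here from those totals and is not stated in the post.

```python
# Totals reported in the post (USD, both tests combined).
CLAUDE_TOTAL = 1.68
GPT5_TOTAL = 0.95
GPT51_TOTAL = 0.39 + 0.37  # test 1 + test 2


def percent_cheaper(cost: float, baseline: float) -> float:
    """How much cheaper `cost` is than `baseline`, as a percentage of the baseline."""
    return (1 - cost / baseline) * 100


if __name__ == "__main__":
    print(f"GPT-5 vs Sonnet 4.5:   {percent_cheaper(GPT5_TOTAL, CLAUDE_TOTAL):.1f}% cheaper")
    print(f"GPT-5.1 vs Sonnet 4.5: {percent_cheaper(GPT51_TOTAL, CLAUDE_TOTAL):.1f}% cheaper")
```

Run as a script, it prints 43.5% for GPT-5 (rounded to 43% in the post) and 54.8% for GPT-5.1.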

I've written up a complete comparison. Check it out here: Codexes vs Sonnet vs Kimi

And honestly, I see a similar performance delta in other tasks as well. I still use Haiku for many quick tasks and Opus for hardcore reasoning, but GPT-5 variants have become great workhorses.

OpenAI is certainly after those juicy Anthropic enterprise margins, and Anthropic really needs to rethink its pricing.

Would love to know your experience with GPT 5.1 and how you rate it against Claude 4.5 Sonnet.

Videos

YouTube

Gemini 3 Pro vs GPT-5.1 vs Grok 4.1 vs Sonnet 4.5 – The ULTIMATE ...

November 19, 2025

YouTube

Gpt 5.1-High(Thinking) vs Claude. 4.5 Sonnet ! Who will win? - YouTube

November 14, 2025

YouTube

Claude Sonnet 4.5 - The New Coding King? (Sonnet 4.5 vs. GPT 5 ...

September 29, 2025

reddit.com

r/ClaudeAI on Reddit: Claude Opus 4.5 vs Gemini 3 Pro Preview

October 25, 2025

reddit.com

r/cursor on Reddit: Gemini 3 is disappoint GPT 5.1 and sonnet 4.5 ...

November 22, 2025

reddit.com

r/singularity on Reddit: Comparing Sonnet 4.5 and GPT-5 Pro for ...

August 9, 2025

CometAPI

cometapi.com › gpt-5-1-vs-claude-sonnet-4-5

GPT-5.1 vs Claude Sonnet 4.5 — Which one leads the frontier in 2025? - CometAPI - All AI Models in One API

December 2, 2025 - OpenAI and early partners report that GPT-5.1 outperforms GPT-5 on a variety of code and reasoning suites, and runs 2–3× faster than GPT-5 in some tool-heavy contexts while using fewer tokens for many tasks.

Composio

composio.dev › blog › kimi-k2-thinking-vs-claude-4-5-sonnet-vs-gpt-5-codex-tested-the-best-models-for-agentic-coding

GPT-5.1 Codex vs. Claude 4.5 Sonnet vs. Kimi K2 Thinking : Tested the best models for agentic coding - Composio

November 13, 2025 - Claude designs better but doesn't integrate. Kimi has clever ideas but introduces showstoppers. For real-world development where you need working code fast, Codex is the practical choice, and GPT-5.1 is the evolution that makes it even better.

Tom's Guide

tomsguide.com › ai

ChatGPT-5.1 vs Claude 4.5 Sonnet — I ran 9 tests to find the most creative assistant | Tom's Guide

November 13, 2025 - ChatGPT-5.1 presented a clever, ... memories into portals. Claude 4.5 Sonnet crafted an emotionally resonating scene by establishing immediate mystery with a specific, impossible message from the dead....

Clarifai

clarifai.com › home › gemini 3.0 vs gpt-5.1 vs claude 4.5 vs grok 4.1: ai model comparison

Gemini 3.0 vs GPT-5.1 vs Claude 4.5 vs Grok 4.1: AI Model Comparison

3 weeks ago - GPT‑5.1 balances cost and capability—its Instant mode creates engaging dialogues and its patching tools ensure safe code modifications, making it a practical choice for many developers.

reddit.com › r/openai

r/OpenAI on Reddit: GPT-5.1 impressions: better clarity but limited problem-solving gains

November 13, 2025 -

I've been using GPT-5.1 for a bit and noticed some improvements in how it frames answers. It seems more comfortable explaining things in a way that's easier to understand. Despite that, I still find its ability to express itself falls short compared to models like Claude or Google's Gemini.

When it comes to solving problems, I haven't noticed any real improvement. I tried a few algorithm questions and the issues that GPT-5 couldn't handle remain unresolved in 5.1.

In short, this may be a significant upgrade for some users, but in my area of work it hasn't felt like a major change.

Top answer

"I still find its ability to express itself falls short compared to models like Claude or Google's Gemini." This is just pure bait. Gemini has the most sycophantic personality I have seen among any model (that includes 4o). It just tries to agree with me on everything, even when I am wrong. Claude is a little better but still bad. The only models that didn't glaze me at all were GPT-5 reasoning (and Grok 4 as well). GPT-5.1 has the best instruction-following capacity I have seen among the frontier models; it really follows custom instructions. I can't wait to get it in Codex.

Did you use gpt-5.1 thinking for complex questions?

Medium

medium.com › @paulhoke › comparing-ai-models-gpt-5-1-gpt-5-gpt-4-1-claude-sonnet-4-5-and-claude-haiku-4-5-4d5a9e6561da

Comparing AI Models: GPT-5.1, GPT-5, GPT-4.1, Claude Sonnet 4.5, and Claude Haiku 4.5 | by Paul Hoke | Nov, 2025 | Medium

November 14, 2025 - GPT-5.1 excels in conversational applications with superior hallucination reduction and instruction following, representing the latest in OpenAI’s evolution. Claude Sonnet 4.5 dominates complex reasoning, coding, and large-scale document ...

Glbgpt

glbgpt.com › hub › gpt51-vs-claude-sonnet-45

GPT‑5.1 vs Claude Sonnet 4.5: Deep Test in Writing, Coding, and Automation - The Surprising Winner Revealed

November 14, 2025 - Gemini 2.5 Pro judged GPT‑5.1’s as technical documentation and Claude’s as popular science. Both had merit, but Claude nailed word count and audience targeting. This test genuinely surprised me.

TechRadar

techradar.com › ai platforms & assistants

I tested Gemini 3, ChatGPT 5.1, and Claude Sonnet 4.5 – and Gemini crushed it in a real coding task | TechRadar

November 18, 2025 - Claude, in particular, impressed me with its prompt-driven coding skills, what many are now calling "Vibe Coding," where instead of writing code, you just tell the AI what you want – vibing with the AI results – nudging it along with subsequent ...

Binary Verse AI

binaryverseai.com › home › ai models & platforms › gpt-5.1 vs sonnet 4.5: a developer’s decision playbook for the ai coding debate

GPT-5.1 Vs Sonnet 4.5: 5 Proven 2025 Wins For Serious Devs

GPT-5.1 is cheaper per token, excellent at everyday coding, and solid on full repo and terminal benchmarks. It is a strong default for most dev teams. Claude Sonnet 4.5 is more expensive but leads on SWE-bench and Terminal-Bench style work.

Published November 16, 2025

Data Studios

datastudios.org › post › claude-opus-4-5-vs-chatgpt-5-1-full-report-and-comparison-of-models-features-performance-pricin

Claude Opus 4.5 vs. ChatGPT 5.1: Full Report and Comparison of Models, Features, Performance, Pricing and more

November 25, 2025 - Claude’s family (specifically the Claude Sonnet 4.5 model, which is a sibling to Opus 4.5) showed a huge leap here, going from ~40% on the older version to over 60% success on OSWorld tasks. Competing models (including GPT-5.1) were still under 40% on these tasks.

Cursor IDE

cursor-ide.com › blog › gpt-51-vs-claude-45

GPT-5/5.1 vs Claude Sonnet 4.5: Complete 2025 Comparison Guide - Cursor IDE Blog

November 13, 2025 - The maturity difference manifests primarily in third-party tooling availability. GPT-5 currently integrates with more IDE plugins and workflow automation tools, while Claude Sonnet 4.5 maintains stronger first-party support through Anthropic's developer platform and native implementations on ...

Hacker News

news.ycombinator.com › item

GPT-5.1 for Developers | Hacker News

November 17, 2025 - Claude 4.5 Sonnet definitely struggles with Swift 6.2 Concurrency semantics and has several times gotten itself stuck rather badly. Additionally Claude Code has developed a number of bugs, including rapidly re-scrolling the terminal buffer, pegging local CPU to 100%, and consuming vast amounts ...

Bind AI IDE

blog.getbind.co › 2025 › 11 › 19 › gemini-3-0-vs-gpt-5-1-vs-claude-sonnet-4-5-which-one-is-better

Gemini 3.0 vs GPT-5.1 vs Claude Sonnet 4.5: Which one is better? – Bind AI IDE

November 19, 2025 - Ideal for teams wanting quick ... Sonnet 4.5 — Built for longer autonomous runs, deep agentic reliability and safety focus, strong at complex planning and stepwise bugfixing....

Composio

composio.dev › blog › claude-sonnet-4-5-vs-gpt-5-codex-best-model-for-agentic-coding

Claude Sonnet 4.5 vs. GPT-5 Codex: Best model for agentic coding - Composio

October 7, 2025 - Struggled more with lint fixes and schema edge cases in this project. GPT‑5 Codex + Codex: Strongest at iterative execution, refactoring, and debugging; reliably shipped a working recommendation pipeline with minimal lint errors.

Getpassionfruit

getpassionfruit.com › blog › gpt-5-1-vs-claude-4-5-sonnet-vs-gemini-3-pro-vs-deepseek-v3-2-the-definitive-2025-ai-model-comparison

GPT 5.1 vs Claude 4.5 vs Gemini 3: 2025 AI Comparison

1 month ago - Gemini 3 Pro leads overall reasoning benchmarks with an unprecedented 1501 LMArena Elo, becoming the first model to break the 1500 barrier, while Claude 4.5 Sonnet dominates real-world coding at 77.2% SWE-bench and DeepSeek-V3.2 delivers ...

Medium

medium.com › @kram254 › gpt-5-1-variants-vs-claude-sonnet-4-5-ce7a2268a9fc

GPT-5.1 Variants vs. Claude Sonnet 4.5 | by Emmanuel Mark Ndaliro | Nov, 2025 | Medium

November 15, 2025 - Claude Sonnet 4.5 is all about structured reasoning and language clarity. Its architecture elevates it for tasks that need precise language understanding. Key Stats: — Reasoning Skills: Excels in nuanced language tasks.

LLM Stats

llm-stats.com › models › compare › claude-sonnet-4-5-20250929-vs-gpt-5.1-instant-2025-11-12

Claude Sonnet 4.5 vs GPT-5.1 Instant

November 12, 2025 - In-depth Claude Sonnet 4.5 vs GPT-5.1 Instant comparison: Latest benchmarks, pricing, context window, performance metrics, and technical specifications in 2025.