🌐
Reddit
reddit.com › r/claudeai › i tested gpt-5.1 codex against sonnet 4.5, and it's about time anthropic bros take pricing seriously.
r/ClaudeAI on Reddit: I tested GPT-5.1 Codex against Sonnet 4.5, and it's about time Anthropic bros take pricing seriously.
November 15, 2025 -

I've used the Claude Sonnet models the most among LLMs, for the simple reason that they are so good at prompt-following and absolute beasts at tool execution. That also partly explains why most of Anthropic's revenue comes from the API (coding agents, to be precise). They have an insane first-mover advantage and developer love to die for.

But GPT-5.1 Codex has been insanely good. One of the first things I do when a promising new model drops is run small tests to decide which models to stick with until the next significant release. It also lets me dogfood our product while building these.

I ran a quick competition among Claude 4.5 Sonnet, GPT-5, GPT-5.1 Codex, and Kimi K2 Thinking.

  • Test 1 involved building a system that learns baseline error rates, uses z-scores and moving averages, catches rate-of-change spikes, and handles 100k+ logs/minute with under 10ms latency (a minimal sketch of the core idea follows this list).

  • Test 2 involved fixing race conditions when multiple processors detect the same anomaly: handle up to 3s of clock skew and processor crashes, and prevent duplicate alerts when processors fire within 5 seconds of each other (a deduplication sketch follows as well).
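
To make Test 1 concrete, here is a minimal sketch of the core idea, my own illustration rather than the code any of these models produced: keep a rolling window of per-interval error rates, score new intervals with a z-score against the moving average, and flag sharp rate-of-change jumps. The class name, window size, and thresholds are all hypothetical.

```python
from collections import deque
from statistics import mean, stdev


class BaselineAnomalyDetector:
    """Rolling z-score + rate-of-change detector for per-interval error rates."""

    def __init__(self, window: int = 60, z_threshold: float = 3.0, roc_threshold: float = 2.0):
        self.window = deque(maxlen=window)   # recent per-interval error rates
        self.z_threshold = z_threshold       # how many standard deviations counts as anomalous
        self.roc_threshold = roc_threshold   # jump ratio vs the previous interval

    def observe(self, error_rate: float) -> bool:
        """Feed one interval's error rate; return True if it looks anomalous."""
        anomalous = False
        if len(self.window) >= 10:                        # wait for some baseline first
            baseline = mean(self.window)
            spread = stdev(self.window) or 1e-9           # avoid division by zero
            z = (error_rate - baseline) / spread
            roc = error_rate / (self.window[-1] or 1e-9)  # rate-of-change spike
            anomalous = z > self.z_threshold or roc > self.roc_threshold
        self.window.append(error_rate)
        return anomalous
```

The 100k logs/minute and sub-10ms requirements are really about aggregating logs into per-interval counts before scoring, rather than scoring each log line individually.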

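Test 2 is essentially windowed deduplication under clock skew. Here is a minimal single-process sketch of the idea, again my own illustration with made-up names and numbers; the real exercise needs this state in shared storage with an atomic insert-if-absent (plus a TTL) so processor crashes neither lose claims nor re-fire alerts.

```python
CLOCK_SKEW_S = 3     # tolerated clock skew between processors
DEDUP_WINDOW_S = 5   # alerts for the same anomaly within 5s collapse to one


class AlertDeduplicator:
    """Suppress duplicate alerts fired by different processors for the same anomaly."""

    def __init__(self) -> None:
        # anomaly_key -> timestamp of the last alert allowed through.
        # In a distributed setup this map would live in shared storage with an
        # atomic compare-and-set, not in local memory.
        self._last_alert: dict[str, float] = {}

    def should_alert(self, anomaly_key: str, event_ts: float) -> bool:
        last = self._last_alert.get(anomaly_key)
        # Pad the dedup window by the allowed skew: two processors whose clocks
        # disagree by up to 3s should still see each other's alerts as duplicates.
        if last is not None and abs(event_ts - last) < DEDUP_WINDOW_S + CLOCK_SKEW_S:
            return False
        self._last_alert[anomaly_key] = event_ts
        return True
```
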
The setup paired each model with its own CLI agent, run inside Cursor:

  • Claude Code with Sonnet 4.5

  • GPT 5 and 5.1 Codex with Codex CLI

  • Kimi K2 Thinking with Kimi CLI

Here's what I found out:

  • Test 1 - Advanced Anomaly Detection: Both GPT-5 and GPT-5.1 Codex shipped working code. Claude and Kimi both had critical bugs that would crash in production. GPT-5.1 improved on GPT-5's architecture and was faster (11 min vs 18 min).

  • Test 2 - Distributed Alert Deduplication: The Codex models won again with actual integration. Claude had solid architecture but didn't wire it up. Kimi had good ideas, but its duplicate-detection logic was broken.

Codex cost me $0.95 total (GPT-5) vs Claude's $1.68. That's 43% cheaper for code that actually works. GPT-5.1 was even more efficient at $0.76 total ($0.39 for test 1, $0.37 for test 2).

I've written up the full comparison here: Codexes vs Sonnet vs Kimi

And honestly, I see a similar performance delta on other tasks as well. For many quick tasks I still use Haiku, and Opus for hardcore reasoning, but the GPT-5 variants have become great workhorses.

OpenAI is clearly after those juicy Anthropic enterprise margins, and Anthropic really needs to rethink its pricing.

Would love to know your experience with GPT-5.1 and how you rate it against Claude 4.5 Sonnet.

🌐
Tom's Guide
tomsguide.com › ai › chatgpt
GPT-5.1 vs Claude 4.5 Sonnet — I tested 7 personality modes on each to see which was more personable | Tom's Guide
November 18, 2025 - Two goliaths of the AI world, ChatGPT and Claude, have each made the argument that they have the more personable chatbot. On both the latest versions, GPT-5.1 and Claude 4.5 Sonnet, you have the ability to customize the chatbot to a personality of your choosing, best fitting how you actually use it.
Discussions

GPT-5.1 for Developers
Claude 4.5 Sonnet definitely struggles with Swift 6.2 Concurrency semantics and has several times gotten itself stuck rather badly. Additionally Claude Code has developed a number of bugs, including rapidly re-scrolling the terminal buffer, pegging local CPU to 100%, and consuming vast amounts ... More on news.ycombinator.com
🌐 news.ycombinator.com
29
112
November 17, 2025
GPT-5.1 impressions: better clarity but limited problem-solving gains
I still find its ability to express itself falls short compared to models like Claude or Google's Gemini. This is just pure bait. Gemini has the worst sycophantic personality I have seen of any model (that includes 4o). It tries to just agree with me on everything even when I am wrong. Claude is a little better but still bad. The only models that didn't glaze me at all were GPT-5 reasoning and Grok 4. GPT-5.1 has the best instruction-following capacity I have seen among the frontier models; it really follows custom instructions. I can't wait to get it in Codex. More on reddit.com
🌐 r/OpenAI
9
9
November 13, 2025
GPT-5 vs Sonnet 4.5 Reviews
I must say GPT-5 is much more reliable. Why? Because it has a very different philosophy on getting things to function properly, while Claude Sonnet's philosophy is just getting the work done, even if it isn't functional because of the shortcuts it takes, like hard-coding a dynamic value or mocking a test. More on reddit.com
🌐 r/AugmentCodeAI
15
6
September 30, 2025
Claude Sonnet 4.5 vs GPT 5.1
It's pretty good, but it depends on the language and on what you're doing (writing, debugging, etc.). I don't see them as competing models; I use them both and they fit differently in my workflow. I find Sonnet is still the best at writing code and planning, while GPT-5 is really good at reviewing code and debugging. I'd say Sonnet is more talented while GPT-5 is smarter, if that makes sense. More on reddit.com
🌐 r/GithubCopilot
8
5
November 18, 2025
People also ask

Which AI is best for developers?
For most developers, GPT-5.1 is the best default because it balances coding accuracy, speed, and cost across everyday tasks. Sonnet 4.5 is better for long, high-risk workflows like terminal agents and complex repo-wide refactors, where reliability matters more than price.
🌐
binaryverseai.com
binaryverseai.com › home › ai models & platforms › gpt-5.1 vs sonnet 4.5: a developer’s decision playbook for the ai coding debate
GPT-5.1 Vs Sonnet 4.5: 5 Proven 2025 Wins For Serious Devs
Which is the best AI tool for coding?
The best AI tool for coding in 2025 is usually GPT-5.1 or GPT-5.1 Codex inside your IDE, with Sonnet 4.5 reserved for the hardest tickets. Many teams get the best results by pairing them in a two-model workflow, using Sonnet 4.5 to plan and GPT-5.1 to implement and iterate.
🌐
binaryverseai.com
binaryverseai.com › home › ai models & platforms › gpt-5.1 vs sonnet 4.5: a developer’s decision playbook for the ai coding debate
GPT-5.1 Vs Sonnet 4.5: 5 Proven 2025 Wins For Serious Devs
What is the most cost-effective AI for coding?
For most coding workloads, GPT-5.1 is the most cost-effective option because it delivers near top-tier accuracy at a lower token price than Sonnet 4.5. If your tasks are cheap to retry, GPT-5.1 usually wins on cost per solved ticket, while Sonnet 4.5 can pay off on rare, high-stakes problems.
🌐
binaryverseai.com
binaryverseai.com › home › ai models & platforms › gpt-5.1 vs sonnet 4.5: a developer’s decision playbook for the ai coding debate
GPT-5.1 Vs Sonnet 4.5: 5 Proven 2025 Wins For Serious Devs
🌐
CometAPI
cometapi.com › gpt-5-1-vs-claude-sonnet-4-5
GPT-5.1 vs Claude Sonnet 4.5 — Which one leads the frontier in 2025? - CometAPI - All AI Models in One API
1 month ago - OpenAI and early partners report that GPT-5.1 outperforms GPT-5 on a variety of code and reasoning suites, and runs 2–3× faster than GPT-5 in some tool-heavy contexts while using fewer tokens for many tasks.
🌐
Data Studios
datastudios.org › post › claude-opus-4-5-vs-chatgpt-5-1-full-report-and-comparison-of-models-features-performance-pricin
Claude Opus 4.5 vs. ChatGPT 5.1: Full Report and Comparison of Models, Features, Performance, Pricing and more
November 25, 2025 - Claude’s family (specifically the Claude Sonnet 4.5 model, which is a sibling to Opus 4.5) showed a huge leap here, going from ~40% on the older version to over 60% success on OSWorld tasks. Competing models (including GPT-5.1) were still under 40% on these tasks.
🌐
Clarifai
clarifai.com › home › gemini 3.0 vs gpt-5.1 vs claude 4.5 vs grok 4.1: ai model comparison
Gemini 3.0 vs GPT-5.1 vs Claude 4.5 vs Grok 4.1: AI Model Comparison
3 weeks ago - GPT‑5.1 balances cost and capability—its Instant mode creates engaging dialogues and its patching tools ensure safe code modifications, making it a practical choice for many developers.
🌐
Bind AI IDE
blog.getbind.co › 2025 › 11 › 19 › gemini-3-0-vs-gpt-5-1-vs-claude-sonnet-4-5-which-one-is-better
Gemini 3.0 vs GPT-5.1 vs Claude Sonnet 4.5: Which one is better? – Bind AI IDE
November 19, 2025 - Ideal for teams wanting quick ... Sonnet 4.5 (Try here) — Built for longer autonomous runs, deep agentic reliability and safety focus, strong at complex planning and stepwise bugfixing....
🌐
Medium
medium.com › @paulhoke › comparing-ai-models-gpt-5-1-gpt-5-gpt-4-1-claude-sonnet-4-5-and-claude-haiku-4-5-4d5a9e6561da
Comparing AI Models: GPT-5.1, GPT-5, GPT-4.1, Claude Sonnet 4.5, and Claude Haiku 4.5 | by Paul Hoke | Nov, 2025 | Medium
November 14, 2025 - GPT-5.1 excels in conversational applications with superior hallucination reduction and instruction following, representing the latest in OpenAI’s evolution. Claude Sonnet 4.5 dominates complex reasoning, coding, and large-scale document ...
Find elsewhere
🌐
TechRadar
techradar.com › ai platforms & assistants
I tested Gemini 3, ChatGPT 5.1, and Claude Sonnet 4.5 – and Gemini crushed it in a real coding task | TechRadar
November 18, 2025 - Claude, in particular, impressed me with its prompt-driven coding skills, what many are now calling "Vibe Coding," where instead of writing code, you just tell the AI what you want – vibing with the AI results – nudging it along with subsequent ...
🌐
Binary Verse AI
binaryverseai.com › home › ai models & platforms › gpt-5.1 vs sonnet 4.5: a developer’s decision playbook for the ai coding debate
GPT-5.1 Vs Sonnet 4.5: 5 Proven 2025 Wins For Serious Devs
GPT-5.1 is cheaper per token, excellent at everyday coding, and solid on full repo and terminal benchmarks. It is a strong default for most dev teams. Claude Sonnet 4.5 is more expensive but leads on SWE-bench and Terminal-Bench style work.
Published   November 16, 2025
🌐
Glbgpt
glbgpt.com › hub › gpt51-vs-claude-sonnet-45
GPT‑5.1 vs Claude Sonnet 4.5: Deep Test in Writing, Coding, and Automation - The Surprising Winner Revealed
November 14, 2025 - Gemini 2.5 Pro judged GPT‑5.1's output as technical documentation and Claude's as popular science. Both had merit, but Claude nailed word count and audience targeting. This test genuinely surprised me.
🌐
Cursor IDE
cursor-ide.com › blog › gpt-51-vs-claude-45
GPT-5/5.1 vs Claude Sonnet 4.5: Complete 2025 Comparison Guide - Cursor IDE Blog
November 13, 2025 - The maturity difference manifests primarily in third-party tooling availability. GPT-5 currently integrates with more IDE plugins and workflow automation tools, while Claude Sonnet 4.5 maintains stronger first-party support through Anthropic's developer platform and native implementations on ...
🌐
Hacker News
news.ycombinator.com › item
GPT-5.1 for Developers | Hacker News
November 17, 2025 - Claude 4.5 Sonnet definitely struggles with Swift 6.2 Concurrency semantics and has several times gotten itself stuck rather badly. Additionally Claude Code has developed a number of bugs, including rapidly re-scrolling the terminal buffer, pegging local CPU to 100%, and consuming vast amounts ...
🌐
Composio
composio.dev › blog › kimi-k2-thinking-vs-claude-4-5-sonnet-vs-gpt-5-codex-tested-the-best-models-for-agentic-coding
GPT-5.1 Codex vs. Claude 4.5 Sonnet vs. Kimi K2 Thinking : Tested the best models for agentic coding - Composio
November 13, 2025 - Claude designs better but doesn't integrate. Kimi has clever ideas but introduces showstoppers. For real-world development where you need working code fast, Codex is the practical choice, and GPT-5.1 is the evolution that makes it even better.
🌐
Tom's Guide
tomsguide.com › ai
ChatGPT-5.1 vs Claude 4.5 Sonnet — I ran 9 tests to find the most creative assistant | Tom's Guide
November 13, 2025 - This test proves that even with the updated model, ChatGPT-5.1 does not have the imagination, emotional depth, or creative understanding that Claude does. So if you're staring down a blank page, that's the partner you want.
🌐
Composio
composio.dev › blog › claude-sonnet-4-5-vs-gpt-5-codex-best-model-for-agentic-coding
Claude Sonnet 4.5 vs. GPT-5 Codex: Best model for agentic coding - Composio
October 7, 2025 - Struggled more with lint fixes and schema edge cases in this project. GPT‑5 Codex + Codex: Strongest at iterative execution, refactoring, and debugging; reliably shipped a working recommendation pipeline with minimal lint errors.
🌐
Reddit
reddit.com › r/openai › gpt-5.1 impressions: better clarity but limited problem-solving gains
r/OpenAI on Reddit: GPT-5.1 impressions: better clarity but limited problem-solving gains
November 13, 2025 -

I've been using GPT-5.1 for a bit and noticed some improvements in how it frames answers. It seems more comfortable explaining things in a way that's easier to understand. Despite that, I still find its ability to express itself falls short compared to models like Claude or Google's Gemini.

When it comes to solving problems, I haven't noticed any real improvement. I tried a few algorithm questions and the issues that GPT-5 couldn't handle remain unresolved in 5.1.

In short, this may be a significant upgrade for some users, but in my area of work it hasn't felt like a major change.

🌐
LLM Stats
llm-stats.com › models › compare › claude-sonnet-4-5-20250929-vs-gpt-5.1-instant-2025-11-12
Claude Sonnet 4.5 vs GPT-5.1 Instant
November 12, 2025 - In-depth Claude Sonnet 4.5 vs GPT-5.1 Instant comparison: Latest benchmarks, pricing, context window, performance metrics, and technical specifications in 2025.