🌐
Reddit
reddit.com › r/claudeai › i tested gpt-5.1 codex against sonnet 4.5, and it's about time anthropic bros take pricing seriously.
r/ClaudeAI on Reddit: I tested GPT-5.1 Codex against Sonnet 4.5, and it's about time Anthropic bros take pricing seriously.
November 15, 2025 -

I've used Claude Sonnets the most among LLMs, for the simple reason that they are so good at prompt-following and absolute beasts at tool execution. That also partly explains why the bulk of Anthropic's revenue comes from APIs (code agents, to be precise). They have an insane first-mover advantage and a developer following to die for.

But GPT 5.1 Codex has been insanely good. One of the first things I do when a promising new model drops is run small tests to decide which models to stick with until the next significant drop. It also lets us dogfood our product while building these.

I ran a quick competition among Claude 4.5 Sonnet, GPT 5 and 5.1 Codex, and Kimi K2 Thinking.

  • Test 1 involved building a system that learns baseline error rates, uses z-scores and moving averages, catches rate-of-change spikes, and handles 100k+ logs/minute with under 10ms latency.

  • Test 2 involved fixing race conditions when multiple processors detect the same anomaly: handling ≤3s clock skew and processor crashes, and preventing duplicate alerts when processors fire within 5 seconds of each other.
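
For readers unfamiliar with the Test 1 techniques, here is a minimal sketch of a rolling-baseline detector that combines a moving average, a z-score check, and a rate-of-change check. It is not the code any of the models produced; the class name, window size, and thresholds are all assumptions chosen for illustration, and it makes no attempt at the 100k+ logs/minute or sub-10ms targets.

```python
from collections import deque
from math import sqrt


class ErrorRateDetector:
    """Rolling-baseline anomaly detector (hypothetical names and thresholds)."""

    def __init__(self, window=60, z_threshold=3.0, spike_ratio=2.0):
        self.samples = deque(maxlen=window)  # most recent error rates (the baseline)
        self.z_threshold = z_threshold       # assumed cutoff for the z-score check
        self.spike_ratio = spike_ratio       # assumed cutoff for sample-to-sample growth

    def observe(self, error_rate):
        """Return the list of checks this sample trips (empty if it looks normal)."""
        reasons = []
        if len(self.samples) >= 2:
            mean = sum(self.samples) / len(self.samples)  # moving average
            var = sum((x - mean) ** 2 for x in self.samples) / len(self.samples)
            std = sqrt(var)
            if std > 0 and abs(error_rate - mean) / std > self.z_threshold:
                reasons.append("z-score")
            prev = self.samples[-1]
            if prev > 0 and error_rate / prev > self.spike_ratio:
                reasons.append("rate-of-change")
        # Simplification: anomalous samples also enter the baseline.
        self.samples.append(error_rate)
        return reasons


detector = ErrorRateDetector()
for rate in [0.010, 0.012, 0.011, 0.009, 0.010]:
    detector.observe(rate)          # warm-up traffic, no flags expected
print(detector.observe(0.080))      # a sudden jump trips both checks
```

A production version would likely keep running sums instead of recomputing the mean and variance on every sample, and would exclude flagged samples from the baseline.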

The setup ran each model with its own CLI agent inside Cursor:

  • Claude Code with Sonnet 4.5

  • GPT 5 and 5.1 Codex with Codex CLI

  • Kimi K2 Thinking with Kimi CLI

Here's what I found out:

  • Test 1 - Advanced Anomaly Detection: Both GPT-5 and GPT-5.1 Codex shipped working code. Claude and Kimi both had critical bugs that would crash in production. GPT-5.1 improved on GPT-5's architecture and was faster (11m vs 18m).

  • Test 2 - Distributed Alert Deduplication: The Codex models won again with actual integration. Claude had a solid architecture but didn't wire it up. Kimi had good ideas but broken duplicate-detection logic.
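
To make the Test 2 requirement concrete, here is a rough sketch of the deduplication rule: suppress an alert if another alert for the same anomaly was already emitted within the 5-second window, widened by the 3-second clock-skew allowance. This is only an in-process illustration, not what any model generated. The class, method, and key names are hypothetical, and a real multi-processor setup would need a shared store with an atomic check-and-set rather than a local dictionary and lock.

```python
import threading

DEDUP_WINDOW_S = 5.0    # from the post: processors firing within 5 seconds
MAX_CLOCK_SKEW_S = 3.0  # from the post: up to 3 seconds of clock skew


class AlertDeduplicator:
    """Suppresses duplicate alerts for the same anomaly key (hypothetical API)."""

    def __init__(self, window=DEDUP_WINDOW_S, skew=MAX_CLOCK_SKEW_S):
        self.tolerance = window + skew  # widen the window to absorb clock skew
        self.last_alert = {}            # anomaly key -> timestamp of last emitted alert
        self.lock = threading.Lock()    # makes check-and-set atomic within one process

    def should_alert(self, anomaly_key, timestamp):
        """Return True if this alert should fire, False if it is a duplicate."""
        with self.lock:
            last = self.last_alert.get(anomaly_key)
            # abs() covers a processor whose clock runs behind the first reporter.
            if last is not None and abs(timestamp - last) <= self.tolerance:
                return False
            self.last_alert[anomaly_key] = timestamp
            return True


dedup = AlertDeduplicator()
print(dedup.should_alert("svc-a:5xx", 100.0))  # first report fires
print(dedup.should_alert("svc-a:5xx", 104.0))  # second processor, suppressed
print(dedup.should_alert("svc-a:5xx", 97.5))   # skewed clock, still suppressed
```

Handling processor crashes, which the test also required, is out of scope here; it would need something like expiring entries or leases in the shared store.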

Codex cost me $0.95 total (GPT-5) vs Claude's $1.68. That's 43% cheaper for code that actually works. GPT-5.1 was even more efficient at $0.76 total ($0.39 for test 1, $0.37 for test 2).
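
The percentages can be checked directly from the dollar figures above. The snippet below is just that arithmetic; the function name is made up, and the GPT-5.1 percentage is derived from the quoted totals rather than stated in the post.

```python
def savings_vs(baseline, cheaper):
    """Fraction saved by paying `cheaper` instead of `baseline`."""
    return (baseline - cheaper) / baseline


claude_total = 1.68
gpt5_total = 0.95
gpt51_total = 0.39 + 0.37  # test 1 + test 2, as quoted in the post

print(f"GPT-5 vs Claude:   {savings_vs(claude_total, gpt5_total):.0%}")
print(f"GPT-5.1 vs Claude: {savings_vs(claude_total, gpt51_total):.0%}")
```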

I have written down a complete comparison picture for this. Check it out here: Codexes vs Sonnet vs Kimi

And honestly, I see a similar performance delta in other tasks as well. For many quick tasks I still use Haiku, and Opus for hardcore reasoning, but the GPT-5 variants have become great workhorses.

OpenAI is certainly after those juicy Anthropic enterprise margins, and Anthropic really needs to rethink its pricing.

Would love to know your experience with GPT 5.1 and how you rate it against Claude 4.5 Sonnet.

🌐
Tom's Guide
tomsguide.com › ai › chatgpt
GPT-5.1 vs Claude 4.5 Sonnet — I tested 7 personality modes on each to see which was more personable | Tom's Guide
November 18, 2025 - Two goliaths of the AI world, ChatGPT and Claude, have each made the argument that they have the more personable chatbot. On both the latest versions, GPT-5.1 and Claude 4.5 Sonnet, you have the ability to customize the chatbot to a personality of your choosing, best fitting how you actually use it.
🌐
CometAPI
cometapi.com › gpt-5-1-vs-claude-sonnet-4-5
GPT-5.1 vs Claude Sonnet 4.5 — Which one leads the frontier in 2025? - CometAPI - All AI Models in One API
December 2, 2025 - OpenAI and early partners report that GPT-5.1 outperforms GPT-5 on a variety of code and reasoning suites, and runs 2–3× faster than GPT-5 in some tool-heavy contexts while using fewer tokens for many tasks.
🌐
Composio
composio.dev › blog › kimi-k2-thinking-vs-claude-4-5-sonnet-vs-gpt-5-codex-tested-the-best-models-for-agentic-coding
GPT-5.1 Codex vs. Claude 4.5 Sonnet vs. Kimi K2 Thinking : Tested the best models for agentic coding - Composio
November 13, 2025 - Claude designs better but doesn't integrate. Kimi has clever ideas but introduces showstoppers. For real-world development where you need working code fast, Codex is the practical choice, and GPT-5.1 is the evolution that makes it even better.
🌐
Tom's Guide
tomsguide.com › ai
ChatGPT-5.1 vs Claude 4.5 Sonnet — I ran 9 tests to find the most creative assistant | Tom's Guide
November 13, 2025 - ChatGPT-5.1 presented a clever, ... memories into portals. Claude 4.5 Sonnet crafted an emotionally resonating scene by establishing immediate mystery with a specific, impossible message from the dead....
🌐
Clarifai
clarifai.com › home › gemini 3.0 vs gpt-5.1 vs claude 4.5 vs grok 4.1: ai model comparison
Gemini 3.0 vs GPT-5.1 vs Claude 4.5 vs Grok 4.1: AI Model Comparison
3 weeks ago - GPT‑5.1 balances cost and capability—its Instant mode creates engaging dialogues and its patching tools ensure safe code modifications, making it a practical choice for many developers.
🌐
Reddit
reddit.com › r/openai › gpt-5.1 impressions: better clarity but limited problem-solving gains
r/OpenAI on Reddit: GPT-5.1 impressions: better clarity but limited problem-solving gains
November 13, 2025 -

I've been using GPT-5.1 for a bit and noticed some improvements in how it frames answers. It seems more comfortable explaining things in a way that's easier to understand. Despite that, I still find its ability to express itself falls short compared to models like Claude or Google's Gemini.

When it comes to solving problems, I haven't noticed any real improvement. I tried a few algorithm questions and the issues that GPT-5 couldn't handle remain unresolved in 5.1.

In short, this may be a significant upgrade for some users, but in my area of work it hasn't felt like a major change.

🌐
Medium
medium.com › @paulhoke › comparing-ai-models-gpt-5-1-gpt-5-gpt-4-1-claude-sonnet-4-5-and-claude-haiku-4-5-4d5a9e6561da
Comparing AI Models: GPT-5.1, GPT-5, GPT-4.1, Claude Sonnet 4.5, and Claude Haiku 4.5 | by Paul Hoke | Nov, 2025 | Medium
November 14, 2025 - GPT-5.1 excels in conversational applications with superior hallucination reduction and instruction following, representing the latest in OpenAI’s evolution. Claude Sonnet 4.5 dominates complex reasoning, coding, and large-scale document ...
🌐
Glbgpt
glbgpt.com › hub › gpt51-vs-claude-sonnet-45
GPT‑5.1 vs Claude Sonnet 4.5: Deep Test in Writing, Coding, and Automation - The Surprising Winner Revealed
November 14, 2025 - Gemini 2.5 Pro judged GPT‑5.1’s as technical documentation and Claude’s as popular science. Both had merit, but Claude nailed word count and audience targeting. This test genuinely surprised me.
🌐
TechRadar
techradar.com › ai platforms & assistants
I tested Gemini 3, ChatGPT 5.1, and Claude Sonnet 4.5 – and Gemini crushed it in a real coding task | TechRadar
November 18, 2025 - Claude, in particular, impressed me with its prompt-driven coding skills, what many are now calling "Vibe Coding," where instead of writing code, you just tell the AI what you want – vibing with the AI results – nudging it along with subsequent ...
🌐
Binary Verse AI
binaryverseai.com › home › ai models & platforms › gpt-5.1 vs sonnet 4.5: a developer’s decision playbook for the ai coding debate
GPT-5.1 Vs Sonnet 4.5: 5 Proven 2025 Wins For Serious Devs
November 16, 2025 - GPT-5.1 is cheaper per token, excellent at everyday coding, and solid on full repo and terminal benchmarks. It is a strong default for most dev teams. Claude Sonnet 4.5 is more expensive but leads on SWE-bench and Terminal-Bench style work.
🌐
Data Studios
datastudios.org › post › claude-opus-4-5-vs-chatgpt-5-1-full-report-and-comparison-of-models-features-performance-pricin
Claude Opus 4.5 vs. ChatGPT 5.1: Full Report and Comparison of Models, Features, Performance, Pricing and more
November 25, 2025 - Claude’s family (specifically the Claude Sonnet 4.5 model, which is a sibling to Opus 4.5) showed a huge leap here, going from ~40% on the older version to over 60% success on OSWorld tasks. Competing models (including GPT-5.1) were still under 40% on these tasks.
🌐
Cursor IDE
cursor-ide.com › blog › gpt-51-vs-claude-45
GPT-5/5.1 vs Claude Sonnet 4.5: Complete 2025 Comparison Guide - Cursor IDE Blog
November 13, 2025 - The maturity difference manifests primarily in third-party tooling availability. GPT-5 currently integrates with more IDE plugins and workflow automation tools, while Claude Sonnet 4.5 maintains stronger first-party support through Anthropic's developer platform and native implementations on ...
🌐
Hacker News
news.ycombinator.com › item
GPT-5.1 for Developers | Hacker News
November 17, 2025 - Claude 4.5 Sonnet definitely struggles with Swift 6.2 Concurrency semantics and has several times gotten itself stuck rather badly. Additionally Claude Code has developed a number of bugs, including rapidly re-scrolling the terminal buffer, pegging local CPU to 100%, and consuming vast amounts ...
🌐
Bind AI IDE
blog.getbind.co › 2025 › 11 › 19 › gemini-3-0-vs-gpt-5-1-vs-claude-sonnet-4-5-which-one-is-better
Gemini 3.0 vs GPT-5.1 vs Claude Sonnet 4.5: Which one is better? – Bind AI IDE
November 19, 2025 - Ideal for teams wanting quick ... Sonnet 4.5 (Try here) — Built for longer autonomous runs, deep agentic reliability and safety focus, strong at complex planning and stepwise bugfixing....
🌐
Composio
composio.dev › blog › claude-sonnet-4-5-vs-gpt-5-codex-best-model-for-agentic-coding
Claude Sonnet 4.5 vs. GPT-5 Codex: Best model for agentic coding - Composio
October 7, 2025 - Struggled more with lint fixes and schema edge cases in this project. GPT‑5 Codex + Codex: Strongest at iterative execution, refactoring, and debugging; reliably shipped a working recommendation pipeline with minimal lint errors.
🌐
Getpassionfruit
getpassionfruit.com › blog › gpt-5-1-vs-claude-4-5-sonnet-vs-gemini-3-pro-vs-deepseek-v3-2-the-definitive-2025-ai-model-comparison
GPT 5.1 vs Claude 4.5 vs Gemini 3: 2025 AI Comparison
1 month ago - Gemini 3 Pro leads overall reasoning benchmarks with an unprecedented 1501 LMArena Elo, becoming the first model to break the 1500 barrier, while Claude 4.5 Sonnet dominates real-world coding at 77.2% SWE-bench and DeepSeek-V3.2 delivers ...
🌐
Medium
medium.com › @kram254 › gpt-5-1-variants-vs-claude-sonnet-4-5-ce7a2268a9fc
GPT-5.1 Variants vs. Claude Sonnet 4.5 | by Emmanuel Mark Ndaliro | Nov, 2025 | Medium
November 15, 2025 - Claude Sonnet 4.5 is all about structured reasoning and language clarity. Its architecture elevates it for tasks that need precise language understanding. Key Stats: — Reasoning Skills: Excels in nuanced language tasks.
🌐
LLM Stats
llm-stats.com › models › compare › claude-sonnet-4-5-20250929-vs-gpt-5.1-instant-2025-11-12
Claude Sonnet 4.5 vs GPT-5.1 Instant
November 12, 2025 - In-depth Claude Sonnet 4.5 vs GPT-5.1 Instant comparison: Latest benchmarks, pricing, context window, performance metrics, and technical specifications in 2025.