Medium
medium.com › @leucopsis › gpt-5-1-codex-max-vs-claude-opus-4-5-ad995359231b
GPT-5.1-Codex-Max vs Claude Opus 4.5 | by Barnacle Goose | Dec, 2025 | Medium
3 weeks ago - The GPT-5.1 family performs exceptionally well here, and Codex-Max benefits from that foundation. Its reported score is about 89.4 percent on the Pro-level variant, significantly ahead of Claude Opus 4.5 at roughly 82.4 percent.
People also ask

Who should pay for Claude Opus 4.5 versus GPT-5.1?
Professionals who depend on high-quality code and can justify the cost should consider Claude Opus 4.5. Serious hobbyists and budget-conscious users often find GPT-5.1 delivers most value at a fraction of the price. Casual users should start with free tiers and upgrade only if usage justifies it.
humai.blog
humai.blog › gpt-5-1-vs-claude-opus-4-5-the-complete-comparison
GPT-5.1 vs Claude Opus 4.5: The Complete Comparison
Should I wait for GPT-6 or Claude 5 before investing?
No. Waiting for the next model is usually a poor strategy because models continually improve. The opportunity cost of lost productivity today typically outweighs the benefits of marginal future improvements. Only consider waiting if you're planning a major long-term investment tied to a specific API.
humai.blog
humai.blog › gpt-5-1-vs-claude-opus-4-5-the-complete-comparison
GPT-5.1 vs Claude Opus 4.5: The Complete Comparison
What are the practical limits and caveats when using these models?
Both models are powerful but imperfect: they require human review, testing, and oversight. Expect occasional errors, edge-case failures, and the need for repeated prompting. Don’t assume model output is production-ready without verification.
Reddit
reddit.com › r/claudeai › i tested gpt-5.1 codex against sonnet 4.5, and it's about time anthropic bros take pricing seriously.
r/ClaudeAI on Reddit: I tested GPT-5.1 Codex against Sonnet 4.5, and it's about time Anthropic bros take pricing seriously.
November 15, 2025 -

I've used Claude Sonnet models the most among LLMs, for the simple reason that they are so good at prompt-following and an absolute beast at tool execution. That also partly explains why Anthropic earns most of its revenue from APIs (code agents, to be precise). They have an insane first-mover advantage, and developers love them for it.

But GPT-5.1 Codex has been insanely good. One of the first things I do when a promising new model drops is run small tests to decide which models to stick with until the next significant release. It also lets me dogfood our product while building these tests.

I did a quick competition among Claude 4.5 Sonnet, GPT 5, 5.1 Codex, and Kimi k2 thinking.

  • Test 1 involved building a system that learns baseline error rates, uses z-scores and moving averages, catches rate-of-change spikes, and handles 100k+ logs/minute with under 10ms latency.

  • Test 2 involved fixing race conditions when multiple processors detect the same anomaly. Handle ≤3s clock skew and processor crashes. Prevent duplicate alerts when processors fire within 5 seconds of each other.
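For a sense of what Test 1 is actually asking for, here is a minimal sketch of a baseline-plus-z-score detector. This is my own illustration, not any of the tested models' output; the class name, window size, and threshold are made up. The point is that running sums keep each update O(1), the kind of per-event budget you need at 100k+ logs/minute:

```python
from collections import deque
import math

class RollingZScoreDetector:
    """Flag values whose z-score against a rolling baseline exceeds a threshold.

    Keeps a fixed window of recent values (the learned "baseline") and
    maintains running sums so mean/std updates cost O(1) per event.
    """

    def __init__(self, window=60, threshold=3.0):
        self.values = deque()
        self.max_len = window
        self.threshold = threshold
        self._sum = 0.0
        self._sumsq = 0.0

    def update(self, x):
        # Evict the oldest sample once the window is full.
        if len(self.values) == self.max_len:
            old = self.values.popleft()
            self._sum -= old
            self._sumsq -= old * old
        self.values.append(x)
        self._sum += x
        self._sumsq += x * x

    def is_anomaly(self, x):
        n = len(self.values)
        if n < 2:
            return False  # not enough baseline learned yet
        mean = self._sum / n
        var = max(self._sumsq / n - mean * mean, 0.0)
        std = math.sqrt(var)
        if std == 0.0:
            return x != mean  # flat baseline: any deviation is anomalous
        return abs(x - mean) / std > self.threshold
```

A real pipeline would add the rate-of-change checks and per-error-class baselines the test mentions; this only shows that the core math fits in constant time per event.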

The setup paired each model with its own CLI agent inside Cursor:

  • Claude Code with Sonnet 4.5

  • GPT 5 and 5.1 Codex with Codex CLI

  • Kimi K2 Thinking with Kimi CLI

Here's what I found out:

  • Test 1 - Advanced Anomaly Detection: Both GPT-5 and GPT-5.1 Codex shipped working code. Claude and Kimi both had critical bugs that would crash in production. GPT-5.1 improved on GPT-5's architecture and was faster (11m vs 18m).

  • Test 2 - Distributed Alert Deduplication: The Codexes won again with actual integration. Claude had solid architecture but didn't wire it up. Kimi had good ideas but broken duplicate-detection logic.
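For context on what Test 2's duplicate-detection logic has to get right, here is a hedged single-node sketch of just the dedup rule (collapse alerts for the same anomaly fired within 5 seconds, widening the window by the 3-second skew allowance). The class name and API are my own invention, and a real deployment would back the timestamp map with a shared store rather than an in-process dict:

```python
class AlertDeduplicator:
    """Collapse repeated alerts for the same anomaly into one.

    window: alerts for one anomaly within this many seconds count as
    duplicates (the 5 s rule from Test 2).
    max_skew: tolerated clock skew between processors (the <=3 s rule);
    it widens the window so an alert stamped by a slightly fast or slow
    clock is still recognised as a duplicate.
    """

    def __init__(self, window=5.0, max_skew=3.0):
        self.window = window
        self.max_skew = max_skew
        self._last_fired = {}  # anomaly_key -> timestamp of last emitted alert

    def should_alert(self, anomaly_key, event_ts):
        last = self._last_fired.get(anomaly_key)
        # Skew-widened window: duplicates within window + max_skew are dropped.
        if last is not None and abs(event_ts - last) <= self.window + self.max_skew:
            return False  # another processor already fired for this anomaly
        self._last_fired[anomaly_key] = event_ts
        return True
```

In a real distributed setup the `_last_fired` map would live in shared storage with an atomic check-and-set, which is where the crash handling and race conditions the test probes actually come in.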

Codex cost me $0.95 total (GPT-5) vs Claude's $1.68. That's 43% cheaper for code that actually works. GPT-5.1 was even more efficient at $0.76 total ($0.39 for test 1, $0.37 for test 2).

I've written up a complete comparison of all this. Check it out here: Codexes vs Sonnet vs Kimi

And honestly, I see a similar performance delta on other tasks as well. For many quick tasks I still use Haiku, and Opus for hardcore reasoning, but the GPT-5 variants have become great workhorses.

OpenAI is certainly after those juicy Anthropic enterprise margins, and Anthropic really needs to rethink its pricing.

Would love to know your experience with GPT 5.1 and how you rate it against Claude 4.5 Sonnet.

Humai
humai.blog › gpt-5-1-vs-claude-opus-4-5-the-complete-comparison
GPT-5.1 vs Claude Opus 4.5: The Complete Comparison
1 month ago - The honest verdict: for junior developers or simple projects, Codex-Max is cheaper and faster. For complex codebases or senior developers who want a proper pair programmer, Claude Code justifies its cost.
Hacker News
news.ycombinator.com › item
GPT-5.2-Codex | Hacker News
5 days ago - Codex is so so good at finding bugs and little inconsistencies, it's astounding to me. Where Claude Code is good at "raw coding", Codex/GPT5.x are unbeatable in terms of careful, methodical finding of "problems" (be it in code, or in math) · Yes, it takes longer (quality, not speed please!)
Medium
medium.com › genaius › claude-opus-4-5-vs-gpt-5-1-codex-max-best-coding-brain-or-best-coding-factory-68ab74a8cd57
Claude Opus 4.5 vs GPT-5.1-Codex-Max: Best Coding Brain Or Best Coding Factory? | by Namish Saxena | GenAIUs | Nov, 2025 | Medium
November 25, 2025 - OpenAI says GPT-5.1-Codex-Max is its new “frontier agentic coding model”, built for project scale work, trained to operate across multiple context windows using compaction, already the default engine across Codex CLI, IDE extension, cloud ...
Hansreinl
hansreinl.de › blog › ai-coding-benchmark-gpt-5-1-gemini-3-opus-4-5
Gemini 3 Pro vs GPT-5.1 Codex-Max vs Claude Opus 4.5: AI Coding Benchmark | Blog | Hans Reinl
Best for UI/Design: Claude Opus 4.5. If you need a landing page or a complex UI component, Opus shines, but be ready to refactor the logic and strip out bloat. Best for Flexibility: GPT 5.1 Codex Max.
Hacker News
news.ycombinator.com › item
Building more with GPT-5.1-Codex-Max | Hacker News
November 23, 2025 - One huge difference I notice between Codex and Claude code is that, while Claude basically disregards your instructions (CLAUDE.md) entirely, Codex is extremely, painfully, doggedly persistent in following every last character of them - to the point that i've seen it work for 30 minutes to ...
Getpassionfruit
getpassionfruit.com › blog › gpt-5-1-vs-claude-4-5-sonnet-vs-gemini-3-pro-vs-deepseek-v3-2-the-definitive-2025-ai-model-comparison
GPT 5.1 vs Claude 4.5 vs Gemini 3: 2025 AI Comparison
The model's OSWorld score jumped ... tasks. GPT 5.1 introduced the Codex-Max variant specifically for "long-running agentic coding tasks," using 30% fewer thinking tokens than standard GPT-5.1-Codex at equivalent quality...
Composio
composio.dev › blog › claude-4-5-opus-vs-gemini-3-pro-vs-gpt-5-codex-max-the-sota-coding-model
Claude 4.5 Opus vs. Gemini 3 Pro vs. GPT-5.2-codex-max: The SOTA coding model - Composio
Gemini 3 Pro: Best result on Test 1. The fallback and cache were actually working and fast. Test 2 was weird; it kept hitting a loop, which resulted in halting the request. GPT-5.2 Codex: Turned out to be the least reliable for me in these two tasks. Too many API and version mismatches, and it never really landed a clean working implementation. One thing I really hate about Opus in Claude Code ...
Composio
composio.dev › blog › kimi-k2-thinking-vs-claude-4-5-sonnet-vs-gpt-5-codex-tested-the-best-models-for-agentic-coding
GPT-5.1 Codex vs. Claude 4.5 Sonnet vs. Kimi K2 Thinking : Tested the best models for agentic coding - Composio
GPT-5.1's advisory lock approach is cleaner than GPT-5's reservation table and eliminates the race condition. ... Kimi: ~$0.51 (estimated from aggregate) Codex is cheaper despite using more tokens. Claude's extended thinking and higher output ...
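The advisory-lock approach mentioned in this snippet (every processor tries to claim a lock named after the anomaly; only the winner emits the alert, so there is no check-then-insert race) is commonly done in Postgres with `pg_advisory_lock`. As a self-contained stand-in, a sketch of the same pattern using POSIX advisory file locks (Unix-only, single host; the function name and layout are my own):

```python
import fcntl
import os

def try_claim(lock_dir, anomaly_key):
    """Try to claim the alert for anomaly_key.

    Returns the lock's fd if we won, or None if another processor holds it.
    fcntl.flock is an advisory lock: it only coordinates processes that
    also ask for the lock, which is the same contract as Postgres
    advisory locks, minus the shared database.
    """
    path = os.path.join(lock_dir, f"{anomaly_key}.lock")
    fd = os.open(path, os.O_CREAT | os.O_RDWR)
    try:
        # Non-blocking attempt: the winner proceeds, losers bail instantly.
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        return fd  # hold this while emitting the alert; os.close() releases
    except BlockingIOError:
        os.close(fd)  # someone else already claimed this anomaly
        return None
```

A property relevant to the crash-handling requirement in the tests above: the OS releases a flock automatically when the holding process dies, so a crashed processor cannot strand the lock.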
Medium
medium.com › @leucopsis › how-gpt-5-codex-compares-to-claude-sonnet-4-5-1c1c0c2120b0
How GPT-5-Codex Compares to Claude Sonnet 4.5 | by Barnacle Goose | Medium
November 15, 2025 - In contrast, Codex accomplished the same work using only 250k and 100k tokens, respectively. The evidence suggests that GPT-5-Codex’s “dynamic thinking time” architecture and the “less is more” prompting principle result in dramatically ...
Composio
composio.dev › blog › claude-sonnet-4-5-vs-gpt-5-codex-best-model-for-agentic-coding
Claude Sonnet 4.5 vs. GPT-5 Codex: Best model for agentic coding - Composio
If you depend on LLMs on a day-to-day basis, I’d pick Codex for the long run if the DX improves. If you care about perfect UI and architectural guidance, bring in Sonnet 4.5 for design and documentation, then let Codex implement and harden it. ... Claude 4.5 Sonnet, Claude 4.5 Sonnet vs. GPT-5
Sameer Khan
sameerkhan.me › home › blog › gpt-5.1 codex max vs claude opus 4.5 for coding
GPT-5.1 Codex Max vs Claude Opus 4.5 for Coding | Sameer Khan | Sameer Khan
3 weeks ago - ... Cost Advantage: Codex-Max is 6x cheaper than Claude Opus 4.5. ... Verdict: Codex-Max offers better cost-performance for pure coding tasks. ... For pure coding tasks, GPT-5.1-Codex-Max is the better choice.
Vertu
vertu.com › best post › coding model battle: claude opus 4.5 vs. gemini 3 pro vs. gpt-5.1
Claude Opus 4.5 vs. Gemini 3 Pro vs. GPT-5.1: AI Coding Model Battle & SWE-bench Winner | Anthropic
November 25, 2025 - For developers who spend significant time in terminal environments, Claude Opus 4.5's 59.3% score on Terminal-bench 2.0 leads the pack, though GPT-5.1-Codex-Max comes close at 58.1%. This benchmark tests the ability to solve coding problems ...
Builder.io
builder.io › blog › codex-vs-claude-code
Codex vs Claude Code: which is the better AI coding agent?
October 1, 2025 - Codex tends to reason a bit longer, but its visible tokens-per-second output feels faster. Claude Code tends to reason less, but its visible output tokens come a bit slower. Inside Cursor, switching models changes the feel along the same lines: ...
DEV Community
dev.to › blamsa0mine › claude-code-vs-gpt-5-codex-which-one-should-you-use-and-when--4092
Claude Code vs GPT‑5 Codex: which one should you use — and when ? - DEV Community
September 17, 2025 - TL;DR — Use both. Reach for GPT‑5 Codex when you need fast, precise diffs and short‑cycle code‑gen inside your IDE; switch to Claude Code for deep repo understanding, multi‑step refactors, and disciplined terminal workflows.
Bind AI IDE
blog.getbind.co › 2025 › 09 › 16 › gpt-5-codex-vs-claude-code-vs-cursor-which-is-best-for-coding
GPT-5 Codex vs Claude Code vs Cursor – Which is best for coding?
September 17, 2025 - If your priority is deep, autonomous support for a large, complex codebase (tests, refactoring, reviews), then GPT-5-Codex currently seems to be the most advanced; its improvements in code review, visual inputs, and dependency reasoning give ...
Reddit
reddit.com › r/claudeai › 24 hours with claude code (opus 4.1) vs codex (gpt-5)
r/ClaudeAI on Reddit: 24 Hours with Claude Code (Opus 4.1) vs Codex (GPT-5)
August 8, 2025 -

Been testing both for a full day now, and I've got some thoughts. Also want to make sure I'm not going crazy.

Look, maybe I'm biased because I'm used to it, but Claude Code just feels right in my terminal. I actually prefer it over the Claude desktop app most of the time because of the granular control. Want to crank up thinking? Use "ultrathink". Need agents? Just ask.

Now, GPT-5. Man, I had HIGH hopes. OpenAI's marketing this as the "best coding model" and I was expecting that same mind-blown feeling I got when Claude Code (Opus 4) first dropped. But honestly? Not even close. And yes, before anyone asks, I'm using GPT-5 on Medium as a Plus user, so maybe the heavy thinking version is much different (though I doubt it).

What's really got me scratching my head is seeing the Cursor CEO singing its praises. Like, am I using it wrong? Is GPT-5 somehow way better in Cursor than in Codex CLI? Because with Claude, the experience is much better in Claude Code than in Cursor, IMO (which is why I don't use Cursor anymore).

The Torture Test: My go-to new model test is having them build complex 3D renders from scratch. After Opus 4.1 was released, I had Claude Code tackle a biochemical mechanism visualization with multiple organelles, proteins, substrates, the whole nine yards. Claude picked Vite + Three.js + GSAP, and while it didn't one-shot it (they never do), I got damn close to a viable animation in a single day. That's impressive, especially considering the little effort I intentionally put forth.

So naturally, I thought I'd let GPT-5 take a crack at fixing some lingering bugs. Key word: thought.

Not only could it NOT fix them, it actively broke working parts of the code. Features it claimed to implement? Either missing or broken. I specifically prompted Codex to carefully read the files, understand the existing architecture, and exercise caution. The kind of instructions that would have Claude treating my code like fine china. GPT-5? Went full bull in a china shop.

Don't get me wrong, I've seen Claude break things too. But after extensive testing across different scenarios, here's my take:

  • Simple stuff (basic features, bug fixes): GPT-5 holds its own

  • Complex from-scratch projects: Claude by a mile

  • Understanding existing codebases: Claude handles context better (it's always been like this)

I'm continuing to test GPT-5 in various scenarios, but right now I can't confidently build anything complex from scratch with it.

Curious what everyone else's experience has been. Am I missing something here, or is the emperor wearing no clothes?

Top answer · 1 of 5 · 114 points
I don't think it'll be that long before OpenAI just goes full consumer and stops trying to be the AI company for everyone. Already, removing the model choice in ChatGPT has broken the app for me. Coding is still better in Claude, despite those models not reaching the same benchmarks. Each major lab is going to have to start specialising.
2 of 5 · 38 points
Claude Code as a CLI platform is infinitely better than Codex. However, using GPT-5 inside Cursor, I've found it to be at least as capable as Claude 4.1. My criticism is that while it does execute well, the code it writes is very difficult to read, whereas Claude seems to produce readable code out of the box without having to be asked.

I do find that GPT-5 is better at planning out what it's going to do before doing it, but you can get Claude to do this too by having it write a plan.md file and reviewing it with Claude. I've also found GPT-5 to forget fewer specific details. Claude 4.1 will often forget to add #includes in my C++ code, for example, while GPT-5 seems to do this a bit less. They both tend to screw up, though, and I usually end up with about 2-3 rounds of compilation failures before I get code that compiles. 90% of the time, if it compiles, it also works as intended.

Overall, I think it is at least equivalent (as long as you use Claude inside Cursor, NOT Codex). I've not thrown problems at it that Claude cannot already solve, though, so I'm interested to do some more difficult tests with it.