gpt 5.1 codex vs claude code

reddit.com › r/claudeai › i tested gpt-5.1 codex against sonnet 4.5, and it's about time anthropic bros take pricing seriously.

r/ClaudeAI on Reddit: I tested GPT-5.1 Codex against Sonnet 4.5, and it's about time Anthropic bros take pricing seriously.

November 15, 2025 -

I've used Claude Sonnets the most among LLMs, for the simple reason that they are so good at prompt-following and an absolute beast at tool execution. That also partly explains the maximum Anthropic revenue from APIs (code agents to be precise). They have an insane first-mover advantage, and developers love to die for.

But GPT 5.1 codex has been insanely good. One of the first things I do when a new promising model drops is to run small tests to decide which models to stick with until the next significant drop. Also, allows dogfooding our product while building these.

I did a quick competition among Claude 4.5 Sonnet, GPT 5, 5.1 Codex, and Kimi k2 thinking.

Test 1 involved building a system that learns baseline error rates, uses z-scores and moving averages, catches rate-of-change spikes, and handles 100k+ logs/minute with under 10ms latency.
Test 2 involved fixing race conditions when multiple processors detect the same anomaly. Handle ≤3s clock skew and processor crashes. Prevent duplicate alerts when processors fire within 5 seconds of each other.

The setup used models with their own CLI agent inside Cursor,

Claude Code with Sonnet 4.5
GPT 5 and 5.1 Codex with Codex CLI
Kimi K2 Thinking with Kimi CLI

Here's what I found out:

Test 1 - Advanced Anomaly Detection: Both GPT-5 and GPT-5.1 Codex shipped working code. Claude and Kimi both had critical bugs that would crash in production. GPT-5.1 improved on GPT-5's architecture and was faster (11m vs 18m).
Test 2 - Distributed Alert Deduplication: Codexes won again with actual integration. Claude had solid architecture, but didn't wire it up. Kimi had good ideas, but a broken duplicate-detection logic.

Codex cost me $0.95 total (GPT-5) vs Claude's $1.68. That's 43% cheaper for code that actually works. GPT-5.1 was even more efficient at $0.76 total ($0.39 for test 1, $0.37 for test 2).

I have written down a complete comparison picture for this. Check it out here: Codexes vs Sonnet vs Kimi

And, honestly, I can see the simillar performance delta in other tasks as well. Though for many quick tasks I still use Haiku, and Opus for hardcore reasoning, but GPT-5 variants have become great workhorses.

OpenAI is certainly after that juicy Anthropic enterprise margins, and Anthropic really needs to rethink its pricing.

Would love to know your experience with GPT 5.1 and how you rate it against Claude 4.5 Sonnet.

Top answer

1 of 5

271

I use codex to audit everything that CC produces.. it’s been quite effective

2 of 5

I've been using Codex when I exhaust my Claude weekly limit, and vice-versa. So far so good for $40/mo. I had Gemini Pro too before, but it destroys my code, and with confidence lol, so I fired him form our team.

Composio

composio.dev › blog › claude-sonnet-4-5-vs-gpt-5-codex-best-model-for-agentic-coding

Claude Sonnet 4.5 vs. GPT-5 Codex: Best model for agentic coding - Composio

Struggled more with lint fixes and schema edge cases in this project. GPT‑5 Codex + Codex: Strongest at iterative execution, refactoring, and debugging; reliably shipped a working recommendation pipeline with minimal lint errors.

Videos