I tested GPT-5.1 Codex against Sonnet 4.5, and it's about time Anthropic bros take pricing seriously.
I've used Claude Sonnets the most among LLMs, for the simple reason that they are so good at prompt-following and an absolute beast at tool execution. That also partly explains why most of Anthropic's revenue comes from its API (coding agents, to be precise). They have an insane first-mover advantage, and developers love them to death.
But GPT-5.1 Codex has been insanely good. One of the first things I do when a promising new model drops is run small tests to decide which models to stick with until the next significant release. It also lets me dogfood our product while building these.
I ran a quick competition among Claude Sonnet 4.5, GPT-5, GPT-5.1 Codex, and Kimi K2 Thinking.
Test 1 involved building a system that learns baseline error rates, uses z-scores and moving averages, catches rate-of-change spikes, and handles 100k+ logs/minute with under 10ms latency.
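To make the Test 1 task concrete, here's a minimal sketch of the kind of detector it asks for. This is my own illustration, not any model's output; the class name, parameters, and thresholds are all made up:

```python
from collections import deque

class BaselineAnomalyDetector:
    """Rolling-baseline detector: z-score against a moving mean/std,
    plus a rate-of-change check for sudden spikes."""

    def __init__(self, window=300, z_threshold=3.0, roc_threshold=2.0, min_samples=30):
        self.window = deque(maxlen=window)   # recent error-rate samples
        self.z_threshold = z_threshold       # std-devs from baseline that count as anomalous
        self.roc_threshold = roc_threshold   # max allowed jump between consecutive samples
        self.min_samples = min_samples       # warm-up before we trust the baseline

    def observe(self, error_rate):
        """Record a sample; return True if it looks anomalous."""
        anomalous = False
        if len(self.window) >= self.min_samples:
            mean = sum(self.window) / len(self.window)
            var = sum((x - mean) ** 2 for x in self.window) / len(self.window)
            std = var ** 0.5 or 1e-9            # guard against a perfectly flat baseline
            z = (error_rate - mean) / std
            roc = error_rate - self.window[-1]  # rate-of-change spike check
            anomalous = abs(z) > self.z_threshold or roc > self.roc_threshold
        self.window.append(error_rate)
        return anomalous
```

Note that recomputing mean/variance over the whole window on every sample wouldn't survive the 100k logs/minute, sub-10ms requirement; a serious version would maintain running sums instead. That's exactly the kind of detail these tests probe.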
Test 2 involved fixing race conditions when multiple processors detect the same anomaly. Handle ≤3s clock skew and processor crashes. Prevent duplicate alerts when processors fire within 5 seconds of each other.
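And a rough sketch of the Test 2 dedup logic, again my own hypothetical illustration rather than any model's answer. The trick being tested is that the 5-second duplicate window has to be widened by the 3-second clock skew:

```python
CLOCK_SKEW_S = 3.0    # max tolerated skew between processor clocks
DEDUP_WINDOW_S = 5.0  # alerts this close together count as duplicates

class AlertDeduplicator:
    """Suppress duplicate alerts for the same anomaly key. To survive
    processor crashes, a real system would keep _last_alert in a shared
    store (e.g. an atomic set-if-absent with a TTL) rather than memory."""

    def __init__(self):
        self._last_alert = {}  # anomaly_key -> timestamp of last alert sent

    def should_alert(self, anomaly_key, event_ts):
        last = self._last_alert.get(anomaly_key)
        # Widen the window by the skew so two processors whose clocks
        # disagree by up to 3s still treat the alerts as duplicates.
        if last is not None and event_ts - last < DEDUP_WINDOW_S + CLOCK_SKEW_S:
            return False
        self._last_alert[anomaly_key] = event_ts
        return True
```

The in-memory dict is the part that fails the crash requirement, which is why "solid architecture but not wired up" or "broken dedup logic" were the common failure modes here.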
The setup ran each model with its own CLI agent inside Cursor:
- Claude Code with Sonnet 4.5
- GPT-5 and GPT-5.1 Codex with Codex CLI
- Kimi K2 Thinking with Kimi CLI
Here's what I found out:
Test 1 - Advanced Anomaly Detection: Both GPT-5 and GPT-5.1 Codex shipped working code. Claude and Kimi both had critical bugs that would crash in production. GPT-5.1 improved on GPT-5's architecture and was faster (11m vs 18m).
Test 2 - Distributed Alert Deduplication: The Codex models won again with actual integration. Claude had solid architecture but didn't wire it up. Kimi had good ideas but broken duplicate-detection logic.
GPT-5 Codex cost me $0.95 total vs Claude's $1.68. That's 43% cheaper for code that actually works. GPT-5.1 was even more efficient at $0.76 total ($0.39 for test 1, $0.37 for test 2).
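For anyone checking my math, the percentages follow directly from the totals above:

```python
claude, gpt5 = 1.68, 0.95
gpt51 = round(0.39 + 0.37, 2)  # GPT-5.1's per-test costs sum to its $0.76 total

# fraction saved going from Claude to GPT-5 Codex
pct_cheaper = round((claude - gpt5) / claude * 100)  # 43
```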
I have written down a complete comparison picture for this. Check it out here: Codexes vs Sonnet vs Kimi
And, honestly, I see a similar performance delta on other tasks as well. I still use Haiku for many quick tasks and Opus for hardcore reasoning, but the GPT-5 variants have become great workhorses.
OpenAI is clearly after those juicy Anthropic enterprise margins, and Anthropic really needs to rethink its pricing.
Would love to know your experience with GPT-5.1 and how you rate it against Claude Sonnet 4.5.
https://www.anthropic.com/news/claude-sonnet-4-5