I've used the Claude Sonnet models the most among LLMs, for the simple reason that they are excellent at prompt-following and absolute beasts at tool execution. That also partly explains why most of Anthropic's revenue comes from its API (coding agents, to be precise). They have an insane first-mover advantage and developer loyalty to die for.
But GPT-5.1 Codex has been insanely good. One of the first things I do when a promising new model drops is run small tests to decide which models to stick with until the next significant release. It also lets me dogfood our product while building these tests.
I ran a quick competition among Claude Sonnet 4.5, GPT-5, GPT-5.1 Codex, and Kimi K2 Thinking.
Test 1 involved building a system that learns baseline error rates, uses z-scores and moving averages, catches rate-of-change spikes, and handles 100k+ logs/minute with under 10ms latency.
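To make the task concrete, here's a minimal sketch of the kind of detector Test 1 calls for (class and parameter names are mine, not taken from any model's output): an exponentially weighted baseline with z-score outlier checks and a rate-of-change spike check, all constant work per sample, so the under-10ms latency budget is easy to hold.

```python
import math

class ErrorRateDetector:
    """Streaming anomaly detector: learns a baseline error rate with an
    exponentially weighted moving average/variance, flags z-score outliers,
    and catches sudden rate-of-change spikes. O(1) work per sample."""

    def __init__(self, alpha=0.05, z_threshold=3.0, spike_ratio=2.0):
        self.alpha = alpha              # EWMA smoothing factor
        self.z_threshold = z_threshold  # z-score cutoff for outliers
        self.spike_ratio = spike_ratio  # multiplicative jump that counts as a spike
        self.mean = None                # learned baseline error rate
        self.var = 0.0                  # exponentially weighted variance
        self.prev = None                # previous sample, for rate-of-change

    def observe(self, rate: float) -> list[str]:
        """Feed one per-interval error rate; returns anomaly labels, if any."""
        if self.mean is None:           # first sample seeds the baseline
            self.mean, self.prev = rate, rate
            return []

        anomalies = []
        std = math.sqrt(self.var)
        if std > 0 and abs(rate - self.mean) / std > self.z_threshold:
            anomalies.append("z_score_outlier")
        if self.prev > 0 and rate / self.prev > self.spike_ratio:
            anomalies.append("rate_of_change_spike")

        # exponentially weighted updates of mean and variance
        diff = rate - self.mean
        self.mean += self.alpha * diff
        self.var = (1 - self.alpha) * (self.var + self.alpha * diff * diff)
        self.prev = rate
        return anomalies
```

Aggregating raw logs into per-interval error rates before feeding the detector is what keeps 100k+ logs/minute cheap: the hot path is just a counter increment, and the detector runs once per interval.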
Test 2 involved fixing race conditions when multiple processors detect the same anomaly. Handle ≤3s clock skew and processor crashes. Prevent duplicate alerts when processors fire within 5 seconds of each other.
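Again, a rough sketch of one way to satisfy Test 2's constraints, assuming a shared Redis (key names and the skew handling are my choices, not any model's output): an atomic set-if-absent claim whose TTL covers the 5-second dedup window plus the 3-second skew allowance, and which expires on its own if the claiming processor crashes.

```python
import redis  # assumes a shared Redis; any store with atomic set-if-absent works

DEDUP_WINDOW_S = 5    # suppress duplicate alerts fired within 5 s of each other
MAX_CLOCK_SKEW_S = 3  # tolerate up to 3 s of skew between processors

r = redis.Redis()

def try_claim_alert(anomaly_key: str, processor_id: str) -> bool:
    """Atomically claim the right to alert on this anomaly.

    SET NX is atomic, so exactly one processor wins even when several
    detect the same anomaly concurrently. Expiry runs on the store's
    clock, which sidesteps processor clock skew; the extra skew margin
    covers events timestamped by skewed producers. The TTL also releases
    the claim automatically if the winner crashes before alerting."""
    ttl = DEDUP_WINDOW_S + MAX_CLOCK_SKEW_S
    won = r.set(f"alert:{anomaly_key}", processor_id, nx=True, ex=ttl)
    return bool(won)

# usage: only the processor that wins the claim sends the alert
if try_claim_alert("checkout:error_rate_spike", "processor-7"):
    print("sending alert")  # replace with the real alerting call
```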
The setup ran each model with its own CLI agent inside Cursor:
- Claude Code with Sonnet 4.5
- GPT-5 and GPT-5.1 Codex with Codex CLI
- Kimi K2 Thinking with Kimi CLI
Here's what I found out:
Test 1 - Advanced Anomaly Detection: Both GPT-5 and GPT-5.1 Codex shipped working code. Claude and Kimi both had critical bugs that would crash in production. GPT-5.1 improved on GPT-5's architecture and was faster (11m vs 18m).
Test 2 - Distributed Alert Deduplication: The Codex models won again with actual end-to-end integration. Claude had solid architecture but didn't wire it up. Kimi had good ideas but broken duplicate-detection logic.
Codex with GPT-5 cost me $0.95 total vs Claude's $1.68. That's 43% cheaper for code that actually works. GPT-5.1 was even more efficient at $0.76 total ($0.39 for Test 1, $0.37 for Test 2).
I've written up the complete comparison. Check it out here: Codexes vs Sonnet vs Kimi
And honestly, I see a similar performance delta on other tasks as well. For many quick tasks I still use Haiku, and Opus for hardcore reasoning, but the GPT-5 variants have become great workhorses.
OpenAI is certainly after those juicy Anthropic enterprise margins, and Anthropic really needs to rethink its pricing.
Would love to know your experience with GPT-5.1 and how you rate it against Claude Sonnet 4.5.
GPT-5.1 impressions: better clarity but limited problem-solving gains
I've been using GPT-5.1 for a bit and noticed some improvements in how it frames answers. It seems more comfortable explaining things in a way that's easier to understand. Despite that, I still find its ability to express itself falls short compared to models like Claude or Google's Gemini.
When it comes to solving problems, I haven't noticed any real improvement. I tried a few algorithm questions and the issues that GPT-5 couldn't handle remain unresolved in 5.1.
In short, this may be a significant upgrade for some users, but in my area of work it hasn't felt like a major change.
To use this sub a bit constructively as well... have you tested the new Sonnet 4.5 yet? How is it performing versus GPT-5 so far?
I've been using GPT-5 for the last 3 weeks and it is slow but much more precise than Sonnet. Has anyone switched back to Sonnet 4.5? Let me know your review and how it performed for you.