🌐
Reddit
reddit.com › r/geminiai › comparing claude opus 4.5 vs gpt-5.1 vs gemini 3 - coding task
r/GeminiAI on Reddit: Comparing Claude Opus 4.5 vs GPT-5.1 vs Gemini 3 - Coding Task
1 month ago -

I ran all three models on a coding task just to see how they behave when things aren’t clean or nicely phrased.

The goal was just to see who performs like a real dev.

Here's my takeaway:

Opus 4.5 handled real repo issues the best. It fixed things without breaking unrelated parts and didn’t hallucinate new abstractions. Felt the most “engineering-minded.”

GPT-5.1 was close behind. It explained its reasoning step by step and sometimes added improvements I never asked for. Helpful when you want safety, annoying when you want precision.

Gemini solved most tasks but tended to optimize away or simplify decisions I had explicitly constrained. Good output, but sometimes too “creative.”

On refactoring and architecture-level tasks:

  • Opus delivered the most complete refactor, with consistent naming, updated dependencies, and documentation.
  • GPT-5.1 took longer because it analyzed first, but the output was maintainable and defensive.
  • Gemini produced clean code but missed deeper security and design patterns.

Context windows (because it matters at repo scale):

  • Opus 4.5: ~200K tokens usable, handles large repos better without losing track

  • GPT-5.1: ~128K tokens but strong long-reasoning even near the limit

  • Gemini 3 Pro: ~1M tokens which is huge, but performance becomes inconsistent as input gets massive

What's your experience been with these three? I used these frontier models side by side in my multi-agent AI setup with the Anannas LLM provider, and the results were interesting.

Have you run your own comparisons, and if so, what setup are you using?

🌐
Vertu
vertu.com › best post › gpt-5.2 codex vs gemini 3 pro vs claude opus 4.5: coding comparison guide
GPT-5.2 Codex vs Gemini 3 Pro vs Claude Opus 4.5
3 days ago - Gemini 3 Pro emerged as the surprise leader for frontend development, combining superior visual quality with the lowest costs. GPT-5.2 Codex proved itself as the most reliable all-rounder, delivering consistent results across diverse coding challenges. Claude Opus 4.5's poor performance in ...
Discussions

[DISCUSSION] Is Gemini 3.0 really better than Claude Sonnet 4.5/Composer for coding?
UI-wise it's better than anything else out there by miles, based on my testing. There's no competition when it comes to frontend. The benchmarks show Claude is a bit better on SWE-bench, so there are some cases where Claude is the better candidate for your code.
🌐 r/cursor
November 18, 2025
Opus 4.5 benchmark results
Gemini 3 looks even more impressive considering the price. Hope Anthropic gets pressured and lowers the cost.
🌐 r/singularity
November 24, 2025
Comparing GPT-5.1 vs Gemini 3.0 vs Opus 4.5 across 3 coding tasks. Here's an overview
Opus is a one-shot master in my case (Node.js + React + TypeScript). I had more issues with Sonnet, but Opus understands me better and mostly suggests better solutions. I'm using it for focused and smaller tasks for now, and they're not too large as implementations, but still, Opus solved a few bugs that Gemini or GPT couldn't. Btw, I'm not saying the other models are bad, they are great indeed! Opus just works better in my case, for now at least.
🌐 r/ClaudeAI
November 26, 2025
Gemini 3 Pro Vision benchmarks: Finally compares against Claude Opus 4.5 and GPT-5.1
Gemini is definitely the best all-rounder model. I think in the long run that's what makes it really "intelligent", even if it lags behind in coding.
🌐 r/singularity
3 weeks ago
🌐
Medium
medium.com › ai-software-engineer › i-tested-claude-opus-4-5-vs-gemini-3-pro-close-but-a-clear-winner-surprised-me-1cb6e2cd601d
I Tested Claude Opus 4.5 vs Gemini 3 Pro (Close But a Clear Winner) | by Joe Njenga | AI Software Engineer | Dec, 2025 | Medium
3 weeks ago - I Tested Claude Opus 4.5 vs Gemini 3 Pro (Close But a Clear Winner). Just when we thought Gemini 3 Pro had become the coding king, Claude Opus 4.5 dropped and dethroned it. I was quick to test both …
🌐
Substack
natesnewsletter.substack.com › p › claude-opus-45-loves-messy-real-world
I Tested Opus 4.5 Early—Here's Where It Can Save You HOURS on Complex Workflows + a Comparison vs. Gemini 3 and ChatGPT 5.1 + a Model-Picker Prompt + 15 Workflows to Get Started Now
November 25, 2025 - I tested Opus 4.5 vs. Gemini 3 vs. ChatGPT 5.1 on real-world business tasks: here's what I found, plus a complete breakdown of which model I'd use for complex workflows plus a custom model-picker!
🌐
Substack
lennysnewsletter.com › p › which-ai-model-is-the-best-designer
Gemini 3 vs. Claude Opus 4.5 vs. GPT-5.1 Codex: Which AI model is the best designer?
4 weeks ago - 🎙️ Testing Gemini 3, Opus 4.5, and GPT-5.1 Codex on the same redesign task to see which AI model is the best designer. The winner is clear. ... I put three cutting-edge AI models to the test in a head-to-head design competition.
🌐
R&D World
rdworldonline.com › home › rd world posts › how gpt-5.2 stacks up against gemini 3.0 and claude opus 4.5
How GPT-5.2 stacks up against Gemini 3.0 and Claude Opus 4.5
2 weeks ago - The most striking claim is GPT-5.2’s performance on ARC-AGI-2, a benchmark designed to test genuine reasoning ability while resisting memorization. At 52.9% (Thinking) and 54.2% (Pro), OpenAI’s new model significantly outranks both Claude ...
🌐
Composio
composio.dev › blog › claude-4-5-opus-vs-gemini-3-pro-vs-gpt-5-codex-max-the-sota-coding-model
Claude 4.5 Opus vs. Gemini 3 Pro vs. GPT-5-codex-max: The SOTA coding model - Composio
1 month ago - WebDev Arena: Gemini 3 Pro reaches ... Opus 4.5(Claude): Outstanding at strategy and design, but its solutions tend to be elaborate, slower to integrate, and prone to practical hiccups once they hit the metal....
🌐
AceCloud
acecloud.ai › blog › claude-opus-4-5-vs-gemini-3-pro-vs-sonnet-4-5
Claude Opus 4.5 Vs Gemini 3 Pro Vs Sonnet 4.5 Comparison Guide
November 25, 2025 - Pick Gemini 3 Pro if you need very strong multimodal performance, a 1M-token context window by default, and tight integration with Google tools and Search. Pick Claude Opus 4.5 if you care most about frontier coding performance, deep reasoning ...
🌐
Simon Willison
simonwillison.net › 2025 › Nov › 24 › claude-opus
Claude Opus 4.5, and why evaluating new LLMs is increasingly difficult
November 24, 2025 - Here’s Opus 4.5 (on its default “high” effort level): It did significantly better on the new more detailed prompt: Here’s that same complex prompt against Gemini 3 Pro and against GPT-5.1-Codex-Max-xhigh.
🌐
Anthropic
anthropic.com › news › claude-opus-4-5
Introducing Claude Opus 4.5
November 24, 2025 - This change improved Gemini 3 to 56.7% and GPT-5.1 to 48.6% from the values reported by their developers, using the Terminus-2 harness. 3: Note that these evaluations were run on an in-progress upgrade to Petri, our open-source, automated evaluation tool. They were run on an earlier snapshot ...
🌐
CometAPI
cometapi.com › gemini-3-pro-vs-claude-4-5-sonnet-for-coding
Gemini 3 Pro vs Claude 4.5 Sonnet for Coding: Which is Better in 2025 - CometAPI - All AI Models in One API
3 weeks ago - Gemini 3 Pro (Google/DeepMind) and Claude Opus 4.5 (Anthropic) are both 2025 frontier models focused on deep reasoning, agentic workflows, and stronger
🌐
Macaron
macaron.im › blog › claude-opus-4-5-vs-chatgpt-5-1-vs-gemini-3-pro
Full Technical Comparison: Claude Opus 4.5 vs. ChatGPT 5.1 vs. Google Gemini 3 Pro - Macaron
November 24, 2025 - Overall, on standard benchmarks like MMLU and PiQA all three are tightly clustered at ~90% accuracy[5], but for “frontier” reasoning tests (complex math, logic puzzles), Gemini 3 Pro has an edge with its “PhD-level” performance[10]. Code ...
🌐
Yahoo! Finance
finance.yahoo.com › news › anthropic-launches-claude-opus-45-as-googles-gemini-3-gains-big-backers-191645109.html
Anthropic launches Claude Opus 4.5 as Google's Gemini 3 gains big backers
November 24, 2025 - According to the company, Opus 4.5 beats out both Gemini 3 Pro and OpenAI's (OPAI.PVT) GPT-5.1-Codex-Max and GPT-5.1 in software engineering. Anthropic also said the model is capable of coming up with creative ways of solving problems.
🌐
Data Studios
datastudios.org › post › google-gemini-3-vs-claude-sonnet-4-5-full-report-and-comparison-of-features-capabilities-pricing
Google Gemini 3 vs. Claude Sonnet 4.5: Full Report and Comparison of Features, Capabilities, Pricing, and more
November 22, 2025 - It effectively can be left “on task” and keep making progress, only stopping when it’s truly done or if it hits a roadblock it can’t resolve on its own. This is a major improvement from earlier models (Claude’s previous version, Opus 4, could only manage ~7 hours autonomously before wandering off track or exhausting context).
🌐
The New Stack
thenewstack.io › home › anthropic’s new claude opus 4.5 reclaims the coding crown
Anthropic's New Claude Opus 4.5 Reclaims the Coding Crown - The New Stack
1 week ago - Anthropic today launched the latest version of its flagship Opus model: Opus 4.5. The company calls it its most intelligent model yet and notes that it is especially strong in solving coding tasks, taking the crown from OpenAI’s GPT-5.1-Codex-Max ...
🌐
Glbgpt
glbgpt.com › hub › gemini-3-pro-vs-claude45
Gemini 3 Pro vs Claude 4.5: I Tested Both for Coding – Here’s the Surprising Winner
November 20, 2025 - If you just want the short answer: for most real-world coding work today, Claude 4.5 is still the more reliable all‑around coding assistant, especially for complex reasoning, planning, and backend logic.
🌐
Reddit
reddit.com › r/cursor › [discussion] is gemini 3.0 really better than claude sonnet 4.5/composer for coding?
r/cursor on Reddit: [DISCUSSION] Is Gemini 3.0 really better than Claude Sonnet 4.5/Composer for coding?
November 18, 2025 -

I've been switching back and forth between Claude Sonnet 4.5 (or Composer 1) and Gemini 3.0, and I'm trying to figure out which model actually performs better for real-world coding tasks inside Cursor AI. I'm not looking for a general comparison.

I want feedback specifically in the context of how these models behave inside the Cursor IDE.

🌐
Reddit
reddit.com › r/singularity › opus 4.5 benchmark results
r/singularity on Reddit: Opus 4.5 benchmark results
November 24, 2025 - What that means for projects depends on whether they're bumping against the limits of what AI can do; the increase in ability might open doors that weren't effectively reachable before. If Gemini 3 manages the same, the case for choosing Opus with its smaller context window starts to look weaker. That said, I've found in my work that Claude models are much better at certain subtypes of long-running tasks in ways the benchmarks don't show, particularly when the work requires handling high ambiguity and autonomously seeking more information when the available data doesn't justify enough confidence.
🌐
Reddit
reddit.com › r/claudeai › comparing gpt-5.1 vs gemini 3.0 vs opus 4.5 across 3 coding tasks. here's an overview
r/ClaudeAI on Reddit: Comparing GPT-5.1 vs Gemini 3.0 vs Opus 4.5 across 3 coding tasks. Here's an overview
November 26, 2025 -

Ran these three models through three real-world coding scenarios to see how they actually perform.

The tests:

Prompt adherence: Asked for a Python rate limiter with 10 specific requirements (exact class names, error messages, etc.). Basically, testing if they follow instructions or treat them as "suggestions."
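
(For context, here's a minimal sketch of the kind of spec-driven rate limiter this test asked for. The class name, limits, and error message below are hypothetical stand-ins, not the actual 10 requirements from the test.)

```python
import time

class RateLimitExceeded(Exception):
    """Raised when a caller exceeds the allowed request rate."""

class SlidingWindowRateLimiter:
    """Allow at most `max_requests` calls per `window_seconds` per key."""

    def __init__(self, max_requests: int = 10, window_seconds: float = 60.0):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits: dict[str, list[float]] = {}

    def acquire(self, key: str) -> None:
        now = time.monotonic()
        window_start = now - self.window_seconds
        # Keep only calls still inside the window, then check the budget.
        hits = [t for t in self._hits.get(key, []) if t > window_start]
        if len(hits) >= self.max_requests:
            raise RateLimitExceeded(f"rate limit exceeded for {key!r}")
        hits.append(now)
        self._hits[key] = hits

# Usage: limiter = SlidingWindowRateLimiter(5, 1.0); limiter.acquire("user-42")
```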

Code refactoring: Gave them a messy, legacy API with security holes and bad practices. Wanted to see if they'd catch the issues and fix the architecture, plus whether they'd add safeguards we didn't explicitly ask for.
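
(To give a flavor of what "security holes" means here: the actual test repo was a TypeScript API, but sketched in Python for brevity, the classic unsafe-DB-op fix is swapping string-built SQL for a parameterized query. The table and function names below are hypothetical.)

```python
import sqlite3

def get_user_unsafe(conn: sqlite3.Connection, username: str):
    # Legacy style: SQL built by string interpolation, open to injection.
    return conn.execute(f"SELECT * FROM users WHERE name = '{username}'").fetchone()

def get_user_safe(conn: sqlite3.Connection, username: str):
    # Refactored style: parameterized query, driver handles escaping.
    return conn.execute("SELECT * FROM users WHERE name = ?", (username,)).fetchone()
```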

System extension: Handed over a partial notification system and asked them to explain the architecture first, then add an email handler. Testing comprehension before implementation.
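
(Roughly the shape of that task, as a minimal Python sketch. The handler interface and names below are hypothetical; the original post's repo had its own architecture.)

```python
from abc import ABC, abstractmethod

class NotificationHandler(ABC):
    """One handler per delivery channel; the dispatcher fans events out to all."""

    @abstractmethod
    def send(self, event: dict) -> None: ...

class EmailHandler(NotificationHandler):
    """The kind of channel the models were asked to add."""

    def __init__(self, smtp_host: str):
        self.smtp_host = smtp_host

    def send(self, event: dict) -> None:
        # Real code would render a template and talk to SMTP; this shows the seam.
        print(f"[email via {self.smtp_host}] {event.get('type')}: {event.get('message')}")

class Dispatcher:
    def __init__(self) -> None:
        self.handlers: list[NotificationHandler] = []

    def register(self, handler: NotificationHandler) -> None:
        self.handlers.append(handler)

    def notify(self, event: dict) -> None:
        for handler in self.handlers:
            handler.send(event)
```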

Results:

Test 1 (Prompt Adherence): Gemini followed instructions most literally. Opus stayed close to spec with cleaner docs. GPT-5.1 went into defensive mode, adding validation and safeguards that weren't requested.

Test 1 results

Test 2 (TypeScript API): Opus delivered the most complete refactoring (all 10 requirements). GPT-5.1 hit 9/10, caught security issues like missing auth and unsafe DB ops. Gemini got 8/10 with cleaner, faster output but missed some architectural flaws.

Test 2 results

Test 3 (System Extension): Opus gave the most complete solution with templates for every event type. GPT-5.1 went deep on the understanding phase (identified bugs, created diagrams) then built out rich features like CC/BCC and attachments. Gemini understood the basics but delivered a "bare minimum" version.

Test 3 results

Takeaways:

Opus was fastest overall (7 min total) while producing the most thorough output. Stayed concise when the spec was rigid, wrote more when thoroughness mattered.

GPT-5.1 consistently wrote 1.5-1.8x more code than Gemini because of JSDoc comments, validation logic, error handling, and explicit type definitions.

Gemini is cheapest overall but actually cost more than GPT-5.1 in the complex system task; it seems to "think" longer even when the output is shorter.

Opus is the most expensive ($1.68 vs $1.10 for Gemini), but if you need complete implementations on the first try, that might be worth it.

Full methodology and detailed breakdown here: https://blog.kilo.ai/p/benchmarking-gpt-51-vs-gemini-30-vs-opus-45

What's your experience been with these three? Have you run your own comparisons, and if so, what setup are you using?