🌐
Reddit
reddit.com › r/geminiai › comparing claude opus 4.5 vs gpt-5.1 vs gemini 3 - coding task
r/GeminiAI on Reddit: Comparing Claude Opus 4.5 vs GPT-5.1 vs Gemini 3 - Coding Task
1 month ago -

I ran all three models on a coding task just to see how they behave when things aren’t clean or nicely phrased.

The goal was just to see who performs like a real dev.

Here's my takeaway:

Opus 4.5 handled real repo issues the best. It fixed things without breaking unrelated parts and didn’t hallucinate new abstractions. Felt the most “engineering-minded.”

GPT-5.1 was close behind. It explained its reasoning step by step and sometimes added improvements I never asked for. Helpful when you want safety, annoying when you want precision.

Gemini solved most tasks but tended to optimize away or simplify decisions I had explicitly constrained. Good output, but sometimes too “creative.”

On refactoring and architecture-level tasks:

  • Opus delivered the most complete refactor, with consistent naming, updated dependencies, and documentation.
  • GPT-5.1 took longer because it analyzed first, but the output was maintainable and defensive.
  • Gemini produced clean code but missed deeper security and design patterns.

Context windows (because it matters at repo scale):

  • Opus 4.5: ~200K tokens usable, handles large repos better without losing track

  • GPT-5.1: ~128K tokens but strong long-reasoning even near the limit

  • Gemini 3 Pro: ~1M tokens which is huge, but performance becomes inconsistent as input gets massive

What's your experience been with these three? I used these frontier models side by side in my multi-agent AI setup with the Anannas LLM provider, and the results were interesting.

Have you run your own comparisons, and if so, what setup are you using?

🌐
Vertu
vertu.com › best post › gpt-5.2 codex vs gemini 3 pro vs claude opus 4.5: coding comparison guide
GPT-5.2 Codex vs Gemini 3 Pro vs Claude Opus 4.5
3 days ago - Gemini 3 Pro emerged as the surprise leader for frontend development, combining superior visual quality with the lowest costs. GPT-5.2 Codex proved itself as the most reliable all-rounder, delivering consistent results across diverse coding challenges. Claude Opus 4.5's poor performance in ...
Discussions

[DISCUSSION] Is Gemini 3.0 really better than Claude Sonnet 4.5/Composer for coding?
UI-wise it's better than anything else out there by miles, based on my testing. There's no competition when it comes to frontend. The benchmarks show Claude is a bit better on SWE-bench, so there are some cases where Claude is the better candidate for your code.
🌐 r/cursor
November 18, 2025
Opus 4.5 benchmark results
Gemini 3 looks even more impressive considering the price. Hope Anthropic gets pressured and lowers the cost.
🌐 r/singularity
November 24, 2025
Comparing GPT-5.1 vs Gemini 3.0 vs Opus 4.5 across 3 coding tasks. Here's an overview
Opus is a one-shot master in my case (Node.js + React + TypeScript). I had more issues with Sonnet, but Opus understands me better and mostly suggests better solutions. I'm using it for focused and smaller tasks for now, and they're not too large as implementations, but still, Opus solved a few bugs that Gemini or GPT couldn't. Btw, I'm not saying the other models are bad, they are great indeed! Opus just works better in my case, for now at least.
🌐 r/ClaudeAI
November 26, 2025
Gemini 3 Pro Vision benchmarks: Finally compares against Claude Opus 4.5 and GPT-5.1
Gemini is definitely the best all-rounder model. I think in the long run that's what makes it really "intelligent", even if it lags behind in coding.
🌐 r/singularity
3 weeks ago
🌐
Medium
medium.com › ai-software-engineer › i-tested-claude-opus-4-5-vs-gemini-3-pro-close-but-a-clear-winner-surprised-me-1cb6e2cd601d
I Tested Claude Opus 4.5 vs Gemini 3 Pro (Close But a Clear Winner) | by Joe Njenga | AI Software Engineer | Dec, 2025 | Medium
3 weeks ago - I Tested Claude Opus 4.5 vs Gemini 3 Pro (Close But a Clear Winner). Just when we thought Gemini 3 Pro had become the coding king, Claude Opus 4.5 dropped and dethroned it. I was quick to test both …
🌐
Substack
natesnewsletter.substack.com › p › claude-opus-45-loves-messy-real-world
I Tested Opus 4.5 Early—Here's Where It Can Save You HOURS on Complex Workflows + a Comparison vs. Gemini 3 and ChatGPT 5.1 + a Model-Picker Prompt + 15 Workflows to Get Started Now
November 25, 2025 - I tested Opus 4.5 vs. Gemini 3 vs. ChatGPT 5.1 on real-world business tasks: here's what I found, plus a complete breakdown of which model I'd use for complex workflows plus a custom model-picker!
🌐
Substack
lennysnewsletter.com › p › which-ai-model-is-the-best-designer
Gemini 3 vs. Claude Opus 4.5 vs. GPT-5.1 Codex: Which AI model is the best designer?
4 weeks ago - 🎙️ Testing Gemini 3, Opus 4.5, and GPT-5.1 Codex on the same redesign task to see which AI model is the best designer. The winner is clear. ... I put three cutting-edge AI models to the test in a head-to-head design competition.
🌐
R&D World
rdworldonline.com › home › rd world posts › how gpt-5.2 stacks up against gemini 3.0 and claude opus 4.5
How GPT-5.2 stacks up against Gemini 3.0 and Claude Opus 4.5
2 weeks ago - The most striking claim is GPT-5.2’s performance on ARC-AGI-2, a benchmark designed to test genuine reasoning ability while resisting memorization. At 52.9% (Thinking) and 54.2% (Pro), OpenAI’s new model significantly outranks both Claude ...
🌐
Composio
composio.dev › blog › claude-4-5-opus-vs-gemini-3-pro-vs-gpt-5-codex-max-the-sota-coding-model
Claude 4.5 Opus vs. Gemini 3 Pro vs. GPT-5-codex-max: The SOTA coding model - Composio
1 month ago - WebDev Arena: Gemini 3 Pro reaches ... Opus 4.5(Claude): Outstanding at strategy and design, but its solutions tend to be elaborate, slower to integrate, and prone to practical hiccups once they hit the metal....
🌐
AceCloud
acecloud.ai › blog › claude-opus-4-5-vs-gemini-3-pro-vs-sonnet-4-5
Claude Opus 4.5 Vs Gemini 3 Pro Vs Sonnet 4.5 Comparison Guide
November 25, 2025 - Pick Gemini 3 Pro if you need very strong multimodal performance, a 1M-token context window by default, and tight integration with Google tools and Search. Pick Claude Opus 4.5 if you care most about frontier coding performance, deep reasoning ...
🌐
Simon Willison
simonwillison.net › 2025 › Nov › 24 › claude-opus
Claude Opus 4.5, and why evaluating new LLMs is increasingly difficult
November 24, 2025 - Here’s Opus 4.5 (on its default “high” effort level): It did significantly better on the new more detailed prompt: Here’s that same complex prompt against Gemini 3 Pro and against GPT-5.1-Codex-Max-xhigh.
🌐
Anthropic
anthropic.com › news › claude-opus-4-5
Introducing Claude Opus 4.5
November 24, 2025 - This change improved Gemini 3 to 56.7% and GPT-5.1 to 48.6% from the values reported by their developers, using the Terminus-2 harness. 3: Note that these evaluations were run on an in-progress upgrade to Petri, our open-source, automated evaluation tool. They were run on an earlier snapshot ...
🌐
CometAPI
cometapi.com › gemini-3-pro-vs-claude-4-5-sonnet-for-coding
Gemini 3 Pro vs Claude 4.5 Sonnet for Coding: Which is Better in 2025 - CometAPI - All AI Models in One API
3 weeks ago - Gemini 3 Pro (Google/DeepMind) and Claude Opus 4.5 (Anthropic) are both 2025 frontier models focused on deep reasoning, agentic workflows, and stronger
🌐
Macaron
macaron.im › blog › claude-opus-4-5-vs-chatgpt-5-1-vs-gemini-3-pro
Full Technical Comparison: Claude Opus 4.5 vs. ChatGPT 5.1 vs. Google Gemini 3 Pro - Macaron
November 24, 2025 - Overall, on standard benchmarks like MMLU and PiQA all three are tightly clustered at ~90% accuracy[5], but for “frontier” reasoning tests (complex math, logic puzzles), Gemini 3 Pro has an edge with its “PhD-level” performance[10]. Code ...
🌐
Yahoo! Finance
finance.yahoo.com › news › anthropic-launches-claude-opus-45-as-googles-gemini-3-gains-big-backers-191645109.html
Anthropic launches Claude Opus 4.5 as Google's Gemini 3 gains big backers
November 24, 2025 - According to the company, Opus 4.5 beats out both Gemini 3 Pro and OpenAI's (OPAI.PVT) GPT-5.1-Codex-Max and GPT-5.1 in software engineering. Anthropic also said the model is capable of coming up with creative ways of solving problems.
🌐
Data Studios
datastudios.org › post › google-gemini-3-vs-claude-sonnet-4-5-full-report-and-comparison-of-features-capabilities-pricing
Google Gemini 3 vs. Claude Sonnet 4.5: Full Report and Comparison of Features, Capabilities, Pricing, and more
November 22, 2025 - It effectively can be left “on task” and keep making progress, only stopping when it’s truly done or if it hits a roadblock it can’t resolve on its own. This is a major improvement from earlier models (Claude’s previous version, Opus 4, could only manage ~7 hours autonomously before wandering off track or exhausting context).
🌐
The New Stack
thenewstack.io › home › anthropic’s new claude opus 4.5 reclaims the coding crown
Anthropic's New Claude Opus 4.5 Reclaims the Coding Crown - The New Stack
1 week ago - Anthropic today launched the latest version of its flagship Opus model: Opus 4.5. The company calls it its most intelligent model yet and notes that it is especially strong in solving coding tasks, taking the crown from OpenAI’s GPT-5.1-Codex-Max ...
🌐
Glbgpt
glbgpt.com › hub › gemini-3-pro-vs-claude45
Gemini 3 Pro vs Claude 4.5: I Tested Both for Coding – Here’s the Surprising Winner
November 20, 2025 - If you just want the short answer: for most real-world coding work today, Claude 4.5 is still the more reliable all‑around coding assistant, especially for complex reasoning, planning, and backend logic.
🌐
Reddit
reddit.com › r/cursor › [discussion] is gemini 3.0 really better than claude sonnet 4.5/composer for coding?
r/cursor on Reddit: [DISCUSSION] Is Gemini 3.0 really better than Claude Sonnet 4.5/Composer for coding?
November 18, 2025 -

I've been switching back and forth between Claude Sonnet 4.5 (or Composer 1) and Gemini 3.0, and I'm trying to figure out which model actually performs better for real-world coding tasks inside Cursor AI. I'm not looking for a general comparison.

I want feedback specifically in the context of how these models behave inside the Cursor IDE.

🌐
Reddit
reddit.com › r/singularity › opus 4.5 benchmark results
r/singularity on Reddit: Opus 4.5 benchmark results
November 24, 2025 - What that means for projects depends on whether they're bumping against the limits of what AI can do; the increase in ability might open doors that weren't effectively reachable before. If Gemini 3 manages the same, the case for choosing Opus with its smaller context window starts to look weaker. That said, I've found in my work that Claude models are much better at certain subtypes of long-running tasks in ways the benchmarks don't show, particularly when the work requires handling high ambiguity and autonomously seeking more information when the available data doesn't justify enough confidence.
🌐
Reddit
reddit.com › r/claudeai › comparing gpt-5.1 vs gemini 3.0 vs opus 4.5 across 3 coding tasks. here's an overview
r/ClaudeAI on Reddit: Comparing GPT-5.1 vs Gemini 3.0 vs Opus 4.5 across 3 coding tasks. Here's an overview
November 26, 2025 -

Ran these three models through three real-world coding scenarios to see how they actually perform.

The tests:

Prompt adherence: Asked for a Python rate limiter with 10 specific requirements (exact class names, error messages, etc.). Basically, testing if they follow instructions or treat them as "suggestions."
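
(For context, here's a minimal sketch of the kind of spec-driven rate limiter this test asked for. The class name, limits, and error message below are hypothetical stand-ins, not the actual 10 requirements from the test.)

```python
import time

class RateLimitExceeded(Exception):
    """Raised when a caller exceeds the allowed request rate."""

class SlidingWindowRateLimiter:
    """Allow at most `max_requests` calls per `window_seconds` per key."""

    def __init__(self, max_requests: int = 10, window_seconds: float = 60.0):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits: dict[str, list[float]] = {}

    def acquire(self, key: str) -> None:
        now = time.monotonic()
        window_start = now - self.window_seconds
        # Keep only calls still inside the window, then check the budget.
        hits = [t for t in self._hits.get(key, []) if t > window_start]
        if len(hits) >= self.max_requests:
            raise RateLimitExceeded(f"rate limit exceeded for {key!r}")
        hits.append(now)
        self._hits[key] = hits

# Usage: limiter = SlidingWindowRateLimiter(5, 1.0); limiter.acquire("user-42")
```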

Code refactoring: Gave them a messy, legacy API with security holes and bad practices. Wanted to see if they'd catch the issues and fix the architecture, plus whether they'd add safeguards we didn't explicitly ask for.
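
(To give a flavor of what "security holes" means here: the actual test repo was a TypeScript API, but sketched in Python for brevity, the classic unsafe-DB-op fix is swapping string-built SQL for a parameterized query. The table and function names below are hypothetical.)

```python
import sqlite3

def get_user_unsafe(conn: sqlite3.Connection, username: str):
    # Legacy style: SQL built by string interpolation, open to injection.
    return conn.execute(f"SELECT * FROM users WHERE name = '{username}'").fetchone()

def get_user_safe(conn: sqlite3.Connection, username: str):
    # Refactored style: parameterized query, driver handles escaping.
    return conn.execute("SELECT * FROM users WHERE name = ?", (username,)).fetchone()
```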

System extension: Handed over a partial notification system and asked them to explain the architecture first, then add an email handler. Testing comprehension before implementation.
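
(Roughly the shape of that task, as a minimal Python sketch. The handler interface and names below are hypothetical; the original post's repo had its own architecture.)

```python
from abc import ABC, abstractmethod

class NotificationHandler(ABC):
    """One handler per delivery channel; the dispatcher fans events out to all."""

    @abstractmethod
    def send(self, event: dict) -> None: ...

class EmailHandler(NotificationHandler):
    """The kind of channel the models were asked to add."""

    def __init__(self, smtp_host: str):
        self.smtp_host = smtp_host

    def send(self, event: dict) -> None:
        # Real code would render a template and talk to SMTP; this shows the seam.
        print(f"[email via {self.smtp_host}] {event.get('type')}: {event.get('message')}")

class Dispatcher:
    def __init__(self) -> None:
        self.handlers: list[NotificationHandler] = []

    def register(self, handler: NotificationHandler) -> None:
        self.handlers.append(handler)

    def notify(self, event: dict) -> None:
        for handler in self.handlers:
            handler.send(event)
```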

Results:

Test 1 (Prompt Adherence): Gemini followed instructions most literally. Opus stayed close to spec with cleaner docs. GPT-5.1 went into defensive mode, adding validation and safeguards that weren't requested.

Test 1 results

Test 2 (TypeScript API): Opus delivered the most complete refactoring (all 10 requirements). GPT-5.1 hit 9/10, caught security issues like missing auth and unsafe DB ops. Gemini got 8/10 with cleaner, faster output but missed some architectural flaws.

Test 2 results

Test 3 (System Extension): Opus gave the most complete solution with templates for every event type. GPT-5.1 went deep on the understanding phase (identified bugs, created diagrams) then built out rich features like CC/BCC and attachments. Gemini understood the basics but delivered a "bare minimum" version.

Test 3 results

Takeaways:

Opus was fastest overall (7 min total) while producing the most thorough output. Stayed concise when the spec was rigid, wrote more when thoroughness mattered.

GPT-5.1 consistently wrote 1.5-1.8x more code than Gemini because of JSDoc comments, validation logic, error handling, and explicit type definitions.

Gemini is cheapest overall but actually cost more than GPT-5.1 in the complex system task; it seems to "think" longer even when the output is shorter.

Opus is the most expensive ($1.68 vs $1.10 for Gemini), but if you need complete implementations on the first try, that might be worth it.

Full methodology and detailed breakdown here: https://blog.kilo.ai/p/benchmarking-gpt-51-vs-gemini-30-vs-opus-45

What's your experience been with these three? Have you run your own comparisons, and if so, what setup are you using?