Well, without rehashing the whole Claude vs. Codex drama again, we're basically in the same situation, except this time, somehow, the Claude Code + Sonnet 4.5 combo actually shows real strength.
I asked something I thought would be super easy and straightforward for Gemini 3.0 Pro.
I work in a fully dockerized environment, meaning every little Python module I have runs inside its own container, and they all share the same database. Nothing too complicated, right?
It was late at night, I was tired, and I asked Gemini 3.0 Pro to apply a small patch to one of the containers, redeploy it for me, and test the endpoint.
Well… bad idea. It completely messed up the DB container (no worries, I had backups, and it didn't actually delete the volumes). It spun up a brand-new container, created a new database, and set a new password, "postgres123". Then it kept starting and stopping the module I had asked it to refactor, and since it had changed the database, of course the module couldn't connect anymore. Long story short: even with precise instructions, it failed, ran out of tokens, and hit the 5-hour limit.
So I reverted everything and asked Claude Code the exact same thing.
Five to ten minutes later: everything was smooth. No issues at all.
The refactor worked perfectly.
Conclusion:
Maybe everyone already knows this, but the best benchmarks, even the agentic ones, are NOT good indicators of real-world performance. This all comes down to orchestration, and that's exactly why so many companies like Factory.AI are investing heavily in this space.
I've been switching back and forth between Claude Sonnet 4.5, Composer 1, and Gemini 3.0, and I'm trying to figure out which model actually performs better for real-world coding tasks inside Cursor AI. I'm not looking for a general comparison.
I want feedback specifically in the context of how these models behave inside the Cursor IDE.
I ran all three models on a coding task just to see how they behave when things aren't clean or nicely phrased.
The goal was just to see who performs like a real dev.
Here's my takeaway:
Opus 4.5 handled real repo issues the best. It fixed things without breaking unrelated parts and didn't hallucinate new abstractions. Felt the most "engineering-minded."
GPT-5.1 was close behind. It explained its reasoning step by step and sometimes added improvements I never asked for. Helpful when you want safety, annoying when you want precision.
Gemini solved most tasks but tended to optimize or simplify decisions I explicitly constrained. Good output, but sometimes too “creative.”
On refactoring and architecture-level tasks:
Opus delivered the most complete refactor with consistent naming, updated dependencies, and documentation.
GPT-5.1 took longer because it analyzed first, but the output was maintainable and defensive.
Gemini produced clean code but missed deeper security and design patterns.
Context windows (because they matter at repo scale):
Opus 4.5: ~200K tokens usable, handles large repos better without losing track
GPT-5.1: ~128K tokens but strong long-reasoning even near the limit
Gemini 3 Pro: ~1M tokens which is huge, but performance becomes inconsistent as input gets massive
I used these frontier models side by side in my multi-agent AI setup with Anannas LLM Provider, and the results were interesting. What's your experience been with these three?
Have you run your own comparisons, and if so, what setup are you using?
Three new coding models dropped almost at the same time, so I ran a quick real-world test inside my observability system. These weren't playground experiments; I had each model implement the same two components directly in my repo (rough sketches of both patterns follow below):
Statistical anomaly detection (EWMA, z-scores, spike detection, 100k+ logs/min)
Distributed alert deduplication (clock skew, crashes, 5s suppression window)
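To give a sense of what the first component boils down to, here's a minimal TypeScript sketch of an EWMA-plus-z-score spike detector. This is my own illustration of the pattern (the names, ALPHA, and the 3-sigma threshold are arbitrary), not any model's actual output:

```typescript
// Minimal EWMA spike detector: one small state object per metric, O(1) per sample.
interface EwmaState {
  mean: number;
  variance: number;
}

const ALPHA = 0.3;       // smoothing factor (illustrative, not tuned)
const Z_THRESHOLD = 3.0; // flag samples more than 3 sigma from the running mean

const states = new Map<string, EwmaState>();

function observe(metric: string, value: number): boolean {
  // Defensive: drop non-finite samples so Infinity/NaN never poison the state.
  if (!Number.isFinite(value)) return false;

  const s = states.get(metric);
  if (!s) {
    states.set(metric, { mean: value, variance: 0 });
    return false; // first sample, nothing to compare against yet
  }

  const std = Math.sqrt(s.variance);
  const z = std > 0 ? (value - s.mean) / std : 0;
  const isSpike = Math.abs(z) > Z_THRESHOLD;

  // Exponentially weighted updates of mean and variance.
  const diff = value - s.mean;
  s.mean += ALPHA * diff;
  s.variance = (1 - ALPHA) * (s.variance + ALPHA * diff * diff);

  return isSpike;
}
```

The point is that the whole hot path is constant time per log line, which is what makes 100k+ logs/min feasible without batching.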
Here’s the simplified summary of how each behaved.
Claude Opus 4.5
Super detailed architecture, tons of structure, very “platform rewrite” energy.
But one small edge case (Infinity.toFixed) crashed the service, and the restored state came back corrupted.
Great design, not immediately production-safe.
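For reference, a guard along these lines would likely have avoided that edge case. My guess at the failure mode (I haven't traced the exact call path): a divide-by-zero produces Infinity, toFixed turns it into a non-numeric string, and that string ends up in serialized state. A sketch of the defensive version, purely my own illustration:

```typescript
// Non-finite values survive toFixed() as the strings "Infinity" / "NaN",
// which can then leak into serialized or persisted state. Guarding with
// Number.isFinite keeps the formatted output well-formed.
function formatMetric(value: number, digits = 2): string {
  return Number.isFinite(value) ? value.toFixed(digits) : "0";
}
```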
GPT-5.1 Codex
Most stable output.
Simple O(1) anomaly loop, defensive math, clean Postgres-based dedupe with row locks.
Integrated into my existing codebase with zero fixes required.
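For anyone unfamiliar with the row-lock pattern, here's roughly what that dedupe path looks like with node-postgres. It's a sketch under my own assumptions (a hypothetical alerts(fingerprint, last_fired) table and a 5-second window), not GPT-5.1's actual code:

```typescript
import { Pool } from "pg";

// Hypothetical schema: alerts(fingerprint text primary key, last_fired timestamptz).
// SELECT ... FOR UPDATE serializes concurrent workers on the same fingerprint,
// and comparing against the database clock (now()) sidesteps clock skew
// between worker hosts.
const pool = new Pool();

async function shouldFire(fingerprint: string): Promise<boolean> {
  const client = await pool.connect();
  try {
    await client.query("BEGIN");
    const existing = await client.query(
      `SELECT (now() - last_fired) < interval '5 seconds' AS suppressed
         FROM alerts WHERE fingerprint = $1 FOR UPDATE`,
      [fingerprint]
    );
    if (existing.rows[0]?.suppressed) {
      await client.query("COMMIT");
      return false; // another worker fired this alert inside the window
    }
    await client.query(
      `INSERT INTO alerts (fingerprint, last_fired) VALUES ($1, now())
         ON CONFLICT (fingerprint) DO UPDATE SET last_fired = now()`,
      [fingerprint]
    );
    await client.query("COMMIT");
    return true;
  } catch (err) {
    await client.query("ROLLBACK");
    throw err;
  } finally {
    client.release();
  }
}
```

Keeping the window comparison on the database clock rather than each worker's local clock is the part that handles the clock-skew requirement.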
Gemini 3 Pro
Fastest output and cleanest code.
Compact EWMA, straightforward ON CONFLICT dedupe.
Needed a bit of manual edge-case review but great for fast iteration.
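The ON CONFLICT variant compresses the same idea into one statement: the upsert only touches the row when it's outside the suppression window, so the row count tells you whether this worker gets to fire. Again a sketch against the same hypothetical alerts table, not Gemini's actual output:

```typescript
import { Pool } from "pg";

const pool = new Pool();

// Single-statement dedupe: the upsert only updates the row when it is outside
// the 5-second suppression window, so a row count of zero means "suppress".
async function shouldFireOnConflict(fingerprint: string): Promise<boolean> {
  const result = await pool.query(
    `INSERT INTO alerts (fingerprint, last_fired)
       VALUES ($1, now())
     ON CONFLICT (fingerprint) DO UPDATE
       SET last_fired = now()
       WHERE alerts.last_fired < now() - interval '5 seconds'`,
    [fingerprint]
  );
  return (result.rowCount ?? 0) > 0;
}
```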
TL;DR
| Model | Cost | Time | Notes |
|---|---|---|---|
| Gemini 3 Pro | $0.25 | ~5-6 mins | Very fast, clean |
| GPT-5.1 Codex | $0.51 | ~5-6 mins | Most reliable in my tests |
| Claude Opus 4.5 | $1.76 | ~12 mins | Strong design, needs hardening |
I also wired Composio’s tool router in one branch for Slack/Jira/PagerDuty actions, which simplified agent-side integrations.
Not claiming any "winner," just sharing how each behaved inside a real codebase.
If you want to know more, check out the complete analysis in the full blog post.