UI-wise it's better than anything else out there by miles, based on my testing. There's no competition when it comes to frontend. The benchmarks show Claude is a bit better on SWE-bench, so there are some cases where Claude is the better candidate for your code. Answer from yaboyyoungairvent on reddit.com
r/cursor on Reddit: [DISCUSSION] Is Gemini 3.0 really better than Claude Sonnet 4.5/Composer for coding?
November 18, 2025

I've been switching back and forth between Claude Sonnet 4.5/Composer 1 and Gemini 3.0, and I'm trying to figure out which model actually performs better for real-world coding tasks inside Cursor AI. I'm not looking for a general comparison.

I want feedback specifically in the context of how these models behave inside the Cursor IDE.

Gemini 3 Pro vs Claude 4.5 Sonnet for Coding: Which is Better in 2025 - CometAPI - All AI Models in One API
3 weeks ago - Google frames Gemini 3 Pro as the “most intelligent” in the family, optimized for wide tasks beyond code (though agentic coding is a priority). Claude Sonnet 4.5 is optimized specifically for agentic workflows and code; Anthropic emphasizes ...
Discussions

Claude Code-Sonnet 4.5 >>>>>>> Gemini 3.0 Pro - Antigravity
"Hey gang I have no idea how large data sets work but I assure you all that my anecdotal evidence from my single use case disproves lmarena.ai and the massive amount of input and evaluation there Yall are dumb. I get it and people who think like me do too. " Some contrarian cool guy every time the ebb and flow of progress shifts More on reddit.com
r/ClaudeAI, November 22, 2025
It seems Opus 4.5 is just too amazing even compared to Gemini 3
They're both great SOTA models and have their use cases; I feel like Claude is better for agentic coding, and Gemini is better for multimodality and is cheaper.
r/Bard, November 24, 2025
Is it just me or is Claude 4.5 better than Gemini Pro 3 on Antigravity
Gemini 3 Pro is pretty much unusable on Antigravity, unfortunately. I think it works better in other tools, which is strange.
r/Bard, 1 month ago
Have some fun with Gemini 3 vs Sonnet 4.5
Opus > Sonnet lol, dunno how y'all think Sonnet is better
r/ClaudeAI, November 21, 2025
Claude Opus 4.5 Vs Gemini 3 Pro Vs Sonnet 4.5 Comparison Guide
November 25, 2025 - Pick Gemini 3 Pro if you need very strong multimodal performance, a 1M-token context window by default, and tight integration with Google tools and Search. Pick Claude Opus 4.5 if you care most about frontier coding performance, deep reasoning ...
GPT 5.1 vs Claude 4.5 vs Gemini 3: 2025 AI Comparison
1 month ago - Gemini 3 Pro leads overall reasoning benchmarks with an unprecedented 1501 LMArena Elo, becoming the first model to break the 1500 barrier, while Claude 4.5 Sonnet dominates real-world coding at 77.2% SWE-bench and DeepSeek-V3.2 delivers ...
Gemini 3.0 vs GPT-5.1 vs Claude 4.5 vs Grok 4.1: AI Model Comparison
3 weeks ago - By late 2025, a new generation of large‑language models (LLMs) has appeared that pushes the boundaries of reasoning, context memory and emotional intelligence. Google’s Gemini 3.0 Pro, OpenAI’s GPT‑5.1, Anthropic’s Claude Sonnet 4.5 and xAI’s Grok 4.1 represent the cutting edge.
Gemini 3 Pro is the best of Claude Sonnet 4.5 (coding ...
Gemini 3 Pro is the best of Claude Sonnet 4.5 (coding, agentic thinking) and Gemini 2.5 Pro (actually handles 1M context well). It felt like model improvements got linear seeing how the jump from Sonnet 3.7 → 4 and GPT-4.1 → 5 felt, but ...
Google Gemini 3 vs. Claude Sonnet 4.5: Full Report and Comparison of Features, Capabilities, Pricing, and more
November 22, 2025 - Both models push the boundaries of what AI can do, but they come with different strengths and design philosophies. This report provides a comprehensive comparison across key dimensions: from raw reasoning prowess and coding skills to multimodal ...
Gemini 3 Pro vs Claude 4.5: I Tested Both for Coding – Here’s the Surprising Winner
November 20, 2025 - In other words, Gemini 3 Pro feels like a very powerful but sometimes unpredictable senior engineer: brilliant at certain tasks, but you have to supervise it closely. Claude 4.5 (especially the Sonnet variant) has built a reputation as one of ...
Gemini 3 Pro vs Claude Sonnet 4.5: Antigravity IDE Review
November 20, 2025 - TechRadar ran a comparison where Gemini 3 Pro built a working Progressive Web App with keyboard controls without being asked. Claude struggled with the same prompt. The benchmark data backs this up. Gemini 3 Pro scored 2,439 on LiveCodeBench Pro compared to Claude Sonnet 4.5’s 1,418.
Gemini 3 Pro vs Claude 4.5 Sonnet: AI Model Comparison | Simtheory
November 18, 2025 - Gemini 3 Pro: Google's most advanced AI model combining breakthrough reasoning depth, native multimodal understanding, and state-of-the-art agentic capabilities to help you learn, build, and plan anything. The latest Claude 4.5 Sonnet model ...
Google Gemini 3 Is the Best Model Ever. One Score Stands Out Above the Rest
November 18, 2025 - Gemini 3 Pro earned ~$5.5k on Vending-Bench 2, the vending machine benchmark (it tries to answer a valuable real-world question: Can AI models run a profitable business across long horizons?), compared to ~$3.8k from Sonnet 4.5.
I tested Gemini 3, ChatGPT 5.1, and Claude Sonnet 4.5 – and Gemini crushed it in a real coding task | TechRadar
November 18, 2025 - Also: Make the ring look a bit ... Gemini 3 Pro generated a Version 2.0 with, among other updates, "CSS Perspective to tilt the [ring] floor," and drama in the form of "the whole camera shakes when a heavy hit lands."...
r/ClaudeAI on Reddit: Claude Code-Sonnet 4.5 >>>>>>> Gemini 3.0 Pro - Antigravity
November 22, 2025

Well, without rehashing the whole Claude vs. Codex drama again, we’re basically in the same situation, except this time, somehow, the Claude Code + Sonnet 4.5 combo actually shows real strength.

I asked something I thought would be super easy and straightforward for Gemini 3.0 Pro.
I work in a fully dockerized environment, meaning every little Python module I have runs inside its own container, and they all share the same database. Nothing too complicated, right?

It was late at night, I was tired, and I asked Gemini 3.0 Pro to apply a small patch to one of the containers, redeploy it for me, and test the endpoint.
Well… bad idea. It completely messed up the DB container (no worries, I had backups even though it didn’t delete the volumes). It spun up a brand-new container, created a new database, and set a new password “postgres123”. Then it kept starting and stopping the module I had asked it to refactor… and since it changed the database, of course the module couldn’t connect anymore. Long story short: even with precise instructions, it failed, ran out of tokens, and hit the 5-hour limit.

So I reverted everything and asked Claude Code the exact same thing.
Five to ten minutes later: everything was smooth. No issues at all.
The refactor worked perfectly.

Conclusion:
Maybe everyone already knows this, but the best benchmarks, even agentic ones, are NOT good indicators of real-world performance. It all comes down to orchestration, and that’s exactly why so many companies like Factory.AI are investing heavily in this space.

r/Bard on Reddit: It seems opus 4.5 is just too amazing even compared to gemini 3
November 24, 2025 - I was testing Gemini 3 Pro and Sonnet 4.5 side by side yesterday, and to my shock, Sonnet 4.5 is a lot better at instruction following and creativity, and doesn't hallucinate as much.
r/Bard on Reddit: Is it just me or is Claude 4.5 better than Gemini Pro 3 on Antigravity
1 month ago

Gemini 3 Pro is quite slow and keeps making more errors compared to Claude Sonnet 4.5 on Antigravity. It was fine at the start, but the more I used it, the more it produced malformed edits, to the point where it isn't able to edit even a single file.

I don't know if this is a bug or whether it's just that bad. Is anyone else facing problems?

Edit: FYI, I'm experiencing this on both the Low and High versions on Fast. It is SO slow. It is taking up to a few minutes just to give me an initial response.

Gemini 3 Pro - Amp
November 18, 2025 - Gemini 3 checked off all the boxes that so far only Claude had checked: smart, fast, follows instructions very well, works hand-in-hand with the user if needed, very eager to use tools and uses them with high dexterity.
Gemini 3.0 vs GPT-5.1 vs Claude Sonnet 4.5: Which one is better? – Bind AI IDE
November 19, 2025 - If your workflow involves dumping an entire large repository or very long document into a single prompt, Gemini 3 Pro is currently the safest bet for reliable performance at 1M+ tokens.
r/ClaudeAI on Reddit: Have some fun with Gemini 3 vs Sonnet 4.5
November 21, 2025

Have some fun with Gemini 3. I'm pretty sure Google trained the model to tell you it is the smartest model with the most raw compute (hard-coded).

You can basically get it to prove itself wrong a few different ways, then have it compare itself to another model, and it will always say it has more 'raw' compute or is smarter, and claim it failed because of XYZ - kind of like how, when you ask Claude who Dario is, you get an answer about Dario the CEO (they stuck it in the system prompt).

I had it come up with some instructions, which included writing a Python script. It then proceeded to import a non-existent library, and when a different model actually fixed this, Gemini claimed that it had more 'raw' compute.

I had my Sonnet 4.5 + memory setup pretty consistently beat Gemini.

TL;DR: Gemini 3 SOUNDS great, and has a lot of the agentic capabilities where it reads the 'intent' of a user's query rather than the literal meaning, but its actual brain is not as big of a leap as I thought it would be.

What have people found?

Sonnet > Gemini > Opus > GPT-5.1 in terms of long-running ability on complex tasks.

Claude Code for me is still beating Gemini/Antigravity (also note that if you are using the research preview of Antigravity, it's entirely possible that they take ALL of your data for training purposes; read the terms of service carefully).

r/ClaudeAI on Reddit: Comparing GPT-5.1 vs Gemini 3.0 vs Opus 4.5 across 3 coding tasks. Here's an overview
November 26, 2025

Ran these three models through three real-world coding scenarios to see how they actually perform.

The tests:

Prompt adherence: Asked for a Python rate limiter with 10 specific requirements (exact class names, error messages, etc.). Basically, testing if they follow instructions or treat them as "suggestions." (A hypothetical sketch of this kind of spec follows the test list.)

Code refactoring: Gave them a messy legacy API with security holes and bad practices. Wanted to see if they'd catch the issues and fix the architecture, plus whether they'd add safeguards we didn't explicitly ask for.

System extension: Handed over a partial notification system and asked them to explain the architecture first, then add an email handler. Testing comprehension before implementation.
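
To ground test 1, here is a minimal sketch of the kind of spec it describes. The post doesn't publish the actual 10 requirements, so the class name, error message, and token-bucket design below are illustrative assumptions rather than the real test prompt.

```python
# Hypothetical example of a rate-limiter spec like the one in test 1.
# The actual class names/error messages from the test are not public.
import time


class RateLimitExceeded(Exception):
    """Raised when a call is attempted while no tokens are available."""


class TokenBucketRateLimiter:
    """Allow up to `capacity` calls per `period` seconds (token bucket)."""

    def __init__(self, capacity: int, period: float):
        if capacity <= 0 or period <= 0:
            raise ValueError("capacity and period must be positive")
        self.capacity = capacity
        self.refill_rate = capacity / period  # tokens regained per second
        self.tokens = float(capacity)
        self.last_refill = time.monotonic()

    def _refill(self) -> None:
        now = time.monotonic()
        self.tokens = min(float(self.capacity),
                          self.tokens + (now - self.last_refill) * self.refill_rate)
        self.last_refill = now

    def acquire(self) -> None:
        """Consume one token or raise RateLimitExceeded."""
        self._refill()
        if self.tokens < 1:
            raise RateLimitExceeded("rate limit exceeded: retry later")
        self.tokens -= 1


# Quick demo: 5 calls allowed per second, so the 6th and 7th are rejected.
limiter = TokenBucketRateLimiter(capacity=5, period=1.0)
for i in range(7):
    try:
        limiter.acquire()
        print(f"call {i} allowed")
    except RateLimitExceeded as err:
        print(f"call {i} blocked: {err}")
```

A spec like this makes a good adherence probe precisely because each requirement (names, messages, validation) can be checked mechanically.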

Results:

Test 1 (Prompt Adherence): Gemini followed instructions most literally. Opus stayed close to spec with cleaner docs. GPT-5.1 went into defensive mode - added validation and safeguards that weren't requested.

Test 1 results

Test 2 (TypeScript API): Opus delivered the most complete refactoring (all 10 requirements). GPT-5.1 hit 9/10, caught security issues like missing auth and unsafe DB ops. Gemini got 8/10 with cleaner, faster output but missed some architectural flaws.

Test 2 results

Test 3 (System Extension): Opus gave the most complete solution with templates for every event type. GPT-5.1 went deep on the understanding phase (identified bugs, created diagrams), then built out rich features like CC/BCC and attachments. Gemini understood the basics but delivered a "bare minimum" version.

Test 3 results
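
For context on what test 3 was probing, here is a minimal sketch of that task shape, assuming a simple dispatcher-plus-handlers layout; the post doesn't share the actual partial notification system, so the Notifier and EmailHandler names below are hypothetical.

```python
# Hypothetical shape of the test-3 task: a partial notification system
# that a model must first explain, then extend with an email handler.
from dataclasses import dataclass, field


@dataclass
class Event:
    kind: str                       # e.g. "user_signup", "password_reset"
    payload: dict = field(default_factory=dict)


class Notifier:
    """Routes each event to every handler registered for its kind."""

    def __init__(self):
        self._handlers = {}         # kind -> list of callables

    def register(self, kind: str, handler) -> None:
        self._handlers.setdefault(kind, []).append(handler)

    def dispatch(self, event: Event) -> None:
        for handler in self._handlers.get(event.kind, []):
            handler(event)


class EmailHandler:
    """The kind of extension the test asked for: emit an email per event."""

    def __init__(self, sender: str):
        self.sender = sender

    def __call__(self, event: Event) -> None:
        to = event.payload.get("email", "<unknown>")
        # A real implementation would call smtplib or an email API here.
        print(f"email from {self.sender} to {to}: {event.kind}")


notifier = Notifier()
notifier.register("user_signup", EmailHandler(sender="noreply@example.com"))
notifier.dispatch(Event("user_signup", {"email": "alice@example.com"}))
```

The "explain the architecture first" step then amounts to identifying the registry/dispatch pattern before adding the new handler.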

Takeaways:

Opus was fastest overall (7 min total) while producing the most thorough output. Stayed concise when the spec was rigid, wrote more when thoroughness mattered.

GPT-5.1 consistently wrote 1.5-1.8x more code than Gemini because of JSDoc comments, validation logic, error handling, and explicit type definitions.

Gemini is cheapest overall but actually cost more than GPT in the complex system task - seems like it "thinks" longer even when the output is shorter.

Opus is most expensive ($1.68 vs $1.10 for Gemini) but if you need complete implementations on the first try, that might be worth it.

Full methodology and detailed breakdown here: https://blog.kilo.ai/p/benchmarking-gpt-51-vs-gemini-30-vs-opus-45

What's your experience been with these three? Have you run your own comparisons, and if so, what setup are you using?