Google has dropped the full multimodal/vision benchmarks for Gemini 3 Pro.
Key Takeaways (from the chart):
Visual Reasoning (MMMU Pro): Gemini 3 hits 81.0%, beating GPT-5.1 (76%) and Opus 4.5 (72%).
Video Understanding: It completely dominates in procedural video (YouCook2), scoring 222.7 vs GPT-5.1's 132.4.
Spatial Reasoning: In 3D spatial understanding (CV-Bench), it holds a massive lead (92.0%).
This Vision variant seems optimized specifically for complex spatial and video tasks, which explains the massive gap in those specific rows.
Official 🔗 : https://blog.google/technology/developers/gemini-3-pro-vision/
I ran all three models on a coding task just to see how they behave when things aren't clean or nicely phrased.
The goal was just to see who performs like a real dev.
Here's my takeaway:
Opus 4.5 handled real repo issues the best. It fixed things without breaking unrelated parts and didn't hallucinate new abstractions. Felt the most "engineering-minded."
GPT-5.1 was close behind. It explained its reasoning step by step and sometimes added improvements I never asked for. Helpful when you want safety, annoying when you want precision.
Gemini solved most tasks but tended to optimize or simplify decisions I explicitly constrained. Good output, but sometimes too “creative.”
On refactoring and architecture-level tasks:
Opus delivered the most complete refactor with consistent naming, updated dependencies, and documentation.
GPT-5.1 took longer because it analyzed first, but the output was maintainable and defensive.
Gemini produced clean code but missed deeper security and design patterns.
Context windows (because they matter at repo scale):
Opus 4.5: ~200K tokens usable, handles large repos better without losing track
GPT-5.1: ~128K tokens but strong long-reasoning even near the limit
Gemini 3 Pro: ~1M tokens, which is huge, but performance becomes inconsistent as the input gets massive (a rough way to check whether a repo even fits is sketched below)
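Since these limits are approximate and every model uses a different tokenizer, a quick sanity check before picking a model is to count the repo's tokens yourself. Here's a minimal sketch using tiktoken's cl100k_base encoding as a rough proxy (actual counts for Claude and Gemini tokenizers will differ); the limits are just the ballpark figures from the list above.

```python
# Rough sketch: estimate whether a repo fits in each model's context window.
# cl100k_base is only a proxy tokenizer, so treat the numbers as ballpark.
import os
import tiktoken

CONTEXT_LIMITS = {  # approximate figures quoted above
    "opus-4.5": 200_000,
    "gpt-5.1": 128_000,
    "gemini-3-pro": 1_000_000,
}

enc = tiktoken.get_encoding("cl100k_base")

def repo_token_count(root: str, exts=(".py", ".ts", ".md")) -> int:
    """Sum approximate token counts for source files under `root`."""
    total = 0
    for dirpath, _, files in os.walk(root):
        for name in files:
            if name.endswith(exts):
                path = os.path.join(dirpath, name)
                with open(path, encoding="utf-8", errors="ignore") as f:
                    # disallowed_special=() so text that happens to contain
                    # special-token strings doesn't raise an error
                    total += len(enc.encode(f.read(), disallowed_special=()))
    return total

if __name__ == "__main__":
    tokens = repo_token_count(".")
    for model, limit in CONTEXT_LIMITS.items():
        verdict = "fits" if tokens < limit else "exceeds"
        print(f"{model}: ~{tokens:,} tokens, {verdict} the ~{limit:,} limit")
```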
What's your experience been with these three? I used these frontier models side by side in my multi-agent AI setup with Anannas LLM Provider, and the results were interesting (a minimal version of the harness is sketched below).
Have you run your own comparisons, and if so, what setup are you using?
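For anyone curious about the setup itself: below is a minimal sketch of the side-by-side harness, assuming an OpenAI-compatible chat completions endpoint (which most routers and aggregators expose). The base_url and model identifiers are placeholders for illustration, not confirmed names from the provider.

```python
# Minimal side-by-side harness: send the identical prompt to each model
# through one OpenAI-compatible endpoint and print the raw answers.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.example-router.com/v1",  # hypothetical endpoint
    api_key="YOUR_API_KEY",
)

MODELS = ["claude-opus-4.5", "gpt-5.1", "gemini-3-pro"]  # placeholder IDs

PROMPT = (
    "Fix the failing test in tests/test_parser.py without touching "
    "unrelated modules. Explain the change in two sentences."
)

def compare(prompt: str) -> dict[str, str]:
    """Collect one completion per model for the same prompt."""
    results = {}
    for model in MODELS:
        resp = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
        )
        results[model] = resp.choices[0].message.content
    return results

if __name__ == "__main__":
    for model, answer in compare(PROMPT).items():
        print(f"=== {model} ===\n{answer}\n")
```

Because every model sees the exact same prompt and settings, the differences in the answers come down to the models themselves rather than the harness.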
Gemini 3 Pro is quite slow and keeps making more errors than Claude Sonnet 4.5 on Antigravity. It was fine at the start, but the more I use it, the more it produces malformed edits; at this point it can't even edit a single file.
I don't know if this is a bug or whether it's just that bad. Is anyone else facing problems?
Edit: FYI, I'm experiencing this on both the Low and High versions on Fast. It is SO slow, taking up to a few minutes just to give me an initial response.
Google just released their full breakdown for the new Gemini 3 Pro Vision model. Interestingly, they have finally included Claude Opus 4.5 in the direct comparison, acknowledging it as the standard to beat.
The Data (from the chart):
Visual Reasoning: Opus 4.5 holds its own at 72.0% (MMMU Pro), sitting right between the GPT class and the new Gemini.
Video Understanding: While Gemini spikes in YouCook2 (222.7), Opus 4.5 (145.8) actually outperforms GPT-5.1 (132.4) in procedural video understanding.
The Takeaway: Google is clearly treating Opus 4.5 as a key benchmark alongside the GPT-5 series.
Note: Posted per request to discuss how Claude's vision capabilities stack up against the new Google architecture.
Source: Google Keyword
🔗: https://blog.google/technology/developers/gemini-3-pro-vision/