Gemini 3.0 Pro vs ChatGPT 5.1 (Thinking) on Visual Logic: A Side-by-Side Stress Test (The results surprised me)
There is a lot of noise right now about "reasoning" models, so I decided to skip the standard benchmarks and run a practical visual logic stress test.
I fed both models (Gemini 3.0 Pro and ChatGPT 5.1 Thinking) three "trick" images designed to confuse standard multimodal vision models. The goal was to test observation (what is actually there?) vs. hallucination (what the model expects to be there).
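If you want to reproduce this outside the apps, here's roughly how I'd wire it up with the public Python SDKs. This is a minimal sketch, not the exact harness I used; the model ID strings, file name, and prompt below are placeholders.

```python
# Minimal sketch: send the same image + question to both models and compare.
# Model IDs, file name, and prompt are placeholders, not the exact setup.
import base64
import os

import google.generativeai as genai   # pip install google-generativeai
from openai import OpenAI             # pip install openai
from PIL import Image                 # pip install pillow

QUESTION = "How many fingers does this hand have? Describe only what you actually see."
IMAGE_PATH = "ai_hand.png"            # the 7-finger test image

def ask_gpt(model_id: str) -> str:
    client = OpenAI()                 # reads OPENAI_API_KEY from the environment
    with open(IMAGE_PATH, "rb") as f:
        b64 = base64.b64encode(f.read()).decode()
    resp = client.chat.completions.create(
        model=model_id,
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": QUESTION},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{b64}"}},
            ],
        }],
    )
    return resp.choices[0].message.content

def ask_gemini(model_id: str) -> str:
    genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
    model = genai.GenerativeModel(model_id)
    resp = model.generate_content([Image.open(IMAGE_PATH), QUESTION])
    return resp.text

if __name__ == "__main__":
    print("GPT:   ", ask_gpt("gpt-5.1"))          # placeholder model ID
    print("Gemini:", ask_gemini("gemini-3-pro"))  # placeholder model ID
```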
The gap in performance was much wider than I expected.
Test 1: The "AI Hand" Count
I started with a classic AI-generated image with clear artifacts (7 fingers).
The Verdict:
ChatGPT 5.1 (Thinking): Failed hard. It confidently hallucinated a normal hand: "It is simply an open hand... with five extended fingers." It saw what a hand should look like, ignoring the visual reality.
Gemini 3.0 Pro: Immediately flagged the anomaly. "Based on a quick count, that hand appears to have seven fingers." It even correctly identified the context as the "AI Hand Phenomenon."
Test 2: The Negative Space / Semantics
Next, I used the "Cheese Font" image, which requires reading negative space, a notorious weak point for vision models.
Test 3: The Wobbly Table Physics
Finally, a logic puzzle involving a table with uneven legs (Leg A is the longest). The implied question is about stability: which legs does the table actually rest on?
The Verdict:
ChatGPT 5.1: Gave a probabilistic, "fuzzy" answer (assigning 75% probability to legs seemingly at random). It tried to "guess" the statistics rather than solving the physical constraints.
Gemini 3.0 Pro: Applied actual spatial reasoning. It deduced that the table would essentially rest on the longest leg (A) and its diagonal opposite, correctly identifying the geometry of the wobble (there's a quick way to check this, sketched below).
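For anyone who wants to check the geometry themselves: under the assumption of a rigid square tabletop on a perfectly flat floor, the resting pose is just a tiny linear program (minimize the tabletop height subject to the top staying planar and no foot going below the floor). The sketch below is mine, not anything either model produced, and the leg lengths are made up.

```python
# Minimal sketch: which legs does a rigid square table rest on if leg A is longest?
# Assumes a flat floor, a planar tabletop, and made-up leg lengths.
import numpy as np
from scipy.optimize import linprog

# Corner order A, B, C, D around the square, so A and C are diagonal.
lengths = np.array([1.02, 1.00, 1.00, 1.00])  # metres; leg A is 2 cm longer

# Variables: foot clearances f_i >= 0 (gap between each foot tip and the floor).
# A planar top over a square forces h_A + h_C = h_B + h_D with h_i = f_i + length_i,
# i.e.  f_A - f_B + f_C - f_D = len_B + len_D - len_A - len_C.
A_eq = [[1.0, -1.0, 1.0, -1.0]]
b_eq = [lengths[1] + lengths[3] - lengths[0] - lengths[2]]

# The table settles into the lowest pose: minimise the total clearance
# (proportional to the height of the tabletop's centre).
res = linprog(c=[1, 1, 1, 1], A_eq=A_eq, b_eq=b_eq, bounds=(0, None), method="highs")

for name, gap in zip("ABCD", res.x):
    status = "on the floor" if gap < 1e-9 else f"hovering {gap * 100:.1f} cm up"
    print(f"Leg {name}: {status}")
# Expected: legs A and C (the long leg and its diagonal opposite) sit on the floor,
# while the 2 cm of slack ends up on B and/or D -- exactly the rocking pair.
```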
My Takeaway: ChatGPT seems to be "thinking" fast but looking superficially. It hallucinates normality where there is none. Gemini 3.0 Pro, in this specific test, demonstrated actual grounded reasoning. It didn't just tag the image; it analyzed the physics and anomalies correctly.
Has anyone else noticed Gemini outperforming the "Thinking" models in multimodal tasks recently? Or did I just hit a specific weakness in GPT's vision encoder?
ChatGPT is the better product compared to Gemini
I see many benchmarks showing Gemini 2.5 leading the AI race, but to me the responses from ChatGPT, even from the base 4o model, are much better than Gemini's. The automatic memory management, the layout of the responses, the app design, etc. are just better. My experience is that Gemini may be the better model for some use cases, but ChatGPT is the better product for most use cases. I use both, and I always prefer the responses and the overall experience of ChatGPT. I'm a senior software engineer, so beyond coding I mostly use ChatGPT for system design, architecture, etc., and it's a pleasure to work with and converse with, like a pair programmer. I also like how ChatGPT automatically connects to the web when it knows I asked about current events. Only the base Gemini 2.0 Flash can do this at the moment.