The 200k context window is deflating, especially when GPT and Gemini are eating it for lunch. Even going to 500k would be an improvement.
Benchmarks at this point in the AI game are negligible at best, and you sure don't "feel" a 1% difference between the three. It feels like we're reaching the point of diminishing returns.
We as programmers should be able to see the forest for the trees here. We think differently than the average person. We think outside the box. We don't get caught up in hype, since we exist in the realm of research, facts, and practicality.
This Claude release is more hype than practical.
Source: Code with Claude Opening Keynote
Been using these tools for the last few years. Can already tell Opus and Sonnet 4 have set a completely new benchmark, especially using Claude Code.
They just work: less hallucination, fewer infinite loops of confusion. You can set one off and come back with 80-90% confidence it's done what you asked. Maximum 3-4 iterations to get website/app component styling perfect (vs 5-10 before).
I’ve already seen too many of the classic ‘omg this doesn’t work for me they suck, overhyped’ posts. Fair enough if that’s your experience, but I completely disagree and can’t help but think your prompting is the problem.
Without using too much stereotypical AI hyperbole, I think these models are the biggest step change since GPT-3.
Today, Anthropic is introducing the next generation of Claude models: Claude Opus 4 and Claude Sonnet 4, setting new standards for coding, advanced reasoning, and AI agents. Claude Opus 4 is the world’s best coding model, with sustained performance on complex, long-running tasks and agent workflows. Claude Sonnet 4 is a drop-in replacement for Claude Sonnet 3.7, delivering superior coding and reasoning while responding more precisely to your instructions.
Claude Opus 4 and Sonnet 4 are hybrid models offering two modes: near-instant responses and extended thinking for deeper reasoning. Both models can also alternate between reasoning and tool use—like web search—to improve responses.
Both Claude 4 models are available today for all paid plans. Additionally, Claude Sonnet 4 is available on the free plan.
Read more here: https://www.anthropic.com/news/claude-4
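For anyone wanting to try the extended thinking mode from code, here's a minimal sketch using the Anthropic TypeScript SDK. The model ID string and the shape of the `thinking` parameter are my assumptions based on the launch docs, so double-check them against the official documentation before relying on them:

```typescript
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment

const message = await client.messages.create({
  // Assumed launch model ID; swap in whichever ID the docs list.
  model: "claude-sonnet-4-20250514",
  max_tokens: 16000, // must be larger than the thinking budget below
  // Extended thinking: let the model reason before producing its answer.
  thinking: { type: "enabled", budget_tokens: 8000 },
  messages: [
    { role: "user", content: "Walk through this parser edge case step by step: ..." },
  ],
});

console.log(message.content);
```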
Anthropic recently unveiled Claude 4 (Opus and Sonnet), achieving a record-breaking 72.7% on SWE-bench Verified and surpassing OpenAI's latest models. Benchmarks aside, I wanted to see how Claude 4 holds up on real-world software engineering tasks. I spent the last 24 hours putting it through intensive testing with challenging refactoring scenarios.
I tested Claude 4 using a Rust codebase featuring complex, interconnected issues following a significant architectural refactor. These problems included asynchronous workflows, edge-case handling in parsers, and multi-module dependencies. Previous versions, such as Claude Sonnet 3.7, struggled here—often resorting to modifying test code rather than addressing the root architectural issues.
Claude 4 impressed me by resolving these problems correctly in just one attempt, never modifying tests or taking shortcuts. Both Opus and Sonnet variants demonstrated genuine comprehension of architectural logic, providing solutions that improved long-term code maintainability.
Key observations from practical testing:
Claude 4 consistently focused on the deeper architectural causes, not superficial fixes.
Both variants successfully fixed the problems on their first attempt, editing around 15 lines across multiple files, all relevant and correct.
Solutions were clear, maintainable, and reflected real software engineering discipline.
I was initially skeptical about Anthropic’s claims regarding their models' improved discipline and reduced tendency toward superficial fixes. However, based on this hands-on experience, Claude 4 genuinely delivers noticeable improvement over earlier models.
For developers seriously evaluating AI coding assistants—particularly for integration in more sophisticated workflows—Claude 4 seems to genuinely warrant attention.
A detailed write-up and deeper analysis are available here: Claude 4 First Impressions: Anthropic’s AI Coding Breakthrough
Interested to hear others' experiences with Claude 4, especially in similarly challenging development scenarios.
I mean, how the heck is Qwen-3 literally better than Claude 4 (the Claude that used to dog-walk everyone)? This is just disappointing 🫠
I have been extensively using Gemini 2.5 Pro for coding-related stuff and O3 for everything else, and it's crazy that within a month or so, they look kind of obsolete. Claude Opus 4 is the best overall model available right now.
I ran a quick coding test, Opus against Gemini 2.5 Pro and OpenAI o3. The intention was to create visually appealing and bug-free code.
Here are my observations:
- Claude Opus 4 leads in raw performance and prompt adherence.
- It understands user intentions better, reminiscent of 3.6 Sonnet.
- High taste. The generated outputs are tasteful. Retains the Opus 3 personality to an extent.
- Though unrelated to code, the model feels nice; I never enjoyed talking to Gemini and o3.
- Gemini 2.5 is more affordable and uses far fewer API credits than Opus.
- Gemini's one-million-token context length is unbeatable for large-codebase understanding.
- Opus is the slowest in time to first token; you have to be patient with the thinking mode.
Check out the blog post for the complete comparison analysis with code: Claude 4 Opus vs. Gemini 2.5 vs. OpenAI o3
The vibes with Opus are the best; it has a personality and is stupidly capable. But it's too pricey; it's best used through the Claude app, because the API cost will put a hole in your pocket. Gemini will always be your friend, with free access and the cheapest SOTA model.
Would love to know your experience with Claude 4 Opus and how you would compare it with o3 and Gemini 2.5 pro in coding and non-coding tasks.
Finished benchmarking Claude 4 (Sonnet) across a range of document understanding tasks, and the results are… not that good. It's currently ranked 7th overall on the leaderboard.
Key takeaways:
Weak performance in OCR – Claude 4 lags behind even smaller models like GPT-4.1-nano and InternVL3-38B-Instruct.
Rotation sensitivity – We tested OCR robustness with slightly rotated images ([-5°, +5°]). Most large models had a 2–3% drop in accuracy. Claude 4 dropped 9%.
Poor on handwritten documents – Scored only 51.64%, while Gemini 2.0 Flash got 71.24%. It also struggled with handwritten datasets in other tasks like key information extraction.
Chart VQA and visual tasks – Performed decently but still behind Gemini, Claude 3.7, and GPT-4.5/o4-mini.
Long document understanding – Claude 3.7 Sonnet (reasoning:low) ranked 1st. Claude 4 Sonnet ranked 13th.
One bright spot: table extraction – Claude 4 Sonnet is currently ranked 1st, narrowly ahead of Claude 3.7 Sonnet.
Leaderboard: https://idp-leaderboard.org/
Codebase: https://github.com/NanoNets/docext
How has everyone’s experience with the models been so far?
Introducing the next generation: Claude Opus 4 and Claude Sonnet 4.
Claude Opus 4 is our most powerful model yet, and the world’s best coding model.
Claude Sonnet 4 is a significant upgrade from its predecessor, delivering superior coding and reasoning.
I don't know what magic you guys did, but holy crap, Claude 4 Opus is freaking amazing, beyond amazing! The Anthropic team is legendary in my book for this. I was able to solve a very specific graph database chatbot issue that had been plaguing me in production.
Rock on Claude team!
Been using ChatGPT Plus with o3 and Gemini 2.5 Pro for coding the past few months. Both are decent, but it always felt like something was missing, you know? Like they'd get me 80% there, but then I'd waste time fixing their weird quirks, explaining context over and over, or running in an endless error loop.
Just tried Claude 4 Opus and... damn. This is what I expected AI coding to be like.
The difference is night and day:
- Actually understands my existing codebase instead of giving generic solutions that don't fit
- Debugging is scary good - it literally found a memory leak in my React app that I'd been hunting for days (see the sketch after this list for the kind of leak I mean)
- Code quality is just... clean. Like actually readable, properly structured code
- Explains trade-offs instead of just spitting out the first solution
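To show what I mean by the memory leak: this isn't my actual code, just a minimal sketch of the general pattern, a polling effect that never cleans up its interval, so every unmount/remount leaves a live timer (and its retained closure) behind:

```tsx
import { useEffect, useState } from "react";

// Hypothetical component illustrating the leak pattern.
export function PollingWidget({ url }: { url: string }) {
  const [data, setData] = useState<unknown>(null);

  useEffect(() => {
    // Leaky version: setInterval with no cleanup, so the timer keeps firing
    // (and holding onto setData) after the component unmounts.
    const id = setInterval(async () => {
      const res = await fetch(url);
      setData(await res.json());
    }, 5000);

    // The missing cleanup that fixes this kind of leak.
    return () => clearInterval(id);
  }, [url]);

  return <pre>{JSON.stringify(data)}</pre>;
}
```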
Real example: Had this mess of nested async calls in my Express API. ChatGPT kept suggesting Promise.all which wasn't what I needed. Gemini gave me some overcomplicated rxjs nonsense. Claude 4 looked at it for 2 seconds and suggested a clean async/await pattern with proper error boundaries. Worked perfectly.
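For illustration, here's roughly the shape it steered me toward, simplified and with hypothetical route and helper names rather than my actual API:

```typescript
import express, { NextFunction, Request, Response } from "express";

const app = express();

// Hypothetical stand-ins for the real data calls.
async function getUser(id: string) {
  return { id, name: "demo" };
}
async function getOrders(userId: string) {
  return [{ userId, total: 42 }];
}

// Before: deeply nested .then()/callback chains with error handling scattered
// at every level. After: sequential awaits with a single error boundary.
app.get("/users/:id/summary", async (req: Request, res: Response, next: NextFunction) => {
  try {
    const user = await getUser(req.params.id);
    const orders = await getOrders(user.id);
    res.json({ name: user.name, orderCount: orders.length });
  } catch (err) {
    next(err); // hand any failure to the error middleware below
  }
});

// Centralized error handler: the "error boundary" for every route above.
app.use((err: Error, _req: Request, res: Response, _next: NextFunction) => {
  res.status(500).json({ error: err.message });
});

app.listen(3000);
```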
The context window is massive too - I can literally paste my entire project and it gets it. No more "remember we discussed X in our previous conversation" BS.
I'm not trying to shill here but if you're doing serious development work, this thing is worth every penny. Been more productive this week than the entire last month.
Got an invite link if anyone wants to try it: https://claude.ai/referral/6UGWfPA1pQ
Anyone else tried it yet? Curious how it compares for different languages/frameworks.
EDIT: Just to be clear - I've tested basically every major AI coding tool out there. This is the first one that actually feels like it gets programming, not just text completion that happens to be code. This also takes Cursor to a whole new level!
I know, I know, whenever a model comes out you get people saying this, but for me it's about very concrete things; I'm not just biased against it. For reference, I'm comparing 4 Sonnet (concise) with 3.7 Sonnet (concise), no reasoning for either.
I asked it to calculate the total markup I paid at a gas station relative to the supermarket. I gave it the quantities in a way I thought was clear ("I got three protein bars and three milks, one of the others each. What was the total markup I paid?", though that came later in the conversation, after it had searched for prices). And indeed, 3.7 understands this without any issue (and I regenerated the message to make sure it wasn't a fluke). But 4, even with much back and forth and several regenerations, kept interpreting this as 3 milk, 1 protein bar, 1 [other item], 1 [other item], until I very explicitly laid it out as I just did.
And then, another conversation, I ask it, "Does this seem correct, or too much?" with a photo of food, and macro estimates for the meal in a screenshot. Again, 3.7 understands this fine, as asking whether the figures seem to be an accurate estimate. Whereas 4, again with a couple regenerations to test, seems to think I'm asking whether it's an appropriate meal (as in, not too much food for dinner or whatever). And in one instance, misreads the screenshot (thinking that the number of calories I will have cumulatively eaten after that meal is the number of calories of that meal).
Is anyone else seeing any issues like this?
Are there any rumors about a Claude 4 release? I love 3.5, so I can only hope the next version comes out soon.
I thought the rumor was that this was coming at some point in February.
3.7 seems to indicate no new frontier models for a few months at least, going off of how the gap from 3.5 new to 3.7 was a few months.
What do we think? Was this actually the big drop that was being rumored?