Nothing major. For me, Opus 4 was also great and did everything I asked for. For something to be truly great, it would have to do something I did not ask for, and that, by definition, isn't something I'm going to ask for. Answer from anantprsd5 on reddit.com
🌐
Bind AI IDE
blog.getbind.co › 2025 › 08 › 06 › claude-opus-4-1-vs-claude-opus-4-how-good-is-this-upgrade
Claude Opus 4.1 vs Claude Opus 4 – How good is this upgrade?
August 6, 2025 - Reliability: In practice, Opus 4.1 is more likely to both correctly identify and implement code changes needed to fix or enhance large, complex software projects. Scale: Handling multi-file edits, intricate dependencies, and significant codebase ...
🌐
Arsturn
arsturn.com › blog › is-claude-opus-4-1-worth-the-200-price-tag-a-deep-dive
Claude Opus 4.1 Review: Is It Worth the $200 Price?
The "Agent" Builder: If you're on the cutting edge, building AI agents that can perform complex, multi-step tasks autonomously, Opus 4.1 is a top contender. Its performance in agentic benchmarks & long-running tasks makes it one of the best platforms for this kind of work.
🌐
Data Studios
datastudios.org › post › claude-opus-4-1-reviews-what-experts-and-users-are-saying-about-anthropic-s-most-advanced-model
Claude Opus 4.1 reviews: what experts and users are saying about Anthropic’s most advanced model.
August 9, 2025 - Released as a direct response to the rapidly evolving competition from OpenAI and Google, Claude 4.1 consolidates its reputation with top-tier accuracy, tighter safety standards, and a more agentic approach to complex tasks.
🌐
Reddit
reddit.com › r/claudeai › genuinely impressed by opus 4.1
r/ClaudeAI on Reddit: Genuinely impressed by Opus 4.1
June 27, 2025 -

Been using Claude daily for development work and wanted to share some thoughts on the recent updates, especially after trying out Opus 4.1.

So I’ve been using Claude Code in strict mode for a while now, giving it precise instructions rather than just asking it to build entire features. This was working pretty well, but honestly I started feeling like Opus 4.0 was getting a bit worse over time, especially for planning work. Could’ve been in my head though.

When 4.1 dropped, I decided to actually test it on some complex stuff in a large codebase that I normally wouldn’t bother with. And damn… it actually crushed some really intricate problems. The solutions it came up with were genuinely impressive, not perfect, but as a senior engineer I was pretty surprised by the quality.

I keep seeing people complain about hitting limits too fast, but honestly I think it depends entirely on how you’re using it. If you dump a huge codebase on Opus and ask it to implement a whole feature, yeah, you’re gonna burn through your limits. But if you’re smart about it, it’s like having an amazing teammate.

I’m on the max plan (so maybe I’m biased here), but my current approach is to use Opus 4.1 for the high-level thinking - planning features, writing specs. Then I take those specs and hand them to Sonnet to actually implement. Sonnet just follows the plan and writes the code. Always review everything manually though, that’s still our job.

This way Opus handles the complex reasoning while Sonnet does the grunt work, and I’m not constantly hitting limits.

Honestly, when you use it right, Opus 4.1 feels like working with a really solid co-worker. Kudos to the Claude team - this update is legit! 👏

🌐
Anthropic
anthropic.com › claude › opus
Claude Opus 4.5
Claude Opus 4.1 is a drop-in replacement for Opus 4 that delivers superior performance and precision for real-world coding and agentic tasks.
🌐
Anthropic
anthropic.com › news › claude-opus-4-1
Claude Opus 4.1
Opus 4.1 advances our state-of-the-art coding performance to 74.5% on SWE-bench Verified. It also improves Claude’s in-depth research and data analysis skills, especially around detail tracking and agentic search.
🌐
Medium
medium.com › @leucopsis › claude-sonnet-4-and-opus-4-a-review-db68b004db90
Claude Sonnet 4 and Opus 4, a Review | by Barnacle Goose | Medium
May 29, 2025 - For instance, on the popular MMLU test (Massive Multitask Language Understanding, covering a wide range of subjects at college level), Claude Opus 4 scores around 87–89% accuracy — this is on par with or slightly above the original GPT-4 (which was ~86%) and just shy of OpenAI’s latest GPT-4.1 (which reportedly surpassed 90% on MMLU).
🌐
Glbgpt
glbgpt.com › resource › claude-opus-41-review-a-targeted-upgrade-for-coding-and-agentic-work
Claude Opus 4.1 review: a targeted upgrade for coding and agentic work
Anthropic says Opus 4.1 advances coding performance to 74.5% on SWE-bench Verified (their standard scaffold). Early customer quotes call out better multi-file refactors and fewer unnecessary edits. Agentic & terminal work. Multiple trackers report Terminal-Bench at ~43.3% (up from 39.2%), ...
🌐
Reddit
reddit.com › r/claudeai › claude 4 opus is the most tasteful coder among all the frontier models.
r/ClaudeAI on Reddit: Claude 4 Opus is the most tasteful coder among all the frontier models.
February 18, 2025 -

I have been extensively using Gemini 2.5 Pro for coding-related stuff and O3 for everything else, and it's crazy that within a month or so, they look kind of obsolete. Claude Opus 4 is the best overall model available right now.

I ran a quick coding test: Opus against Gemini 2.5 Pro and OpenAI o3. The intention was to create visually appealing and bug-free code.

Here are my observations:

  • Claude Opus 4 leads in raw performance and prompt adherence.

  • It understands user intentions better, reminiscent of 3.6 Sonnet.

  • High taste. The generated outputs are tasteful. Retains the Opus 3 personality to an extent.

  • Though unrelated to code, the model feels nice, and I never enjoyed talking to Gemini and o3.

  • Gemini 2.5 is more affordable and uses far fewer API credits than Opus.

  • Gemini's one-million-token context window is unbeatable for large-codebase understanding.

  • Opus is the slowest in time to first token. You have to be patient with the thinking mode.

Check out the blog post for the complete comparison analysis with code: Claude 4 Opus vs. Gemini 2.5 vs. OpenAI o3

The vibes with Opus are the best; it has a personality and is stupidly capable. But it's too pricey; it's best used with the Claude app, since the API cost will put a hole in your pocket. Gemini will always be your friend, with free access and the cheapest SOTA model.

Would love to know your experience with Claude 4 Opus and how you would compare it with o3 and Gemini 2.5 pro in coding and non-coding tasks.

🌐
Medium
medium.com › @cognidownunder › anthropic-claude-opus-4-1-the-definitive-guide-to-anthropics-most-advanced-ai-model-yet-bf1c6f0de736
Anthropic Claude Opus 4.1: The Definitive Guide to Anthropic’s Most Advanced AI Model Yet | by Cogni Down Under | Medium
August 6, 2025 - Let’s cut through the marketing speak. Claude Opus 4.1 hits 74.5% on SWE-bench Verified, up from 72.5% in Opus 4. That’s a 2-percentage-point improvement in the model’s ability to fix real-world software bugs.
🌐
Reddit
reddit.com › r/claudeai › meet claude opus 4.1
r/ClaudeAI on Reddit: Meet Claude Opus 4.1
August 5, 2025 -

Today we're releasing Claude Opus 4.1, an upgrade to Claude Opus 4 on agentic tasks, real-world coding, and reasoning.

We plan to release substantially larger improvements to our models in the coming weeks.

Opus 4.1 is now available to paid Claude users and in Claude Code. It's also on our API, Amazon Bedrock, and Google Cloud's Vertex AI.

https://www.anthropic.com/news/claude-opus-4-1

🌐
Reddit
reddit.com › r/claudeai › claude opus and sonnet 4 vs gpt4.1 - first hand experience as a professional firmware engineer experimenting with vibe.
r/ClaudeAI on Reddit: Claude opus and sonnet 4 vs gpt4.1 - first hand experience as a professional firmware engineer experimenting with vibe.
May 31, 2025 -

So to preface this, I've been writing software and firmware for over a decade, my profession is specifically in reverse engineering, problem solving, pushing limits and hacking.

So far I've been using the following: GPT 4.1, GPT o4, Claude Sonnet 4 (gets distracted by irrelevant signals like incorrect comments in code, assumptions, etc.), Gemini 2.5 (not great at intuiting holes in a task), and Claude Opus 4 (I have been forced to retry the same prompt with other AIs because of how poorly it performs).

I would say this is the order of overall success in usage. All of them improve my work experience; they turn the work I'd give a junior or intern, or grind work with a simple concept but laborious implementation, into minutes or seconds for an acceptable implementation.

Now they all have the usual issues, but Opus unfortunately has been particularly bad at breaking things, getting distracted, hallucinating, jumping to incorrect conclusions, getting stuck in really long, stupid loops, not following my instructions, and generally forcing me to reattempt the same task with a different AI.

They are all guilty of changing things I didn't ask for while performing other tasks. They all can fail to understand intent without very specific, unambiguous instructions.

GPT 4.1 simply outshines the rest in overall coding performance. It spots complex errors and intuits meaning rather than just going by the letter. It's QUICK, really quick compared to the others. And it doesn't piss me off (I've never felt the need to use expletives until Claude 4).

🌐
Hacker News
news.ycombinator.com › item
Claude Opus 4.1 | Hacker News
August 8, 2025 - If you look at the past, whenever Google announces something major, OpenAI almost always releases something as well · People forget that OpenAI was started to compete with Google on AI
🌐
Reddit
reddit.com › r/singularity › claude opus 4.1 benchmarks
r/singularity on Reddit: Claude Opus 4.1 Benchmarks
June 21, 2025 - Windsurf reports Opus 4.1 delivers a one standard deviation improvement over Opus 4 on their junior developer benchmark, showing roughly the same performance leap as the jump from Sonnet 3.7 to Sonnet 4. My hope is that they're releasing this ...
🌐
9to5Mac
9to5mac.com › 2025 › 08 › 05 › anthropic-claude-opus-4-1
Anthropic rolls out Claude Opus 4.1 with improved software engineering accuracy - 9to5Mac
August 5, 2025 - Anthropic says Claude Opus 4.1 improves software engineering accuracy to 74.5%. That compares to 62.3% with Claude Sonnet 3.7 and 72.5% with Claude Opus 4.
🌐
Laozhang
blog.laozhang.ai › api-services › claude-opus-pricing-2025
Claude 4.1 Opus Pricing Guide 2025: Complete Cost Analysis & Comparison – LaoZhang-AI
The standout achievement of Claude Opus 4.1 lies in its unprecedented 74.5% score on SWE-bench Verified, establishing it as the industry leader for coding tasks and surpassing competitors like GPT-4.1’s 69.1% and Gemini 2.5 Pro’s 63.2%. This performance translates directly into practical ...
🌐
Reddit
reddit.com › r/claudecode › is claude code sonnet 4.5 really better than opus 4.1? not seeing it.
r/ClaudeCode on Reddit: Is Claude Code Sonnet 4.5 Really Better Than Opus 4.1? Not Seeing It.
October 3, 2025 -

How are people genuinely praising Claude Code Sonnet 4.5? I have no idea what’s happening…but from my experience it’s pretty disappointing. Sorry if that stings, but I’m honestly curious about what others see in it.

I'm speaking as someone who uses Claude Code daily, easily 7+ hours per day, and who has been deeply involved with it since the beginning. I consider myself a power user and truly understand the capabilities it should have. Maybe I'm missing something crucial here… but BESIDES that point, I'm really dissatisfied and frustrated with Anthropic right now.

On top of that, the marketing hype around Sonnet 4.5 feels like the same AI slop promotion we saw everywhere with ChatGPT lol. It's being marketed as the "best model in the world," likely to people who barely even scratch its surface.

I’ve also just hit a usage limit on Opus 4.1. I’m on the max 200 plan and now there’s some kind of cap in place…for what, a week? Why? If Sonnet is sooooo good why are they placing weekly limits on opus 4.1? So stupid. Can someone explain what’s going on here?

🌐
Anthropic
anthropic.com › news › claude-opus-4-5
Introducing Claude Opus 4.5
However, what if we: ... 1. Upgrade his cabin from basic economy to economy (or business) 2. Then modify the flights to be 2 days later. This would cost more money, but it's a legitimate path within the policy! ... The benchmark technically scored this as a failure because Claude's way of helping the customer was unanticipated. But this kind of creative problem solving is exactly what we've heard about from our testers and customers; it's what makes Claude Opus 4.5 feel like ...
🌐
Reddit
reddit.com › r/claudeai › 4.1 opus isn't perfect but the difference is enormous.
r/ClaudeAI on Reddit: 4.1 Opus isn't perfect but the difference is enormous.
August 12, 2025 -

I previously had the $100 Claude 4 but went back to $20. Today, I decided to try out 4.1 Opus. Unbelievable really.

I had previously attempted this enormous shitshow of a refactor from React Context to Zustand over 40k lines of code, and everything always failed miserably. I'm a 2.5 fanboy, but it doesn't have that capability.

I hit the limits of the $100 plan pretty fast, so I went to $200, and it's been a breeze. Really logical code changes and great testing along the way. It all makes sense for this huge refactor that I will spend the next few weeks working on.

Yeah, I'm a believer. I have bitched about Claude plenty but this just feels smart as hell.

For context, I am trying to maintain my current application's behaviour while switching to Zustand and react query. Nothing new yet, just wildly complex tech debt to navigate out of.

(10+ years programming and had a semi-successful saas before with all the business meetings etc. that goes along with that. Not a newbie.)