Videos
Source: Code with Claude Opening Keynote
After trying the free versions of several assistants (GitHub Copilot, ChatGPT, etc.), Claude Sonnet 4 Thinking 🙌 stands out for me as the best coding assistant so far. A few things that sold me:
Reasoning-first answers — it walks through why an approach works (or doesn’t) rather than just pasting code.
Multi-file context — it keeps track of project structure and gives consistent suggestions across files.
Refactor & tests — it suggests concise refactors and generates unit tests that actually catch edge-cases.
Debugging help — when I paste stack traces or failing tests it narrows the root cause quickly and suggests minimal fixes.
Readable style — produced code is readable and easy to adopt; less hand-holding required.
Not perfect — token limits and cost can be a factor for very large projects, and sometimes you still need to vet outputs. But for me the time saved + improved code quality outweighs those. Curious what others use for deep debugging or multi-file refactors.
Anyone else prefer Claude for coding? Why/why not?
Do you like this personally?
Starting off: Don't get me wrong, Sonnet 4 is a legendary model for coding. It's so good, maybe even too good. It has zero-shotted basically every one of my personal tests in Cursor and a couple of complex Rust problems I always test LLMs with.
I believe most people have hugely praised Sonnet 4 for good reasons. It's extremely good at coding, and since lots of people in this sub are coders, they often feel their whole day gets more productive. What they don't realize is that this model is kinda bad for normies. On a personal note, this model feels severely overtrained on code, which likely caused catastrophic forgetting. It feels severely lobotomized on non-code-related tasks.
Opus 4, however, seems to be fine; it has gone through my math tasks without any issues. Just too expensive to be a daily driver tho.
Here is one of the grade 9 math problems from math class that I recently had to do (yes, I'm in high school). I decided to try Sonnet 4 on it.
[Image: Math Problem]
I gave Sonnet 4 (non-reasoning) this exact prompt of "Teach me how to do this question step-by-step for High School Maths", and GPT-4.1 the same prompt with the same image attached.
Results:
Sonnet 4 Response:
Sonnet 4 got completely confused, started doing confusing random operations, and got lost. Then it gave me some vague steps and tried to get me to solve it???? Sonnet 4 very rarely gets it right; it either tries to make the user solve it or gives out answers like 3.10, 3.30, 3.40, etc.
GPT-4.1 Response:
I have rerun the same test on GPT-4.1 many times and it gets it right every single time. This is one of dozens of questions I have found Sonnet 4 getting consistently wrong or just rambling about, whereas GPT-4.1 nails it right away.
People in AI all believe these models are improving so much (they are), but normies don't experience that as much. I believe the most substantial recent improvements in these models have been in code; normies don't code, so they can tell it improved a bit, but not a mind-blowing amount.
I have a question regarding the thinking vs non-thinking versions for Sonnet 4. I've been using the 3.7 non-thinking version because it shows less over-eagerness and better rule-following. With Claude 4, since its over-eagerness is now tamed, does it make sense to use the thinking mode?
What is thinking mode supposed to help with? I'm using it with Cline and need clarity on whether thinking mode is worth it. If so, how many tokens should be allocated to thinking for optimal results?
I'm not too concerned with costs but prioritize better rule-following and problem-solving.
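For anyone wondering what "allocating tokens to thinking" actually controls: in the Anthropic Messages API, extended thinking is enabled per request with a budget_tokens cap that has to fit inside max_tokens, and Cline's thinking setting maps to the same knob. Below is a minimal sketch assuming the Python SDK; the model ID and budget value are placeholders for illustration, not recommendations:

```python
# Minimal sketch of setting an extended-thinking budget via the Anthropic
# Messages API. Model ID and budget value are illustrative assumptions.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-sonnet-4-20250514",  # assumed Sonnet 4 model ID
    max_tokens=16000,                  # must be larger than the thinking budget
    thinking={"type": "enabled", "budget_tokens": 8000},  # tokens reserved for reasoning
    messages=[
        {"role": "user", "content": "Refactor this function and explain the edge cases."}
    ],
)

# The reply interleaves "thinking" blocks with the final "text" blocks;
# print only the visible answer.
for block in response.content:
    if block.type == "text":
        print(block.text)
```

A larger budget gives the model more room to reason on hard problems but adds latency and cost, so it tends to get tuned per task rather than set to one universally optimal number.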