Since some people have been asking, here's the actual output limit for Sonnet 3.7 with and without thinking:
Non-thinking: 8192 tokens
Non-thinking chat: https://claude.ai/share/af0b52b3-efc3-452b-ad21-5e0f39676d9f
Thinking: 24196 tokens*
Thinking chat: https://claude.ai/share/c3c8cec3-2648-4ec4-a13d-c6cce7735a67
*The thinking token count doesn't quite add up for me, as I'd expect 3 * 8192 = 24576, but it's close enough I guess. Also, in the example the thinking block itself is 23575 tokens before the main response is cut off, so thinking alone may actually run longer.
Tokens were counted with the token counting API, subtracting 16 tokens (the role and other tokens that are always present).
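For anyone who wants to reproduce the measurement, here's a rough sketch using Anthropic's token-counting endpoint via the `anthropic` Python SDK. The model ID, the placeholder text, and wrapping the response in a user message are my assumptions; the 16-token overhead comes from the post above.

```python
# Sketch: measure the length of a Claude response with the
# token counting API, then subtract the fixed per-message overhead.
# Requires ANTHROPIC_API_KEY in the environment for the live call.

OVERHEAD_TOKENS = 16  # role + always-present framing tokens, per the post


def response_length(counted_tokens: int, overhead: int = OVERHEAD_TOKENS) -> int:
    """Strip the fixed per-message overhead from a raw token count."""
    return counted_tokens - overhead


if __name__ == "__main__":
    import anthropic  # network call below; not run as part of this post

    client = anthropic.Anthropic()
    text = "...paste the full Claude response here..."
    counted = client.messages.count_tokens(
        model="claude-3-7-sonnet-20250219",  # assumed model ID
        messages=[{"role": "user", "content": text}],
    )
    print(response_length(counted.input_tokens))
```

With this approach, a raw count of 8208 tokens on a maxed-out non-thinking response would come back as 8208 - 16 = 8192, matching the limit above.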
Hope this helps, and thanks to the Discord mod, who shall not be pinged, for the testing prompt.
I've been using Claude Pro for a while now, and I noticed something strange today. When using Sonnet 3.7, it seems like the token limit for individual responses is lower than before. Previously Claude could generate much longer single responses, but now it seems to cut off earlier.
Has anyone else experienced this? Did Anthropic reduce the response length limits for Claude Pro recently, or am I imagining things? I couldn't find any announcement about changes to the limits.
If you've noticed the same thing or have any information about this, I'd appreciate hearing about it!
Thanks!
I've been experimenting with Claude 3.7 and I'm genuinely impressed with its capabilities. The quality of responses and reasoning is excellent, especially for coding tasks. However, as a free user, I'm finding it practically unusable due to the severe rate limits.
I can only get through about 1-2 coding prompts per day before hitting the limit. This makes it impossible to have any meaningful ongoing development session or troubleshooting conversation.
I would happily pay for a subscription if the context window were significantly larger. The current 8k token limit is simply too restrictive for serious work. For comparison, I regularly use Gemini 2.0 Pro, which offers a 2 million token context window, allowing me to include entire codebases and documentation in my prompts. Look at Grok and GPT o3-mini: both models are comparable in quality, and I get many times the usage as a free user. Grok 3 allows 50 normal prompts and 10 thinking prompts a day; on the o3-mini side I get unlimited 4o-mini, tens of thousands of 4o tokens, and over a dozen o3 prompts, without paying a dime, and all of these models have a much larger context window.
With just 8k tokens, I can barely fit a moderate-sized function and its related components before running out of space, let alone give Claude frontend context. This means constantly having to reframe my questions and lose context, making complex programming tasks frustratingly inefficient.
Does anyone else feel the same way? I want to support Claude and would gladly pay for a better experience, but the current limitations make it hard to justify even for a paid tier.
Claude Sonnet 3.7 had output limits of 8k tokens in normal mode and 64k in Thinking Mode. However, I can't find official documentation about Claude Sonnet 4's output limits in normal mode, nor information about Claude Opus 4's limits.
Does anyone have this information or know where to find it?
That you can dump 7k words on it and it'll translate them all at once is amazing. Also far less censored. Really nice model.