Since some people have been asking, here's the actual output limit for Sonnet 3.7 with and without thinking:
Non-thinking: 8192 tokens
Non-thinking chat: https://claude.ai/share/af0b52b3-efc3-452b-ad21-5e0f39676d9f
Thinking: 24196 tokens*
Thinking chat: https://claude.ai/share/c3c8cec3-2648-4ec4-a13d-c6cce7735a67
*The thinking tokens don't make a lot of sense to me, as I'd expect them to be 3 * 8192 = 24576, but close enough I guess. Also in the example the thinking tokens itself are 23575 before being cut off in the main response, so thinking alone may actually be longer.
Tokens have been calculated with the token counting API and subtracting 16 tokens (role and some other tokens that are always present).
Hope this helps and also thanks to the discord mod, that shall not be pinged, for the testing prompt.
Videos
I've been using Claude Pro for a while now, and I noticed something strange today. When using Sonnet 3.7, it seems like the token limit for individual responses is lower than before. Previously Claude could generate much longer single responses, but now it seems to cut off earlier.
Has anyone else experienced this? Did Anthropic reduce the response length limits for Claude Pro recently, or am I imagining things? I couldn't find any announcement about changes to the limits.
If you've noticed the same thing or have any information about this, I'd appreciate hearing about it!
Thanks!
Claude Sonnet 3.7 had output limits of 8k tokens in normal mode and 64k in Thinking Mode. However, I can't find official documentation about Claude Sonnet 4's output limits in normal mode, nor information about Claude Opus 4's limits.
Does anyone have this information or know where to find it?