I'm using the claude-3-5-sonnet-20240620 API and I need to set max_tokens. The documentation clearly says the max output for this model should be 8,192 tokens, but I get an error saying the max is 4,096, like for the older models. Am I missing something, or did Anthropic fuck up the validation for the API?
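For reference, here's a minimal version of the call that triggers the error (sketch: the prompt is a placeholder, and the commented-out beta header is just a guess on my part, not something I've confirmed from the current docs):

```python
import anthropic

client = anthropic.Anthropic()  # assumes ANTHROPIC_API_KEY is set in the environment

response = client.messages.create(
    model="claude-3-5-sonnet-20240620",
    max_tokens=8192,  # the value the API rejects with a 4,096 limit error
    messages=[{"role": "user", "content": "placeholder prompt"}],
    # Guess: if the 8,192-token output for this model is still an opt-in beta,
    # a header along these lines might be needed (unconfirmed):
    # extra_headers={"anthropic-beta": "max-tokens-3-5-sonnet-2024-07-15"},
)
print(response.usage.output_tokens)
```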
I've been using Claude Pro for a while now, and I noticed something strange today. When using Sonnet 3.7, it seems like the token limit for individual responses is lower than before. Previously Claude could generate much longer single responses, but now it seems to cut off earlier.
Has anyone else experienced this? Did Anthropic reduce the response length limits for Claude Pro recently, or am I imagining things? I couldn't find any announcement about changes to the limits.
If you've noticed the same thing or have any information about this, I'd appreciate hearing about it!
Thanks!
Claude.ai interface
Personal preferences = ONLY MY NAME
Capabilities = ALL OFF
Connectors = ALL OFF
Observations:
I ask Claude to run a search that uses as few tokens as possible, since searching is what lets it report the system's token usage against the budget.
Claude responds by searching for "a" (10 results) and then reports ~15K-16K tokens used out of 190K (the search itself costs ~3K tokens).
I continue to provide large txt files at each turn and ask Claude to repeat the minimal search and report the system's token count. Each search costs ~3K.
I repeat these steps and never reach 190K. My last test hit "maximum length for this conversation" at only ~150K; other tests capped out as low as 110K-126K.
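If anyone wants to sanity-check the counter itself, here's a rough sketch of how you could count one of those txt files over the API and compare it against what the chat reports (the file name and model string are placeholders/assumptions on my part):

```python
import anthropic

client = anthropic.Anthropic()

# Placeholder file name: substitute one of the large .txt files from the test.
with open("large_test_file.txt", "r", encoding="utf-8") as f:
    text = f.read()

# Count how many input tokens the file alone contributes, to compare against
# the usage number Claude reports in the chat. Model string is my assumption
# for Sonnet 4.5.
count = client.messages.count_tokens(
    model="claude-sonnet-4-5",
    messages=[{"role": "user", "content": text}],
)
print(count.input_tokens)
```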
I did this testing because, in my experience, Sonnet 4.5 has been hitting the maximum conversation length much earlier than Opus 4.1 did.
I'm a heavy user. Max 20x Pro Plan $200/mo.
Something is WRONG!! Either the counter is wrong, or the maximum window effectively isn't what it's supposed to be. My sense is that it's the window, because I use this product enough to know I'm getting less "mileage" per conversation with Sonnet 4.5.
Anyone else sensing Sonnet 4.5's conversation limit is being reached faster than usual?
Since some people have been asking, here are the actual output limits for Sonnet 3.7 with and without thinking:
Non-thinking: 8192 tokens
Non-thinking chat: https://claude.ai/share/af0b52b3-efc3-452b-ad21-5e0f39676d9f
Thinking: 24196 tokens*
Thinking chat: https://claude.ai/share/c3c8cec3-2648-4ec4-a13d-c6cce7735a67
*The thinking number doesn't quite make sense to me, as I'd expect 3 * 8192 = 24576, but it's close enough I guess. Also, in that example the thinking itself is 23575 tokens before the main response gets cut off, so thinking alone may actually run longer.
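For anyone who wants to reproduce this over the API instead of the web app, this is roughly the equivalent request with extended thinking enabled (sketch: the max_tokens and budget_tokens values are my own guesses, not whatever claude.ai uses internally):

```python
import anthropic

client = anthropic.Anthropic()

# Streaming, since a maximum-length response can take a while to generate.
with client.messages.stream(
    model="claude-3-7-sonnet-20250219",
    # max_tokens covers thinking + visible text, so it has to exceed the budget.
    max_tokens=32000,
    thinking={"type": "enabled", "budget_tokens": 24000},
    messages=[{"role": "user", "content": "placeholder prompt that needs a very long answer"}],
) as stream:
    final = stream.get_final_message()

print(final.usage.output_tokens)  # includes both thinking and visible text
```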
Tokens were calculated with the token counting API, subtracting 16 tokens of fixed overhead (the role and a few other tokens that are always present).
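In case it's useful, the counting step looks roughly like this (sketch: you'd paste the response text in yourself, and the 16-token overhead is just the constant I mentioned above, not an official number):

```python
import anthropic

client = anthropic.Anthropic()

output_text = "..."  # paste the full visible response text here

# Run the response text back through the token counting endpoint and subtract
# the ~16 tokens of fixed overhead (role etc.) that every request carries.
count = client.messages.count_tokens(
    model="claude-3-7-sonnet-20250219",
    messages=[{"role": "user", "content": output_text}],
)
print(count.input_tokens - 16)
```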
Hope this helps, and thanks to the Discord mod, who shall not be pinged, for the testing prompt.