Really... ChatGPT has an output limit of much more than 4k, while all versions of Claude are still at 4k. I know about telling it to continue, but that burns through the message limit much faster. Please increase it.
Here is the transcribed conversation from claude.AI: https://pastebin.com/722g7ubz
Here is a screenshot of the last response: https://imgur.com/a/kBZjROt
As you can see, the response is cut off for being "over the maximum length".
I replicated the same conversation in the API workbench (including the system prompt), with 2048 max output tokens and 4096 max output tokens respectively.
Here are the responses.
2048 max output length: https://pastebin.com/3x9HWHnu
4096 max output length: https://pastebin.com/E8n8F8ga
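If anyone wants to reproduce the comparison, this is roughly what the workbench runs boil down to, sketched with the Anthropic Python SDK. The model name, system prompt and user message are placeholders - substitute the actual conversation from the pastebins above:

# Rough sketch of the workbench comparison using the Anthropic Python SDK.
# The model name, system prompt and user message are placeholders - swap in
# the real conversation from the pastebin transcript.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def run_with_limit(max_tokens: int) -> str:
    response = client.messages.create(
        model="claude-3-5-sonnet-20240620",   # whichever model the web UI uses
        max_tokens=max_tokens,                # 2048 vs 4096 is the whole experiment
        system="<system prompt copied from claude.ai>",
        messages=[{"role": "user", "content": "<the long code request from the transcript>"}],
    )
    return response.content[0].text

for limit in (2048, 4096):
    text = run_with_limit(limit)
    print(f"max_tokens={limit}: response is {len(text)} characters")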
Since Claude's tokenizer isn't public, I'm relying on OpenAI's. Whether the counts are perfectly accurate doesn't matter - I'm comparing the responses against each other. You can estimate the Claude token count by adding roughly 20%.
Note: I am comparing just the code blocks, since they make up the VAST majority of the length.
Web UI response: 1626 OAI tokens = around 1950 claude tokens
API response (2048): 1659 OAI tokens = around 1990 claude tokens
API response (4096): 3263 OAI tokens = around 3910 claude tokens
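For anyone who wants to redo the count, this is the kind of thing I mean - a sketch using OpenAI's tiktoken with the ~20% fudge factor. The encoding choice and the file names holding the extracted code blocks are my assumptions:

# Count the extracted code blocks with OpenAI's tiktoken and apply the rough
# +20% heuristic to estimate Claude tokens. cl100k_base is an assumption about
# "OAI's tokenizer"; the file names are placeholders for the saved code blocks.
import tiktoken

def estimate_claude_tokens(text: str) -> tuple[int, int]:
    enc = tiktoken.get_encoding("cl100k_base")
    oai_tokens = len(enc.encode(text))
    return oai_tokens, round(oai_tokens * 1.2)  # the ~20% fudge factor

samples = {
    "web UI": "webui_response_code.txt",
    "API 2048": "api_2048_code.txt",
    "API 4096": "api_4096_code.txt",
}
for name, path in samples.items():
    with open(path) as f:
        oai, claude = estimate_claude_tokens(f.read())
    print(f"{name}: {oai} OAI tokens ~= {claude} estimated Claude tokens")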
I would call this irrefutable evidence that the web UI is now limited to 2048 output tokens (~1600 OAI tokens works out to roughly 2000 Claude 3 tokens).
I have been sent (and have found on my account) examples of old responses that were obviously 4096 tokens in length, meaning this is a new change.
I have seen reports of people being able to get responses over 2048 tokens, which makes me think this is A/B testing.
This means that, if you're working with a long block of code, your cap is effectively HALVED, as you need to ask Claude to continue twice as often.
This is absolutely unacceptable. I would understand if this was a limit imposed on free users, but I have Claude Pro.
EDIT: I am almost certain this is an A/B test, now. u/Incenerer posted a comment down below with instructions on how to check which "testing buckets" you're in.
https://www.reddit.com/r/ClaudeAI/comments/1f4xi6d/the_maximum_output_length_on_claudeai_pro_has/lkoz6y3/
So far, another person and I, both limited to 2048 output tokens, have this gate set to true:
{
"gate": "segment:pro_token_offenders_2024-08-26_part_2_of_3",
"gateValue": "true",
"ruleID": "id_list"
}
Please test this yourself and report back!
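If digging through the network tab by hand is a pain, here's a rough sketch of how you could scan a saved copy of the gate payload for that segment. It assumes you've exported the feature-gate JSON (per the linked instructions) to a file called gates.json; the exact structure varies, so it just walks the whole document looking for the name:

# Scan an exported feature-gate JSON dump for the "token offenders" segment.
# gates.json is a placeholder for wherever you saved the payload from the
# browser's network tab; swap TARGET for whatever gate name you're hunting.
import json

TARGET = "pro_token_offenders"

def find_gate(node, path=""):
    if isinstance(node, dict):
        for key, value in node.items():
            if TARGET in str(key) or (isinstance(value, str) and TARGET in value):
                print(f"match at {path}/{key}: {value!r}")
            find_gate(value, f"{path}/{key}")
    elif isinstance(node, list):
        for i, item in enumerate(node):
            find_gate(item, f"{path}[{i}]")

with open("gates.json") as f:
    find_gate(json.load(f))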
EDIT2: They've since hashed/encrypted the name of the bucket. Look for this instead:
{
"gate": "segment:inas9yh4296j1g41",
"gateValue": "false",
"ruleID": "default"
}
EDIT3: The gates and limit are now gone: https://www.reddit.com/r/ClaudeAI/comments/1f5rwd3/the_halved_output_length_gate_name_has_been/lkysj3d/
This is a good step forward, but it doesn't address the main question: why were they implemented in the first place? I think we should still demand an answer, because right now it feels like they're only sorry they got caught.
Since some people have been asking, here are the actual output limits for Sonnet 3.7 with and without thinking:
Non-thinking: 8192 tokens
Non-thinking chat: https://claude.ai/share/af0b52b3-efc3-452b-ad21-5e0f39676d9f
Thinking: 24196 tokens*
Thinking chat: https://claude.ai/share/c3c8cec3-2648-4ec4-a13d-c6cce7735a67
*The thinking limit doesn't make a lot of sense to me, as I'd expect it to be 3 * 8192 = 24576, but close enough I guess. Also, in that example the thinking tokens themselves come to 23575 before the main response gets cut off, so thinking alone may actually be allowed to run longer.
Tokens were calculated with the token counting API, subtracting 16 tokens (the role and some other tokens that are always present).
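For transparency, the counting looks roughly like this - a sketch only, since the model name is an assumption and the 16-token overhead is just the constant I observed:

# Count a response with the token counting API and subtract the ~16 tokens of
# constant overhead (role etc.) that are always present. The model name is an
# assumption; the text is passed as a user message purely to get a count.
import anthropic

client = anthropic.Anthropic()
OVERHEAD = 16

def count_response_tokens(text: str) -> int:
    result = client.messages.count_tokens(
        model="claude-3-7-sonnet-20250219",
        messages=[{"role": "user", "content": text}],
    )
    return result.input_tokens - OVERHEAD

print(count_response_tokens("<paste the response text here>"))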
Hope this helps, and thanks to the Discord mod, who shall not be pinged, for the testing prompt.