🌐
Claude
docs.claude.com › en › api › rate-limits
Rate limits - Claude Docs
In order to protect Workspaces in your Organization from potential overuse, you can set custom spend and rate limits per Workspace. Example: If your Organization’s limit is 40,000 input tokens per minute and 8,000 output tokens per minute, you ...
🌐
Reddit
reddit.com › r/claudeai › claude output limit...
r/ClaudeAI on Reddit: Claude Output limit...
June 10, 2024 -

Really... ChatGPT's output limit is much more than 4k, yet every version of Claude is still at 4k. I know about telling it to "Continue", but that makes the message limit run out much faster. Please increase it to something much higher.

🌐
Substack
simonw.substack.com › p › claude-37-sonnet-extended-thinking
Claude 3.7 Sonnet, extended thinking and long output
February 25, 2025 - (This is the output limit - how much text it can produce in one go. Claude 3.7 Sonnet's input limit remains 200,000 - many modern models exceed 100,000 for input now.)
🌐
ClaudeLog
claudelog.com › home › faqs › what is the limit of claude ai
What is the Limit of Claude AI | ClaudeLog
Claude Max: Expanded limits at 5x or 20x Pro usage caps · API access: Rate limits include requests per minute, input/output tokens per minute, and monthly spending limits by usage tier
🌐
Medium
watchsound.medium.com › how-to-bypass-claude-ais-output-limits-to-build-complex-projects-85146e015445
How to bypass Claude.ai’s output limits to build complex projects. | by Hanning Ni | Medium
August 10, 2025 - How to bypass Claude.ai’s output limits to build complex projects. Claude, like most large language models, has a token limit for its responses, meaning it can only output so much text at once …
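The workaround articles like this one describe boils down to a continuation loop on the API side: when a response stops because it hit max_tokens, append the partial answer as an assistant turn and ask the model to keep going. A minimal sketch with the Anthropic Python SDK (the model name, prompts, and 4096 cap are illustrative assumptions, not the article's exact method):

# Continuation loop: keep requesting until the model stops on its own
# ("end_turn") rather than on the max_tokens cap.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
messages = [{"role": "user", "content": "Write the full module described above."}]
parts = []

while True:
    response = client.messages.create(
        model="claude-3-5-sonnet-20240620",  # illustrative model name
        max_tokens=4096,
        messages=messages,
    )
    text = "".join(b.text for b in response.content if b.type == "text")
    parts.append(text)
    if response.stop_reason != "max_tokens":
        break  # finished naturally, not truncated
    # Feed the partial answer back and ask the model to continue it.
    messages.append({"role": "assistant", "content": text})
    messages.append({"role": "user", "content": "Continue exactly where you left off."})

full_output = "".join(parts)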
🌐
Simon Willison
simonwillison.net › 2025 › Feb › 25 › llm-anthropic-014
Claude 3.7 Sonnet, extended thinking and long output, llm-anthropic 0.14
February 25, 2025 - (This is the output limit—how much text it can produce in one go. Claude 3.7 Sonnet’s input limit remains 200,000—many modern models exceed 100,000 for input now.)
🌐
Claude Docs
platform.claude.com › docs › en › api › rate-limits
Rate limits - Claude Docs
Example: If your Organization's limit is 40,000 input tokens per minute and 8,000 output tokens per minute, you might limit one Workspace to 30,000 total tokens per minute. This protects other Workspaces from potential overuse and ensures a ...
🌐
Northflank
northflank.com › blog › claude-rate-limits-claude-code-pricing-cost
Claude Code: Rate limits, pricing, and alternatives | Blog — Northflank
Claude API implements several types of rate limits: Requests per minute (RPM) - Limits the number of API calls within a 60-second window · Tokens per minute (TPM) - Caps the total tokens (both input and output) processed within a minute
🌐
LobeHub
lobehub.com › blog › complete-guide-to-claude-ai-usage-limits
Complete Guide to Claude AI Usage limits: Why, How to Resolve, and Advanced Usage · LobeHub
December 10, 2024 - Free users can send about 100 messages daily, and this usage quota resets automatically at midnight. Pro users have a much more generous limit, approximately five times that of free users.
🌐
Reddit
reddit.com › r/claudeai › the maximum output length on claude.ai (pro) has been halved (possibly an a/b test)
r/ClaudeAI on Reddit: The maximum output length on Claude.ai (Pro) has been halved (Possibly an A/B test)
September 1, 2024 -

Here is the transcribed conversation from Claude.ai: https://pastebin.com/722g7ubz

Here is a screenshot of the last response: https://imgur.com/a/kBZjROt

As you can see, it is cut off as being "over the maximum length".

I replicated the same conversation in the API workbench (including the system prompt), with 2048 max output tokens and 4096 max output tokens respectively.

Here are the responses.

  • 2048 max output length: https://pastebin.com/3x9HWHnu

  • 4096 max output length: https://pastebin.com/E8n8F8ga

Since Claude's tokenizer isn't public, I'm relying on OpenAI's, but it's irrelevant whether the counts are perfectly accurate - I'm comparing between the responses. You can get an estimate of the Claude token count by adding 20%.

Note: I am comparing just the code blocks, since they make up the VAST majority of the length.

  • Web UI response: 1626 OAI tokens = around 1950 Claude tokens

  • API response (2048): 1659 OAI tokens = around 1990 Claude tokens

  • API response (4096): 3263 OAI tokens = around 3910 Claude tokens

I would call this irrefutable evidence that the web UI is now limited to 2048 output tokens (1600 OAI tokens is likely roughly 2000 Claude 3 tokens).
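(The OAI-token arithmetic in this post is easy to reproduce with tiktoken; the +20% adjustment is the poster's rule of thumb for Claude 3, not an official conversion:)

# Count OpenAI tokens with tiktoken, then add ~20% as a crude
# Claude 3 estimate, per the rule of thumb in the post above.
import tiktoken

def estimate_claude_tokens(text: str) -> int:
    enc = tiktoken.get_encoding("cl100k_base")
    return round(len(enc.encode(text)) * 1.2)

# 1626 OAI tokens x 1.2 = ~1951, matching the "around 1950" figure above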

I have been sent (and have found on my account) examples of old responses that were obviously 4096 tokens in length, meaning this is a new change.

I have seen reports of people being able to get responses over 2048 tokens, which makes me think this is A/B testing.

This means that, if you're working with a long block of code, your cap is effectively HALVED, as you need to ask Claude to continue twice as often.

This is absolutely unacceptable. I would understand if this was a limit imposed on free users, but I have Claude Pro.

EDIT: I am almost certain this is an A/B test, now. u/Incenerer posted a comment down below with instructions on how to check which "testing buckets" you're in.

https://www.reddit.com/r/ClaudeAI/comments/1f4xi6d/the_maximum_output_length_on_claudeai_pro_has/lkoz6y3/

So far, both I and another person who's limited to 2048 output tokens have this gate set to true:

{
    "gate": "segment:pro_token_offenders_2024-08-26_part_2_of_3",
    "gateValue": "true",
    "ruleID": "id_list"
}

Please test this yourself and report back!

EDIT2: They've since hashed/encrypted the name of the bucket. Look for this instead:

{
    "gate": "segment:inas9yh4296j1g41",
    "gateValue": "false",
    "ruleID": "default"
}
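(If you want to check your own buckets, a hedged sketch: assuming you've saved the Statsig gate payload from the web app's network traffic as a JSON list shaped like the objects quoted above, something like this flags the suspect segments:)

# Scan a dumped Statsig gate payload for the throttling segments named
# in this post. The gates.json structure is an assumption based on the
# snippets quoted above; see the linked comment for how to capture it.
import json

SUSPECT_GATES = {
    "segment:pro_token_offenders_2024-08-26_part_2_of_3",
    "segment:inas9yh4296j1g41",
}

with open("gates.json") as f:
    gates = json.load(f)

for gate in gates:
    if gate.get("gate") in SUSPECT_GATES:
        print(gate["gate"], "->", gate.get("gateValue"))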

EDIT3: The gates and limit are now gone: https://www.reddit.com/r/ClaudeAI/comments/1f5rwd3/the_halved_output_length_gate_name_has_been/lkysj3d/

This is a good step forward, but it doesn't address the main question: why were these limits implemented in the first place? I think we should still demand an answer, because it feels like they're only sorry they got caught.

🌐
16x Prompt
prompt.16x.engineer › blog › claude-sonnet-gpt4-context-window-token-limit
Claude 3.5 Sonnet vs GPT-4o: Context Window and Token Limit | 16x Prompt
For output token limits, Claude 3.5 Sonnet has a maximum output of 4,096 tokens. This means the model can generate responses up to this token limit in one interaction.
🌐
Begins with AI
beginswithai.com › handling-large-ai-generated-apps-claudes-limitations-and-solutions
Handling Large AI-Generated Apps: Claude's Limitations and Solutions - Begins w/ AI
October 12, 2024 - Token Limitations: As previously mentioned, the maximum output length is often capped at around 2048 tokens for individual requests. Users may need to break down their requests into smaller parts. Contextual Overload: When Claude re-reads previous messages in a conversation to maintain context, ...
🌐
Arsturn
arsturn.com › blog › mastering-claudes-token-limits-a-beginners-guide
Mastering Claude's Token Limits: A Beginner's Guide
It can have a larger maximum output token limit (sometimes up to 16K) compared to Claude 3.5 Sonnet's standard 4K.
🌐
TypingMind
blog.typingmind.com › home › claude rate exceeded: guide to fix and prevent the error
Claude Rate Exceeded: Guide to Fix and Prevent the Error
October 9, 2025 - Claude enforces rate limits to ensure fair usage and maintain service stability. These limits include: Requests per minute (RPM) — how many API calls you can make in a minute · Input tokens per minute (ITPM) — how many input tokens (text you send) you can use per minute · Output tokens per minute (OTPM) — how many tokens can be generated in responses per minute
🌐
GitHub
github.com › simonw › llm-claude-3 › issues › 11
Support for long output on `claude-3.5-sonnet` · Issue #11 · simonw/llm-claude-3
August 30, 2024 - Pass extra_headers= for this. We've doubled the max output token limit for Claude 3.5 Sonnet from 4096 to 8192 in the Anthropic API. Just add the header "anthropic-beta": "max-tokens-3-5-sonnet-2024-07-15" to your API calls https://simon...
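In the Anthropic Python SDK, the beta header from that issue can be passed per request via extra_headers; a minimal sketch (the model name is an illustrative assumption):

# Opt in to the doubled 8192-token output limit for Claude 3.5 Sonnet
# by sending the beta header quoted in the issue above.
import anthropic

client = anthropic.Anthropic()
response = client.messages.create(
    model="claude-3-5-sonnet-20240620",  # illustrative model name
    max_tokens=8192,  # above the default 4096 cap
    messages=[{"role": "user", "content": "Generate the long report."}],
    extra_headers={"anthropic-beta": "max-tokens-3-5-sonnet-2024-07-15"},
)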
🌐
Reddit
reddit.com › r/claudeai › claude 3.7 output limit in ui
r/ClaudeAI on Reddit: Claude 3.7 output limit in UI
March 3, 2025 -

Since some people have been asking, here's the actual output limit for Sonnet 3.7 with and without thinking:
Non-thinking: 8192 tokens
Non-thinking chat: https://claude.ai/share/af0b52b3-efc3-452b-ad21-5e0f39676d9f

Thinking: 24196 tokens*
Thinking chat: https://claude.ai/share/c3c8cec3-2648-4ec4-a13d-c6cce7735a67

*The thinking tokens don't make a lot of sense to me, as I'd expect them to be 3 * 8192 = 24576, but close enough I guess. Also, in the example the thinking tokens themselves reach 23575 before the main response is cut off, so thinking alone may actually go longer.

Tokens were calculated with the token counting API, subtracting 16 tokens (the role and some other tokens that are always present).

Hope this helps, and thanks to the discord mod, who shall not be pinged, for the testing prompt.
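The token counting API the poster mentions is exposed in the Anthropic SDK as messages.count_tokens; a sketch of the measurement (the 16-token subtraction is the poster's calibration for role overhead, not something the API reports):

# Run a reply back through the token counting API, then subtract the
# ~16 tokens of constant role/scaffolding overhead the poster observed.
import anthropic

client = anthropic.Anthropic()
reply_text = "..."  # paste the response text to measure here

count = client.messages.count_tokens(
    model="claude-3-7-sonnet-20250219",
    messages=[{"role": "user", "content": reply_text}],
)
print(count.input_tokens - 16)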

🌐
Claude
support.claude.com › en › articles › 8243635-our-approach-to-rate-limits-for-the-claude-api
Our approach to rate limits for the Claude API | Claude Help Center
Our approach to rate limits for the Claude API · Updated this week · Your rate limit depends on your usage tier, and is currently measured in three key metrics: Requests per minute (RPM) Input tokens per minute (ITPM) Output tokens per minute (OTPM) If you exceed any of these rate limits, you will get a 429 error describing which rate limit was exceeded, along with a retry-after header indicating how long to wait.
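Since a 429 arrives with a retry-after header, client-side handling is straightforward; a minimal sketch with the Python SDK (the retry count and fallback waits are illustrative, and the SDK also has its own built-in retries via max_retries):

# Retry on 429 by honoring the retry-after header described above.
import time

import anthropic

client = anthropic.Anthropic(max_retries=0)  # handle retries ourselves

def create_with_retry(**kwargs):
    for attempt in range(5):
        try:
            return client.messages.create(**kwargs)
        except anthropic.RateLimitError as err:
            wait = float(err.response.headers.get("retry-after", 2 ** attempt))
            time.sleep(wait)
    raise RuntimeError("still rate limited after 5 attempts")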
🌐
X
x.com › OpenRouterAI › status › 1812972322887794923
OpenRouter on X: "The new 8k output limit for Claude is enabled by default for OpenRouter 🚀" / X
Good news for @AnthropicAI devs: We've doubled the max output token limit for Claude 3.5 Sonnet from 4096 to 8192 in the Anthropic API.
🌐
Claude
support.claude.com › en › articles › 11014257-about-claude-s-max-plan-usage
About Claude's Max Plan Usage | Claude Help Center
If your conversations are relatively short and use a less compute-intensive model, with the Max plan at 5x more usage, you can expect to send at least 225 messages every five hours, and with the Max plan at 20x more usage, at least 900 messages ...