🌐
Claude
docs.claude.com › en › api › rate-limits
Rate limits - Claude Docs
In order to protect Workspaces in your Organization from potential overuse, you can set custom spend and rate limits per Workspace. Example: If your Organization’s limit is 40,000 input tokens per minute and 8,000 output tokens per minute, you ...
🌐
Reddit
reddit.com › r/claudeai › claude output limit...
r/ClaudeAI on Reddit: Claude Output limit...
June 10, 2024 -

Really... ChatGPT's output limit is much more than 4k, yet every version of Claude is still at 4k. I know about telling it to "Continue", but that makes the message limit run out much faster. Please increase it to something much higher.

🌐
Substack
simonw.substack.com › p › claude-37-sonnet-extended-thinking
Claude 3.7 Sonnet, extended thinking and long output
February 25, 2025 - (This is the output limit - how much text it can produce in one go. Claude 3.7 Sonnet's input limit remains 200,000 - many modern models exceed 100,000 for input now.)
🌐
ClaudeLog
claudelog.com › home › faqs › what is the limit of claude ai
What is the Limit of Claude AI | ClaudeLog
Claude Max: Expanded limits at 5x or 20x Pro usage caps · API access: Rate limits include requests per minute, input/output tokens per minute, and monthly spending limits by usage tier
🌐
Medium
watchsound.medium.com › how-to-bypass-claude-ais-output-limits-to-build-complex-projects-85146e015445
How to bypass Claude.ai’s output limits to build complex projects. | by Hanning Ni | Medium
August 10, 2025 - How to bypass Claude.ai’s output limits to build complex projects. Claude, like most large language models, has a token limit for its responses, meaning it can only output so much text at once …
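The workaround articles like this one describe boils down to a continuation loop on the API side: when a response stops because it hit max_tokens, append the partial answer as an assistant turn and ask the model to keep going. A minimal sketch with the Anthropic Python SDK (the model name, prompts, and 4096 cap are illustrative assumptions, not the article's exact method):

# Continuation loop: keep requesting until the model stops on its own
# ("end_turn") rather than on the max_tokens cap.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
messages = [{"role": "user", "content": "Write the full module described above."}]
parts = []

while True:
    response = client.messages.create(
        model="claude-3-5-sonnet-20240620",  # illustrative model name
        max_tokens=4096,
        messages=messages,
    )
    text = "".join(b.text for b in response.content if b.type == "text")
    parts.append(text)
    if response.stop_reason != "max_tokens":
        break  # finished naturally, not truncated
    # Feed the partial answer back and ask the model to continue it.
    messages.append({"role": "assistant", "content": text})
    messages.append({"role": "user", "content": "Continue exactly where you left off."})

full_output = "".join(parts)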
🌐
Simon Willison
simonwillison.net › 2025 › Feb › 25 › llm-anthropic-014
Claude 3.7 Sonnet, extended thinking and long output, llm-anthropic 0.14
February 25, 2025 - (This is the output limit—how much text it can produce in one go. Claude 3.7 Sonnet’s input limit remains 200,000—many modern models exceed 100,000 for input now.)
🌐
Claude Docs
platform.claude.com › docs › en › api › rate-limits
Rate limits - Claude Docs
Example: If your Organization's limit is 40,000 input tokens per minute and 8,000 output tokens per minute, you might limit one Workspace to 30,000 total tokens per minute. This protects other Workspaces from potential overuse and ensures a ...
🌐
Northflank
northflank.com › blog › claude-rate-limits-claude-code-pricing-cost
Claude Code: Rate limits, pricing, and alternatives | Blog — Northflank
Claude API implements several types of rate limits: Requests per minute (RPM) - Limits the number of API calls within a 60-second window · Tokens per minute (TPM) - Caps the total tokens (both input and output) processed within a minute
🌐
LobeHub
lobehub.com › blog › complete-guide-to-claude-ai-usage-limits
Complete Guide to Claude AI Usage limits: Why, How to Resolve, and Advanced Usage · LobeHub
December 10, 2024 - Free users can send about 100 messages daily, and this usage quota resets automatically at midnight. Pro users have a much more generous limit, approximately five times that of free users.
🌐
Reddit
reddit.com › r/claudeai › the maximum output length on claude.ai (pro) has been halved (possibly an a/b test)
r/ClaudeAI on Reddit: The maximum output length on Claude.ai (Pro) has been halved (Possibly an A/B test)
September 1, 2024 -

Here is the transcribed conversation from Claude.ai: https://pastebin.com/722g7ubz

Here is a screenshot of the last response: https://imgur.com/a/kBZjROt

As you can see, it is cut off as being "over the maximum length".

I replicated the same conversation in the API workbench (including the system prompt), with 2048 max output tokens and 4096 max output tokens respectively.

Here are the responses.

  • 2048 max output length: https://pastebin.com/3x9HWHnu

  • 4096 max output length: https://pastebin.com/E8n8F8ga

Since Claude's tokenizer isn't public, I'm relying on OpenAI's, but it's irrelevant whether the counts are perfectly accurate - I'm comparing between the responses. You can get an estimate of the Claude token count by adding 20%.

Note: I am comparing just the code blocks, since they make up the VAST majority of the length.

  • Web UI response: 1626 OAI tokens = around 1950 Claude tokens

  • API response (2048): 1659 OAI tokens = around 1990 Claude tokens

  • API response (4096): 3263 OAI tokens = around 3910 Claude tokens

I would call this irrefutable evidence that the web UI is now limited to 2048 output tokens (1600 OAI tokens is likely roughly 2000 Claude 3 tokens).
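(The OAI-token arithmetic in this post is easy to reproduce with tiktoken; the +20% adjustment is the poster's rule of thumb for Claude 3, not an official conversion:)

# Count OpenAI tokens with tiktoken, then add ~20% as a crude
# Claude 3 estimate, per the rule of thumb in the post above.
import tiktoken

def estimate_claude_tokens(text: str) -> int:
    enc = tiktoken.get_encoding("cl100k_base")
    return round(len(enc.encode(text)) * 1.2)

# 1626 OAI tokens x 1.2 = ~1951, matching the "around 1950" figure above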

I have been sent (and have found on my account) examples of old responses that were obviously 4096 tokens in length, meaning this is a new change.

I have seen reports of people being able to get responses over 2048 tokens, which makes me think this is A/B testing.

This means that, if you're working with a long block of code, your cap is effectively HALVED, as you need to ask Claude to continue twice as often.

This is absolutely unacceptable. I would understand if this was a limit imposed on free users, but I have Claude Pro.

EDIT: I am almost certain this is an A/B test, now. u/Incenerer posted a comment down below with instructions on how to check which "testing buckets" you're in.

https://www.reddit.com/r/ClaudeAI/comments/1f4xi6d/the_maximum_output_length_on_claudeai_pro_has/lkoz6y3/

So far, both I and another person who's limited to 2048 output tokens have this gate set to true:

{
    "gate": "segment:pro_token_offenders_2024-08-26_part_2_of_3",
    "gateValue": "true",
    "ruleID": "id_list"
}

Please test this yourself and report back!

EDIT2: They've since hashed/encrypted the name of the bucket. Look for this instead:

{
    "gate": "segment:inas9yh4296j1g41",
    "gateValue": "false",
    "ruleID": "default"
}
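(If you want to check your own buckets, a hedged sketch: assuming you've saved the Statsig gate payload from the web app's network traffic as a JSON list shaped like the objects quoted above, something like this flags the suspect segments:)

# Scan a dumped Statsig gate payload for the throttling segments named
# in this post. The gates.json structure is an assumption based on the
# snippets quoted above; see the linked comment for how to capture it.
import json

SUSPECT_GATES = {
    "segment:pro_token_offenders_2024-08-26_part_2_of_3",
    "segment:inas9yh4296j1g41",
}

with open("gates.json") as f:
    gates = json.load(f)

for gate in gates:
    if gate.get("gate") in SUSPECT_GATES:
        print(gate["gate"], "->", gate.get("gateValue"))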

EDIT3: The gates and limit are now gone: https://www.reddit.com/r/ClaudeAI/comments/1f5rwd3/the_halved_output_length_gate_name_has_been/lkysj3d/

This is a good step forward, but it doesn't address the main question: why were these limits implemented in the first place? I think we should still demand an answer, because it feels like they're only sorry they got caught.

🌐
16x Prompt
prompt.16x.engineer › blog › claude-sonnet-gpt4-context-window-token-limit
Claude 3.5 Sonnet vs GPT-4o: Context Window and Token Limit | 16x Prompt
For output token limits, Claude 3.5 Sonnet has a maximum output of 4,096 tokens. This means the model can generate responses up to this token limit in one interaction.
🌐
Begins with AI
beginswithai.com › handling-large-ai-generated-apps-claudes-limitations-and-solutions
Handling Large AI-Generated Apps: Claude's Limitations and Solutions - Begins w/ AI
October 12, 2024 - Token Limitations: As previously mentioned, the maximum output length is often capped at around 2048 tokens for individual requests. Users may need to break down their requests into smaller parts. Contextual Overload: When Claude re-reads previous messages in a conversation to maintain context, ...
🌐
Arsturn
arsturn.com › blog › mastering-claudes-token-limits-a-beginners-guide
Mastering Claude's Token Limits: A Beginner's Guide
It can have a larger maximum output token limit (sometimes up to 16K) compared to Claude 3.5 Sonnet's standard 4K.
🌐
TypingMind
blog.typingmind.com › home › claude rate exceeded: guide to fix and prevent the error
Claude Rate Exceeded: Guide to Fix and Prevent the Error
October 9, 2025 - Claude enforces rate limits to ensure fair usage and maintain service stability. These limits include: Requests per minute (RPM) — how many API calls you can make in a minute · Input tokens per minute (ITPM) — how many input tokens (text you send) you can use per minute · Output tokens per minute (OTPM) — how many tokens can be generated in responses per minute
🌐
GitHub
github.com › simonw › llm-claude-3 › issues › 11
Support for long output on `claude-3.5-sonnet` · Issue #11 · simonw/llm-claude-3
August 30, 2024 - Pass extra_headers= for this. We've doubled the max output token limit for Claude 3.5 Sonnet from 4096 to 8192 in the Anthropic API. Just add the header "anthropic-beta": "max-tokens-3-5-sonnet-2024-07-15" to your API calls https://simon...
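In the Anthropic Python SDK, the beta header from that issue can be passed per request via extra_headers; a minimal sketch (the model name is an illustrative assumption):

# Opt in to the doubled 8192-token output limit for Claude 3.5 Sonnet
# by sending the beta header quoted in the issue above.
import anthropic

client = anthropic.Anthropic()
response = client.messages.create(
    model="claude-3-5-sonnet-20240620",  # illustrative model name
    max_tokens=8192,  # above the default 4096 cap
    messages=[{"role": "user", "content": "Generate the long report."}],
    extra_headers={"anthropic-beta": "max-tokens-3-5-sonnet-2024-07-15"},
)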
🌐
Reddit
reddit.com › r/claudeai › claude 3.7 output limit in ui
r/ClaudeAI on Reddit: Claude 3.7 output limit in UI
March 3, 2025 -

Since some people have been asking, here's the actual output limit for Sonnet 3.7 with and without thinking:
Non-thinking: 8192 tokens
Non-thinking chat: https://claude.ai/share/af0b52b3-efc3-452b-ad21-5e0f39676d9f

Thinking: 24196 tokens*
Thinking chat: https://claude.ai/share/c3c8cec3-2648-4ec4-a13d-c6cce7735a67

*The thinking tokens don't make a lot of sense to me, as I'd expect them to be 3 * 8192 = 24576, but close enough I guess. Also, in the example the thinking tokens themselves reach 23575 before the main response is cut off, so thinking alone may actually go longer.

Tokens were calculated with the token counting API, subtracting 16 tokens (the role and some other tokens that are always present).

Hope this helps, and thanks to the discord mod, who shall not be pinged, for the testing prompt.
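The token counting API the poster mentions is exposed in the Anthropic SDK as messages.count_tokens; a sketch of the measurement (the 16-token subtraction is the poster's calibration for role overhead, not something the API reports):

# Run a reply back through the token counting API, then subtract the
# ~16 tokens of constant role/scaffolding overhead the poster observed.
import anthropic

client = anthropic.Anthropic()
reply_text = "..."  # paste the response text to measure here

count = client.messages.count_tokens(
    model="claude-3-7-sonnet-20250219",
    messages=[{"role": "user", "content": reply_text}],
)
print(count.input_tokens - 16)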

🌐
Claude
support.claude.com › en › articles › 8243635-our-approach-to-rate-limits-for-the-claude-api
Our approach to rate limits for the Claude API | Claude Help Center
Our approach to rate limits for the Claude API · Updated this week · Your rate limit depends on your usage tier, and is currently measured in three key metrics: Requests per minute (RPM) Input tokens per minute (ITPM) Output tokens per minute (OTPM) If you exceed any of these rate limits, you will get a 429 error describing which rate limit was exceeded, along with a retry-after header indicating how long to wait.
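Since a 429 arrives with a retry-after header, client-side handling is straightforward; a minimal sketch with the Python SDK (the retry count and fallback waits are illustrative, and the SDK also has its own built-in retries via max_retries):

# Retry on 429 by honoring the retry-after header described above.
import time

import anthropic

client = anthropic.Anthropic(max_retries=0)  # handle retries ourselves

def create_with_retry(**kwargs):
    for attempt in range(5):
        try:
            return client.messages.create(**kwargs)
        except anthropic.RateLimitError as err:
            wait = float(err.response.headers.get("retry-after", 2 ** attempt))
            time.sleep(wait)
    raise RuntimeError("still rate limited after 5 attempts")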
🌐
X
x.com › OpenRouterAI › status › 1812972322887794923
OpenRouter on X: "The new 8k output limit for Claude is enabled by default for OpenRouter 🚀" / X
Good news for @AnthropicAI devs: We've doubled the max output token limit for Claude 3.5 Sonnet from 4096 to 8192 in the Anthropic API.
🌐
Claude
support.claude.com › en › articles › 11014257-about-claude-s-max-plan-usage
About Claude's Max Plan Usage | Claude Help Center
If your conversations are relatively short and use a less compute-intensive model, with the Max plan at 5x more usage, you can expect to send at least 225 messages every five hours, and with the Max plan at 20x more usage, at least 900 messages ...