I feel the same way. When I was writing a story today, I noticed that its maximum output has decreased. I had to use several more tokens to get it to finish the story. (Answer from Different_Station283 on reddit.com)
🌐
Reddit
reddit.com › r/claudeai › claude 3.7 output limit in ui
r/ClaudeAI on Reddit: Claude 3.7 output limit in UI
March 3, 2025 -

Since some people have been asking, here's the actual output limit for Sonnet 3.7 with and without thinking:
Non-thinking: 8192 tokens
Non-thinking chat: https://claude.ai/share/af0b52b3-efc3-452b-ad21-5e0f39676d9f

Thinking: 24196 tokens*
Thinking chat: https://claude.ai/share/c3c8cec3-2648-4ec4-a13d-c6cce7735a67

*The thinking tokens don't make a lot of sense to me, as I'd expect them to be 3 * 8192 = 24576, but close enough I guess. Also, in the example the thinking tokens themselves are 23575 before being cut off in the main response, so thinking alone may actually be longer.

Tokens were calculated with the token-counting API, subtracting 16 tokens (role and some other tokens that are always present).
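For reference, a minimal sketch of that counting method using the Anthropic SDK's token-counting endpoint (the model ID and message content here are assumptions; the 16-token subtraction is the poster's empirical adjustment):

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Count tokens for the text exactly as the Messages API would see it.
result = client.messages.count_tokens(
    model="claude-3-7-sonnet-20250219",  # assumed model ID
    messages=[{"role": "user", "content": "the generated story text here"}],
)

# Subtract the 16 always-present tokens (role and other fixed overhead)
# to isolate the content itself, as described above.
content_tokens = result.input_tokens - 16
print(content_tokens)
```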

Hope this helps, and thanks to the discord mod, who shall not be pinged, for the testing prompt.

🌐
Anthropic
anthropic.com › news › claude-3-7-sonnet
Claude 3.7 Sonnet and Claude Code
Second, when using Claude 3.7 Sonnet through the API, users can also control the budget for thinking: you can tell Claude to think for no more than N tokens, for any value of N up to its output limit of 128K tokens.
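As a rough sketch of what that budget control looks like on the Messages API (the model ID and token values are assumptions, not from the announcement):

```python
import anthropic

client = anthropic.Anthropic()

# budget_tokens caps the thinking phase ("no more than N tokens");
# it must be less than max_tokens, which bounds the total output.
response = client.messages.create(
    model="claude-3-7-sonnet-20250219",  # assumed model ID
    max_tokens=16000,
    thinking={"type": "enabled", "budget_tokens": 8000},
    messages=[{"role": "user", "content": "Prove that sqrt(2) is irrational."}],
)
print(response.content)
```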
🌐
Simon Willison
simonwillison.net › 2025 › Feb › 25 › llm-anthropic-014
Claude 3.7 Sonnet, extended thinking and long output, llm-anthropic 0.14
February 25, 2025 - A fascinating new capability of Claude 3.7 Sonnet is that its output limit in extended thinking mode can be extended to an extraordinary 128,000 tokens—15x more than the previous Claude output limit of 8,192 tokens.
🌐
Reddit
reddit.com › r/claudeai › has claude pro token limit for individual responses been reduced for sonnet 3.7, or is it just me?
r/ClaudeAI on Reddit: Has Claude Pro token limit for individual responses been reduced for Sonnet 3.7, or is it just me?
March 26, 2025 -

I've been using Claude Pro for a while now, and I noticed something strange today. When using Sonnet 3.7, it seems like the token limit for individual responses is lower than before. Previously Claude could generate much longer single responses, but now it seems to cut off earlier.

Has anyone else experienced this? Did Anthropic reduce the response length limits for Claude Pro recently, or am I imagining things? I couldn't find any announcement about changes to the limits.

If you've noticed the same thing or have any information about this, I'd appreciate hearing about it!

Thanks!

🌐
AWS
docs.aws.amazon.com › amazon bedrock › user guide › amazon bedrock foundation model information › inference request parameters and response fields for foundation models › anthropic claude models › *new* anthropic claude 3.7 sonnet
*NEW* Anthropic Claude 3.7 Sonnet - Amazon Bedrock
Along with extended thinking, Claude 3.7 Sonnet allows up to 128K output tokens per request (up to 64K output tokens is considered generally available, but outputs between 64K and 128K are in beta). Additionally, Anthropic has enhanced its computer use beta with support for new actions.
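A hedged sketch of opting into the beta 64K-128K output range on Bedrock, assuming the long-output beta flag Anthropic documents (`output-128k-2025-02-19`) and a placeholder model ID:

```python
import json

import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

body = {
    "anthropic_version": "bedrock-2023-05-31",
    "anthropic_beta": ["output-128k-2025-02-19"],  # opt in to outputs beyond 64K
    "max_tokens": 128000,
    "messages": [{"role": "user", "content": "Write an exhaustive report on ..."}],
}

response = bedrock.invoke_model(
    modelId="us.anthropic.claude-3-7-sonnet-20250219-v1:0",  # assumed model ID
    body=json.dumps(body),
)
print(json.loads(response["body"].read())["usage"])
```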
Top answer (1 of 3):
When using Claude models with on-demand throughput in Amazon Bedrock, there are indeed differences between the advertised context window sizes and the actual token limits you can use in practice. For Claude models on Amazon Bedrock, there are two important considerations:

1. **Token Limits vs. Context Window**: While the models have a 200K token context window capability, the actual token limits for on-demand usage are lower. Claude Sonnet 4 appears to have a 65,536 token limit for on-demand usage, while Claude 3.7 Sonnet has a higher limit of 131,072 tokens.

2. **Prompt + Response Tokens**: It's important to note that the max_tokens parameter represents the combined total of your prompt tokens plus the maximum number of tokens you want in the response. With newer Claude models (3.7 and 4), if the sum of prompt tokens and max_tokens exceeds the model's context window, the system will return a validation error rather than automatically adjusting the max_tokens value.

When using extended thinking features with Claude models, additional considerations apply. For example, the thinking budget (if enabled) must be set to a value less than max_tokens, except when using interleaved thinking with tools, where the token limit becomes your entire context window (200K tokens).

If you need higher token limits or throughput for your application, you have a few options:

1. Contact AWS Support directly to request quota increases for specific models. This requires creating a support case and explaining your use case in detail.

2. Consider using Provisioned Throughput instead of on-demand if you need consistent access to higher token limits, though this comes with additional costs.

3. Implement retry logic with exponential backoff in your application to handle any throttling exceptions that might occur (a minimal sketch follows this answer).

Unfortunately, detailed documentation specifically about the max token limits for on-demand throughput with Claude models can be hard to find. The behavior you're experiencing suggests that while the models technically support 200K tokens, the on-demand service has different limits applied.

**Sources**
- Extended thinking - Amazon Bedrock
- Why cant i invoke claude 3.5 more than one per minute ? | AWS re:Post
- Building Agentic AI app with Strands Agents & MCP | AWS re:Post
- Can not increate invokeModel request to Claude 3.5 Sonnet v2 | AWS re:Post
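For option 3 above, a minimal backoff sketch (the wrapped call and retry count are placeholders, not AWS-recommended values):

```python
import random
import time

import botocore.exceptions


def invoke_with_backoff(call, max_attempts=5):
    """Retry a Bedrock invocation on throttling, doubling the wait each attempt."""
    for attempt in range(max_attempts):
        try:
            return call()
        except botocore.exceptions.ClientError as err:
            if err.response["Error"]["Code"] != "ThrottlingException":
                raise
            # Exponential backoff with jitter: ~1s, 2s, 4s, 8s, ...
            time.sleep(2 ** attempt + random.random())
    raise RuntimeError("still throttled after retries")
```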
Answer 2 of 3:
Adding clarification, as the one from `AWS Generative AI` is not entirely correct and can be misleading. As per the error, this is a limitation/constraint on the model itself; it has nothing to do with on-demand usage.

The error states: `The maximum tokens you requested exceeds the model limit of 131072` for Sonnet 3.7. This is in line with the Max output setting on that model (128K). Similarly for Sonnet 4, the max output limit communicated by Anthropic is 64K (which is equivalent to 65,536).

You will only get this error if you configure max_tokens (which represents the number of tokens to be generated, i.e. the output) to a value higher than the Max output tokens. max_tokens does not take into account the number of input tokens you are sending: on the AWS console and using the API, you can send input of nearly 200K tokens to Sonnet 3.7 and 4 without encountering this issue. See the screenshot for 4.5 (from https://docs.claude.com/en/docs/about-claude/models/overview; the reference might change in the future). [screenshot omitted]

Specifically for the Strands SDK, `max_tokens` is described as `Maximum number of tokens to generate`.
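To make the distinction concrete, a sketch under this answer's reading of the limits (model ID assumed): max_tokens bounds only the generated output, so a large input passes as long as max_tokens stays at or below the model's output cap.

```python
import json

import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

body = {
    "anthropic_version": "bedrock-2023-05-31",
    # Valid for Sonnet 3.7: at or below its 131072 (128K) output cap.
    # A value of 131073 would trigger "exceeds the model limit of 131072",
    # no matter how small the input is.
    "max_tokens": 64000,
    "messages": [{"role": "user", "content": "... input of nearly 200K tokens ..."}],
}

response = bedrock.invoke_model(
    modelId="us.anthropic.claude-3-7-sonnet-20250219-v1:0",  # assumed model ID
    body=json.dumps(body),
)
print(json.loads(response["body"].read())["usage"])
```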
🌐
Prompthub
prompthub.us › models › claude-3-7-sonnet
Claude 3.7 Sonnet Model Card
While the exact context window is not explicitly specified, Claude 3.7 Sonnet supports an extended output capacity of up to 128K tokens in a single response when using its extended thinking mode.
🌐
Reddit
reddit.com › r/claudeai › more details on claude 3.7 sonnet
r/ClaudeAI on Reddit: More details on claude 3.7 sonnet
October 9, 2024 - Sonnet 3.5 has a 200-500k token limit, why would 3.7 be less ... It's max tokens, not context window. In the API they use "max_tokens" to specify the maximum output tokens it can generate.
🌐
Amazon
aboutamazon.com › news › aws › claude-3-7-sonnet-anthropic-amazon-bedrock
Claude 3.7 Sonnet: Anthropic’s most intelligent model now available on Amazon Bedrock
February 24, 2025 - Claude 3.7 Sonnet can produce responses up to 128,000 tokens long—15 times longer than its predecessor—with 64,000 output tokens enabled by default. A token is the smallest unit of text data a model can process (e.g., a word, phrase, or ...
🌐
16x Prompt
prompt.16x.engineer › blog › claude-sonnet-gpt4-context-window-token-limit
Claude 3.5 Sonnet vs GPT-4o: Context Window and Token Limit | 16x Prompt
In this post, we'll compare the latest models from OpenAI and Anthropic in terms of their context window and token limits. Claude 3.5 Sonnet output token limit is 8192 in beta and requires the header anthropic-beta: max-tokens-3-5-sonnet-2024-07-15.
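A sketch of sending that header with the Python SDK (the model ID is an assumption; the header string is the one quoted in the snippet):

```python
import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-3-5-sonnet-20240620",  # assumed model ID
    max_tokens=8192,  # above the default 4096 cap, so the beta header is required
    messages=[{"role": "user", "content": "Summarize this long document ..."}],
    extra_headers={"anthropic-beta": "max-tokens-3-5-sonnet-2024-07-15"},
)
print(response.content[0].text)
```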
🌐
AWS re:Post
repost.aws › questions › QUh5t0kGHIRdajNCDm1UHZxA › issue-with-bedrock-claude-sonnet-3-5
Issue with Bedrock- Claude Sonnet 3.5 | AWS re:Post
July 4, 2024 - ... The input context window is 200,000 tokens, but as you are generating tokens to answer your question you are most likely hitting the output token limit, which is 4096 tokens; see the Anthropic user guide for more details.
🌐
GitHub
github.com › boto › boto3 › issues › 4279
Claude 3.5 Sonnet is limited to 4096 tokens - should be 8192 · Issue #4279 · boto/boto3
September 19, 2024 - Try again with a maximum tokens value that is lower than 4096.

```python
import boto3
from anthropic import AnthropicBedrock
from *** import AWS_ACCESS_KEY_ID, AWS_REGION, AWS_SECRET_ACCESS_KEY

client = AnthropicBedrock(
    aws_access_key=AWS_ACCESS_KEY_ID,
    ...
```
🌐
GitHub
github.com › cline › cline › issues › 1947
Claude 3.7 showing 8192 output tokens instead of 128.000 (Beta header missing?) · Issue #1947 · cline/cline
February 25, 2025 - Claude 3.7 supports a 128.000 token output context, but cline still shows 8192. Side note: selecting Sonnet 3.7 in OpenRouter shows the correct info. Update Cline to the latest release which supports Sonnet 3.7 · Go into Cline settings, choose ...
🌐
Cursor
forum.cursor.com › ideas › feedback
The "Whole 200k Context Window" of Claude 3.7 Sonnet Max - Feedback - Cursor - Community Forum
March 25, 2025 - I’ve spent considerable time (and yes, money too) thoroughly verifying this finding. This wasn’t just a one-off test but a methodical investigation to ensure my observations were consistent and accurate. As for now, anyon…
🌐
Anthropic
anthropic.com › news › token-saving-updates
Token-saving updates on the Anthropic API | Claude
March 13, 2025 - This means you can now optimize your prompt caching usage to increase throughput and get more out of your existing ITPM rate limits. Your Output Tokens Per Minute (OTPM) rate limit remains the same.
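The optimization it describes looks roughly like this: mark a stable prefix with cache_control so repeated requests read it from the cache instead of spending fresh input tokens against the ITPM limit (a sketch; the model ID and prompt are assumptions):

```python
import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-3-7-sonnet-20250219",  # assumed model ID
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": "A long, stable system prompt or reference document ...",
            "cache_control": {"type": "ephemeral"},  # cache this prefix
        }
    ],
    messages=[{"role": "user", "content": "A question about the document."}],
)

# usage reports cache_creation_input_tokens and cache_read_input_tokens,
# which show how much of the prompt was served from the cache.
print(response.usage)
```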
🌐
AWS
aws.amazon.com › blogs › aws › anthropics-claude-3-7-sonnet-the-first-hybrid-reasoning-model-is-now-available-in-amazon-bedrock
Anthropic’s Claude 3.7 Sonnet hybrid reasoning model is now available in Amazon Bedrock | Amazon Web Services
March 4, 2025 - Claude 3.7 Sonnet supports outputs up to 128K tokens long (up to 64K as generally available and up to 128K as a beta). Adjustable reasoning budget – You can control the budget for thinking when you use Claude 3.7 Sonnet in Amazon Bedrock.