Since some people have been asking, here's the actual output limit for Sonnet 3.7 with and without thinking:
Non-thinking: 8192 tokens
Non-thinking chat: https://claude.ai/share/af0b52b3-efc3-452b-ad21-5e0f39676d9f
Thinking: 24196 tokens*
Thinking chat: https://claude.ai/share/c3c8cec3-2648-4ec4-a13d-c6cce7735a67
*The thinking token count doesn't quite add up for me, as I'd expect 3 * 8192 = 24576, but it's close enough I guess. Also, in the example the thinking block itself is 23575 tokens before the main response is cut off, so thinking alone may actually run longer.
Tokens were counted with the token counting API, subtracting 16 tokens (the role and other tokens that are always present).
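For anyone who wants to reproduce the measurement, here's a rough sketch using Anthropic's token-counting endpoint via the `anthropic` Python SDK. The model ID, the placeholder text, and wrapping the response in a user message are my assumptions; the 16-token overhead comes from the post above.

```python
# Sketch: measure the length of a Claude response with the
# token counting API, then subtract the fixed per-message overhead.
# Requires ANTHROPIC_API_KEY in the environment for the live call.

OVERHEAD_TOKENS = 16  # role + always-present framing tokens, per the post


def response_length(counted_tokens: int, overhead: int = OVERHEAD_TOKENS) -> int:
    """Strip the fixed per-message overhead from a raw token count."""
    return counted_tokens - overhead


if __name__ == "__main__":
    import anthropic  # network call below; not run as part of this post

    client = anthropic.Anthropic()
    text = "...paste the full Claude response here..."
    counted = client.messages.count_tokens(
        model="claude-3-7-sonnet-20250219",  # assumed model ID
        messages=[{"role": "user", "content": text}],
    )
    print(response_length(counted.input_tokens))
```

With this approach, a raw count of 8208 tokens on a maxed-out non-thinking response would come back as 8208 - 16 = 8192, matching the limit above.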
Hope this helps, and thanks to the Discord mod, who shall not be pinged, for the testing prompt.
I've been using Claude Pro for a while now, and I noticed something strange today. When using Sonnet 3.7, it seems like the token limit for individual responses is lower than before. Previously Claude could generate much longer single responses, but now it seems to cut off earlier.
Has anyone else experienced this? Did Anthropic reduce the response length limits for Claude Pro recently, or am I imagining things? I couldn't find any announcement about changes to the limits.
If you've noticed the same thing or have any information about this, I'd appreciate hearing about it!
Thanks!
I've been experimenting with Claude 3.7 and I'm genuinely impressed with its capabilities. The quality of responses and reasoning is excellent, especially for coding tasks. However, as a free user, I'm finding it practically unusable due to the severe rate limits.
I can only get through about 1-2 coding prompts per day before hitting the limit. This makes it impossible to have any meaningful ongoing development session or troubleshooting conversation.
I would happily pay for a subscription if the context window were significantly larger. The current 8k token limit is simply too restrictive for serious work. For comparison, I regularly use Gemini 2.0 Pro, which offers a 2 million token context window, allowing me to include entire codebases and documentation in my prompts. Look at Grok and GPT o3-mini: both models are comparable in quality, and I get many times the usage as a free user. Grok 3 allows 50 normal prompts and 10 thinking prompts a day; on the o3-mini side I get unlimited 4o-mini, tens of thousands of 4o tokens, and over a dozen o3 prompts, without paying a dime, and all of these models have a much larger context window.
With just 8k tokens, I can barely fit a moderate-sized function and its related components before running out of space, let alone give Claude frontend context. This means constantly having to reframe my questions and lose context, making complex programming tasks frustratingly inefficient.
Does anyone else feel the same way? I want to support Claude and would gladly pay for a better experience, but the current limitations make it hard to justify even for a paid tier.
Claude Sonnet 3.7 had output limits of 8k tokens in normal mode and 64k in Thinking Mode. However, I can't find official documentation about Claude Sonnet 4's output limits in normal mode, nor information about Claude Opus 4's limits.
Does anyone have this information or know where to find it?
That you can dump 7k words on it and it'll translate them all at once is amazing. Also far less censored. Really nice model.