I'm using the claude-3-5-sonnet-20240620 API and I need to set max_tokens. The documentation clearly says the max output for this model should be 8,192 tokens, but I get an error saying the max is 4,096, like for the older models. Am I missing something, or did Anthropic fuck up the validation for the API?
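For reference, here's a minimal version of the call that triggers the error (sketch: the prompt is a placeholder, and the commented-out beta header is just a guess on my part, not something I've confirmed from the current docs):

```python
import anthropic

client = anthropic.Anthropic()  # assumes ANTHROPIC_API_KEY is set in the environment

response = client.messages.create(
    model="claude-3-5-sonnet-20240620",
    max_tokens=8192,  # the value the API rejects with a 4,096 limit error
    messages=[{"role": "user", "content": "placeholder prompt"}],
    # Guess: if the 8,192-token output for this model is still an opt-in beta,
    # a header along these lines might be needed (unconfirmed):
    # extra_headers={"anthropic-beta": "max-tokens-3-5-sonnet-2024-07-15"},
)
print(response.usage.output_tokens)
```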
I've been using Claude Pro for a while now, and I noticed something strange today. When using Sonnet 3.7, it seems like the token limit for individual responses is lower than before. Previously Claude could generate much longer single responses, but now it seems to cut off earlier.
Has anyone else experienced this? Did Anthropic reduce the response length limits for Claude Pro recently, or am I imagining things? I couldn't find any announcement about changes to the limits.
If you've noticed the same thing or have any information about this, I'd appreciate hearing about it!
Thanks!
Claude.ai interface
Personal preferences = ONLY MY NAME
Capabilities = ALL OFF
Connectors = ALL OFF
Observations:
I ask Claude to run a search that uses as few tokens as possible, since searching is what lets it report the system's token usage against the budget.
Claude responds by searching for "a" (10 results) and then reports ~15K-16K tokens used out of 190K (the search itself costs ~3K tokens).
I continue to provide large txt files at each turn and ask Claude to repeat the minimal search and report the system's token count. Each search costs ~3K.
I repeat these steps and never reach 190K. My last test hit "maximum length for this conversation" at only ~150K; other tests capped out as low as 110K-126K.
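If anyone wants to sanity-check the counter itself, here's a rough sketch of how you could count one of those txt files over the API and compare it against what the chat reports (the file name and model string are placeholders/assumptions on my part):

```python
import anthropic

client = anthropic.Anthropic()

# Placeholder file name: substitute one of the large .txt files from the test.
with open("large_test_file.txt", "r", encoding="utf-8") as f:
    text = f.read()

# Count how many input tokens the file alone contributes, to compare against
# the usage number Claude reports in the chat. Model string is my assumption
# for Sonnet 4.5.
count = client.messages.count_tokens(
    model="claude-sonnet-4-5",
    messages=[{"role": "user", "content": text}],
)
print(count.input_tokens)
```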
I did this testing because, in my experience, Sonnet 4.5 has been hitting the maximum conversation length much earlier than Opus 4.1 did.
I'm a heavy user. Max 20x Pro Plan $200/mo.
Something is WRONG!! Either the counter is wrong, or the maximum window effectively isn't what it's supposed to be. My sense is that it's the window, because I use this product enough to know I'm getting less "mileage" per conversation with Sonnet 4.5.
Anyone else sensing Sonnet 4.5's conversation limit is being reached faster than usual?
Since some people have been asking, here are the actual output limits for Sonnet 3.7 with and without thinking:
Non-thinking: 8192 tokens
Non-thinking chat: https://claude.ai/share/af0b52b3-efc3-452b-ad21-5e0f39676d9f
Thinking: 24196 tokens*
Thinking chat: https://claude.ai/share/c3c8cec3-2648-4ec4-a13d-c6cce7735a67
*The thinking number doesn't quite make sense to me, as I'd expect 3 * 8192 = 24576, but it's close enough I guess. Also, in that example the thinking itself is 23575 tokens before the main response gets cut off, so thinking alone may actually run longer.
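For anyone who wants to reproduce this over the API instead of the web app, this is roughly the equivalent request with extended thinking enabled (sketch: the max_tokens and budget_tokens values are my own guesses, not whatever claude.ai uses internally):

```python
import anthropic

client = anthropic.Anthropic()

# Streaming, since a maximum-length response can take a while to generate.
with client.messages.stream(
    model="claude-3-7-sonnet-20250219",
    # max_tokens covers thinking + visible text, so it has to exceed the budget.
    max_tokens=32000,
    thinking={"type": "enabled", "budget_tokens": 24000},
    messages=[{"role": "user", "content": "placeholder prompt that needs a very long answer"}],
) as stream:
    final = stream.get_final_message()

print(final.usage.output_tokens)  # includes both thinking and visible text
```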
Tokens were calculated with the token counting API, subtracting 16 tokens of fixed overhead (the role and a few other tokens that are always present).
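In case it's useful, the counting step looks roughly like this (sketch: you'd paste the response text in yourself, and the 16-token overhead is just the constant I mentioned above, not an official number):

```python
import anthropic

client = anthropic.Anthropic()

output_text = "..."  # paste the full visible response text here

# Run the response text back through the token counting endpoint and subtract
# the ~16 tokens of fixed overhead (role etc.) that every request carries.
count = client.messages.count_tokens(
    model="claude-3-7-sonnet-20250219",
    messages=[{"role": "user", "content": output_text}],
)
print(count.input_tokens - 16)
```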
Hope this helps, and thanks to the Discord mod, who shall not be pinged, for the testing prompt.