I'm on the 20x Max plan. I get that Opus will use tokens faster, and Anthropic acknowledged this by scaling total usage limits up to be roughly equivalent to what you'd get with Haiku (whether that's really true remains to be seen). However, they didn't raise the 200k context window limit (I don't have access to the 1M window).
I just used my first prompt (which was a pretty standard one for me) to help find an issue that threw an error on my front-end, and after its response (which wasn't that helpful), I'm already down to 9% context remaining before auto-compacting.
If Anthropic is going to acknowledge that token consumption will be higher with Opus and scale some of the limits up accordingly, they really should increase the context limit as well.
I use AI a lot in cases where I need a bit more than 16k of input (GPT-3.5's context window limit). GPT-3.5's performance is normally fine for me, but I have to use GPT-4 to get a longer context window, at a much higher inference price across the many queries I rack up over a long session.
The Claude 3 family is the first one that seems to have very respectable performance and a long (200k) context window across all three models (Opus + Sonnet + Haiku). So I'm very excited about Sonnet, the middle-tier model.
TLDR: It's exciting to see the benchmark results of Opus, but I think Sonnet might enable more new real world use cases than Opus, when considering the context window and the relatively low cost.
With the introduction of Opus 4.5, Anthropic just updated the Claude Apps (Web, Desktop, Mobile):
For Claude app users, long conversations no longer hit a wall—Claude automatically summarizes earlier context as needed, so you can keep the chat going.
This is so amazing. It fixes the only gripe I had with Claude (besides limits) and the reason I kept using ChatGPT (for its rolling context window).
Anyone as happy as I am?
I am on the $100 plan using Opus 4.5. Good experience so far, but I'm noticing that I'm running out of context WAY faster. Not sure if this is because of Opus, because of my project, or because I downgraded from the $200 plan. Any ideas?
I like Claude 3.7 a lot, but context size was the only downside. Well, looks like we need to wait one more year for a 1M-context model.
Even 400k would be a massive improvement! Why only 200k?
We have to be doing better than FELON TUSK, right? Right?
I've tested it a few times, and when using Claude 3 Opus through Perplexity, it absolutely limits the context length from 200k to ~30k.
On a codebase of 110k tokens, Claude 3 Opus through Perplexity would consistently (and I mean every time, across 5 attempts) say that the last function in the program was one located about 30k tokens in.
When using Anthropic's API and their web chat, it consistently located the actual final function and could clearly see and recall all 110k tokens of the code.
I also tested this with 3 different books and 2 different codebases and received the same results across the board.
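For anyone who wants to reproduce this kind of check against the raw API, here's a minimal sketch of the last-function probe using the Anthropic Python SDK (the file name is a placeholder for your own ~110k-token codebase dump, and the model ID may need adjusting):

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Placeholder path: a single-file dump of the codebase you want to probe.
with open("codebase_dump.py", "r", encoding="utf-8") as f:
    code = f.read()

response = client.messages.create(
    model="claude-3-opus-20240229",  # Claude 3 Opus snapshot; adjust as needed
    max_tokens=200,
    messages=[{
        "role": "user",
        "content": (
            "Here is a codebase:\n\n" + code +
            "\n\nWhat is the name of the very last function defined in this file?"
        ),
    }],
)
print(response.content[0].text)
```

Run the same prompt through Perplexity's UI and through the API, and the answers make the truncation obvious: the API names the actual final function, while a ~30k-token window names something roughly 30k tokens in.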
I understand if they have to limit context in order to offer it unlimited, but not saying so anywhere is a very disappointing marketing strategy. I've seen rumors of this, but I just wanted to add another data point confirming that the context window is limited to ~30k tokens.
Unlimited access to Claude 3 Opus is pretty awesome still, as long as you aren't hitting that context window, but this gives me misgivings about what else Perplexity is doing to my prompts under the hood in the name of saving costs.
I’m on Opus 4.5 with Max. Every time I add an image or try a slightly serious multi-step task, I get hit with a “Context size exceeds the limit” error. I even tested with a single simple image and still had issues, which is super frustrating, and I also tried reducing the number of files and the amount of content in the conversation. The error is followed by the “Compacting our conversation so we can keep chatting…” spinner after just a few messages.
It was absolutely on fire the last few days – long, complex sessions with multiple files, no issues at all. Then, out of nowhere, it started compacting almost immediately, even when I was only working off a single image. With a supposed 200k+ context window, this makes zero sense from the user side.
I’ve tried pretty much everything: Opus 4.5 on Max, desktop app, web app, different projects/folders, disabling connectors, restarting, fresh chats, different prompt styles. Same story every time: the convo quickly gets butchered by aggressive compaction and length-limit warnings.
Is this some bug, server-side issue, or a quiet change to how they’re counting tokens, especially for images and file attachments? Anyone figured out a reliable workaround beyond “new chat every few minutes” or stripping everything down to plain text?
Would love to hear if others are seeing the same thing or if there’s a smarter way to work around these context shenanigans.
I've been using Claude Code for some time now on a smallish project, and I'm finding that as of recently the context window seems much smaller than it used to be (Max plan). It compacts, then about a minute later it is auto-compacting again. My CLAUDE.md is trim, and most tasks are delegated to worker sub-agents.
Out of the gate, Claude is using 35% of the context, with 22.5% reserved for auto-compact.
In contrast, Codex (which I use for QA) is able to get a lot more done before its context window becomes an issue.
Are there any tricks I am not aware of to reduce or optimize the context usage with Claude Code?
Signed, everyone who has used Claude to write software. At least give us an option to pay for it.
Edit: thank you Anthropic!
I Finally Cracked My Claude Code Context Window Strategy (200k Is Not the Problem)
I’ve been meaning to share this for a while: here’s my personal Claude Code context window strategy that completely changed how I code with LLMs.
If you’ve ever thought “200k tokens isn’t enough” – this post is for you. Spoiler: the problem usually isn’t the window size, it’s how we burn tokens.
1 – Context Token Diet: Turn OFF Auto-Compact
Most people keep all the “convenience” features on… and then wonder where their context went.
The biggest hidden culprit for me was Auto Compact.
With Auto Compact ON, my session looked like this:
85k / 200k tokens (43%)
After I disabled it in /config:
38k / 200k tokens (19%)
That’s more than half the initial context usage gone, just by turning off a convenience feature.
My personal rule:
🔴 The initial context usage should never exceed 20% of the total context window.
If your model starts the session already half-full with “helpful” summaries and system stuff, of course it’ll run out of room fast.
“But I Need Auto Compact To Keep Going…?”
Here’s how I work without it.
When tokens run out, most people:
1. Hit /compact
2. Let Claude summarize the whole messy conversation
3. Continue on top of that lossy, distorted summary
The problem: If the model misunderstands your intent during that summary, your next session is built on contaminated context. Results start drifting. Code quality degrades. You feel like the model is “getting dumber over time”.
So I do this instead:
1. Use /export to copy the entire conversation to clipboard
2. Use /clear to start a fresh session
3. Paste the full history in
4. Tell Claude something like: “Continue from here and keep working on the same task.”
This way:
• No opaque auto-compacting in the background
• No weird, over-aggressive summarization ruining your intent
• You keep rich context, but with a clean, fresh session state
Remember: the 200k “used tokens” you see isn’t the same as the raw text tokens of your conversation. In practice, the conversation content is often ~100k tokens or less, so you do still have room to work.
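If you want to check that claim on your own transcript before pasting it back in, a crude character-count estimate is enough. In this sketch, the file name is a placeholder and the 4-characters-per-token figure is only a rough English-text heuristic, not a real tokenizer count:

```python
# Rough, tokenizer-free sanity check of an exported transcript's size.
with open("exported_conversation.md", "r", encoding="utf-8") as f:
    text = f.read()

approx_tokens = len(text) / 4  # ~4 chars per token is a crude heuristic
print(f"{len(text):,} characters ≈ {approx_tokens:,.0f} tokens")
print(f"≈ {approx_tokens / 200_000:.0%} of a 200k window")
```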
Agentic coding is about productivity and quality. Auto Compact often kills both.
2 – Kill Contaminated Context: One Mission = One Session
The second rule I follow:
🟢 One mission, one 200k session. Don’t mix missions.
If the model goes off the rails because of a bad prompt, I don’t “fight” it with more prompts.
Instead, I use a little trick:
• When I see clearly wrong output, I hit ESC + ESC
• That jumps me back to the previous prompt
• I fix the instruction
• Regenerate
Result: the bad generations disappear, and I stay within a clean, focused conversation without polluted context hanging around.
Clean session → clean reasoning → clean code. In that environment, Claude + Alfred can feel almost “telepathic” with your intent.
3 – MCP Token Discipline: On-Demand Only
Now let’s talk MCP.
Take a look at what happens when you just casually load up a bunch of MCP tools:
• Before MCPs: 38k / 200k tokens (19%)
• After adding commonly used MCPs: 133k / 200k tokens (66%)
That’s two-thirds of your entire context gone before you even start doing real work.
My approach:
• Install MCPs you genuinely need
• Keep them OFF by default
• When needed:
  1. Type @
  2. Choose the MCP from the list
  3. Turn it ON, use it
  4. Turn it OFF again when done
Don’t let “cool tools” silently eat 100k+ tokens of your context just by existing.
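If you want to see this effect in numbers, Anthropic's token-counting endpoint will report how much tool definitions alone add to a prompt. This is just a sketch: the tool schema is invented, the model ID is an assumption, and real MCP servers expose their own schemas, but the shape of the comparison is the point:

```python
import anthropic

client = anthropic.Anthropic()
messages = [{"role": "user", "content": "List the open issues in my repo."}]

# A made-up tool definition standing in for whatever your MCP servers expose.
example_tool = {
    "name": "search_issues",
    "description": (
        "Search issues in a repository by free-text query, label, assignee, "
        "state, and date range. Returns a paginated list of matching issues "
        "with full metadata."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "query": {"type": "string"},
            "labels": {"type": "array", "items": {"type": "string"}},
            "state": {"type": "string", "enum": ["open", "closed", "all"]},
        },
        "required": ["query"],
    },
}

# Simulate a server that exposes 20 similar tools (distinct names).
tools = []
for i in range(20):
    tool = dict(example_tool)
    tool["name"] = f"search_issues_{i}"
    tools.append(tool)

# Model ID is an assumption; use whichever model you actually run.
without_tools = client.messages.count_tokens(model="claude-opus-4-5", messages=messages)
with_tools = client.messages.count_tokens(model="claude-opus-4-5", messages=messages, tools=tools)
print("input tokens without tools:", without_tools.input_tokens)
print("input tokens with 20 tools:", with_tools.input_tokens)
```

Every one of those tool descriptions rides along with every single request, which is why a handful of chunky MCP servers can swallow a big slice of the window before you type anything.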
“But What About 1M Token Models Like Gemini?”
I’ve tried those too.
Last month I burned through 1M tokens in a single day using Claude Code API. I’ve also tested Codex, Gemini, Claude with huge contexts.
My conclusion:
🧵 As context gets massive, the “needle in a haystack” problem gets worse. Recall gets noisy, accuracy drops, and the model struggles to pick the right pieces from the pile.
So my personal view:
✅ 200k is actually a sweet spot for practical coding sessions if you manage it properly.
If the underlying “needle in a haystack” issue isn’t solved, throwing more tokens at it just makes a bigger haystack.
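If you want to sanity-check this yourself, a toy version of the needle-in-a-haystack probe is easy to run. This sketch uses the Anthropic Python SDK; the filler text, needle, and model ID are all placeholders, and repeated filler is a much easier haystack than real code or prose:

```python
import anthropic

client = anthropic.Anthropic()

# Build a ~100k-token haystack from filler text (rough 4-chars/token estimate)
# and bury one odd fact ("the needle") 75% of the way in.
filler = "The quick brown fox jumps over the lazy dog. " * 9_000
needle = "The secret deployment code is PLUM-4471. "
pos = int(len(filler) * 0.75)
haystack = filler[:pos] + needle + filler[pos:]

response = client.messages.create(
    model="claude-opus-4-5",  # model ID is an assumption; use whatever you run
    max_tokens=50,
    messages=[{
        "role": "user",
        "content": haystack + "\n\nWhat is the secret deployment code?",
    }],
)
print(response.content[0].text)  # a correct answer mentions PLUM-4471
```

Vary the haystack size and the needle depth and you can watch recall get flakier as the pile grows, which is exactly the effect described above.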
So instead of waiting for some future magical 10M-token model, I’d rather:
• Upgrade my usage patterns
• Optimize how I structure sessions
• Treat context as a scarce resource, not an infinite dump
My Setup: Agentic Coding with MoAI-ADK + Claude Code
If you want to turn this into a lifestyle instead of a one-off trick, I recommend trying MoAI-ADK with Claude Code for agentic coding workflows.
👉 GitHub: https://github.com/modu-ai/moai-adk
If you haven’t tried it yet, give it a spin. You’ll feel the difference in how Claude Code behaves once your context is:
• Lean (no unnecessary auto compact)
• Clean (no contaminated summaries)
• Controlled (MCPs only when needed)
• Focused (one mission per session)
If this was helpful at all, I’d really appreciate an upvote or a share so more people stop wasting their context windows. 🙏
#ClaudeCode #agenticCoding #MCP
Claude Opus 4.5, our frontier coding model, is now available in Claude Code for Pro users. Pro users can select Opus 4.5 using the /model command in their terminal.
Opus 4.5 will consume rate limits faster than Sonnet 4.5. We recommend using Opus for your most complex tasks and using Sonnet for simpler tasks.
To get started:
* Run claude update
* /model opus
I'm curious if anyone has tried this. Please share your experiences and whether it resolved your issues. I hope they bring this to Claude Code as well.
Hi everyone,
I'm considering using the Claude-3-Opus model on Poe, but I have a question about the context window size for the 1,000-credit "shortened" version compared to the full 200k-token version that costs 6,000 credits per message.
Since I'm located in Europe, I don't have direct access to Anthropic to use the full Opus model. So I'm trying to determine if the Poe version with the smaller context window will still meet my needs.
Does anyone happen to know approximately how many tokens the context window is limited to for Claude-3-Opus on Poe? Any insight would be greatly appreciated as I try to decide if it will be suitable for my use case.
Thanks so much for any info you can provide!
Poe just doubled the credits for Claude 3. 🤬 Now Claude-3-Opus-200k requires 12,000 credits and Claude-3-Opus requires 2,000 credits per message.
I have the same question regarding the nebulous context window when not using Opus-200k. Poe upped the price per message for Claude 3 Opus-200k a few days after its launch. It used to be 1,750 credits/message and now it's 6,000/message. The context window is important. It should be communicated...
While I was working inside Claude, I noticed that file uploads that normally take up half the knowledge-file limit were taking up much less space, and that there's now a "Retrieving" indicator off to the right. As a sanity check, I uploaded a file that, based on the old context window, should have been 900% of the limit of Claude's input capabilities. Instead, it says I've only used 88% of the context limit. When I asked it questions about the massive file, it seemed to be able to answer intelligently. It appears Claude has found a way to now accept 7x the content it used to, which is HUGE! Are others seeing the same thing?
TechCrunch:
“There are improvements we made on general long context quality in training with Opus 4.5, but context windows are not going to be sufficient by themselves,” Dianne Na Penn, Anthropic’s head of product management for research, told TechCrunch. “Knowing the right details to remember is really important in complement to just having a longer context window.”
Those changes also enabled a long-requested “endless chat” feature for paid Claude users, which will allow chats to proceed without interruption when the model hits its context window. Instead, the model will compress its context memory without alerting the user.
I am using Opus 4.5 but I am still getting the same old context behaviour.