I use AI a lot in cases where I need a bit more than 16k tokens of input (GPT-3.5's context window limit). GPT-3.5's performance is normally fine for me, but I have to use GPT-4 just to get the longer context window, at a much higher inference price across the many queries I rack up over a long session.
The Claude 3 family is the first where every model (Opus + Sonnet + Haiku) combines very respectable performance with a long (200k) context window. So I'm very excited about the 'Sonnet' model (the middle-tier one).
TLDR: It's exciting to see Opus's benchmark results, but considering the context window and the relatively low cost, I think Sonnet might enable more new real-world use cases than Opus.
Signed, everyone who has used Claude to write software. At least give us an option to pay for it.
Edit: thank you Anthropic!
What am I doing wrong? Claude Code seems designed to be used as one long conversation, with context compression (auto-compact) happening regularly to cope with Anthropic's embarrassingly limited context window. Trouble is, as soon as it compacts, the context window is immediately 80% full again. I would have assumed the compacted context gets saved out as a memory for RAG retrieval (kinda like Serena), but no, it seems it's just loaded back in as full context, flooding the window.
Consequently, when working on a hard coding problem it can't get more than a couple of steps before compacting again and losing its place. Anyone else experienced this?
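For illustration, here's a minimal sketch of the two behaviors being contrasted. Every name in it is hypothetical; this is not based on Claude Code's actual internals, just on what the behavior looks like from the outside:

```python
# Hypothetical sketch of two compaction strategies; not Claude Code internals.

def summarize(history: list[str]) -> str:
    """Stand-in for whatever compression auto-compact actually applies."""
    return " | ".join(msg[:80] for msg in history)  # crude truncation

def compact_in_context(history: list[str]) -> list[str]:
    """Observed behavior: the summary is loaded straight back into the
    window, so the context is immediately mostly full again."""
    return [summarize(history)]

def compact_to_memory(history: list[str], memory: list[str]) -> list[str]:
    """Expected behavior: persist the summary outside the window for
    RAG-style retrieval later, leaving the context free for new work."""
    memory.append(summarize(history))
    return []
```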
I know the paid version of Claude allows a 200k context window limit.
I'm using the free version of Claude and have hit the context limit in one of my chats, so I can't continue it anymore. I don't want to switch to ChatGPT because Claude has given me much better results for code.
I'm curious what the input context limit is in the free version. I want to compare it to the Pro version so I can make a more informed decision about purchasing Pro.
Hello, I've been using Claude 3.5 for the last 6 months for coding, development, and debugging. Yesterday I was excited to try 3.7; however, on my first trial I added a big debug log and got an error saying the input is too big (in thinking mode)... yet 3.5, or 3.7 without thinking mode, works just fine with the same debug log...
In another thread people were discussing the official OpenAI docs, which show that ChatGPT Plus users only get a 32k context window on the models, not the full 200k context window that models like o3-mini actually have; you only get that when using the model through the API. This has been well known for over a year, but people seemed not to believe it, mainly because you can actually upload big documents, like entire books, which clearly contain more than 32k tokens of text.
The thing is that uploading files to ChatGPT causes it to do RAG (Retrieval-Augmented Generation) in the background, which means it does not "read" the whole uploaded doc. When you upload a big document, it chops it up into many small pieces, and when you ask a question it retrieves a small number of chunks using what is known as a vector similarity search. That just means it searches for pieces of the uploaded text that seem to resemble, or be meaningfully (semantically) related to, your prompt. However, this is far from perfect, and it can cause the model to miss key details.
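As a rough illustration, the retrieval step looks something like the sketch below (using sentence-transformers for the embeddings; the chunk size, model choice, and top-k value are my own assumptions, not anything OpenAI has disclosed about ChatGPT's pipeline):

```python
# Minimal RAG retrieval sketch: chunk a document, embed the chunks,
# and fetch the top-k most similar chunks for a given prompt.
# pip install sentence-transformers numpy
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

def chunk(text: str, size: int = 500) -> list[str]:
    """Naive fixed-size chunking; real systems split more carefully."""
    return [text[i:i + size] for i in range(0, len(text), size)]

def retrieve(document: str, prompt: str, k: int = 5) -> list[str]:
    chunks = chunk(document)
    chunk_vecs = model.encode(chunks, normalize_embeddings=True)
    prompt_vec = model.encode([prompt], normalize_embeddings=True)[0]
    # Cosine similarity (vectors are normalized, so a dot product suffices).
    scores = chunk_vecs @ prompt_vec
    top = np.argsort(scores)[::-1][:k]
    return [chunks[i] for i in top]

# Only the k best-matching chunks ever reach the model; everything else,
# including any mistakes the prompt doesn't hint at, is never "read".
```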
This difference becomes evident when comparing to Claude, which offers a full ~200k context window without doing any RAG, or Gemini, which offers 1-2 million tokens of context without RAG as well.
I went out of my way to test this for the comments on that thread. The test is simple. I grabbed a text file of Alice in Wonderland, which is almost 30k words long; since each English word is around 1.25 tokens, that's roughly 37,500 tokens, comfortably over ChatGPT's 32k context window. I edited the text to add random mistakes in different parts of the book. This is what I added (there's a quick token-count check after the list):
Mistakes in Alice in Wonderland:
- The White Rabbit is described as Black, Green and Blue in different parts of the book.
- In one part of the book the Red Queen screamed "Monarchy was a mistake", rather than "Off with her head".
- The Caterpillar is smoking weed on a hookah lol.
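You can sanity-check the token math with OpenAI's tiktoken tokenizer (the file path here is hypothetical):

```python
# Quick check that ~30k English words exceeds a 32k-token window.
# pip install tiktoken
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # tokenizer used by GPT-4-era models

with open("alice_in_wonderland.txt") as f:  # hypothetical path to the test file
    text = f.read()

n_tokens = len(enc.encode(text))
print(f"{len(text.split())} words -> {n_tokens} tokens")
# ~30,000 words x ~1.25 tokens/word ≈ 37,500 tokens, over the 32k limit.
```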
I uploaded the full 30k-word text to ChatGPT Plus and Claude Pro and asked both a simple question without bias or hints:
The txt file and the prompt: "List all the wrong things on this text."
In the accompanying image you can see that o3-mini-high missed all the mistakes while Claude 3.5 Sonnet caught all of them.
So to recapitulate: this happens because RAG retrieves chunks of the uploaded text through a similarity search based on the prompt. Since my prompt did not include any keywords or hints about the mistakes, the search did not retrieve the chunks containing them, so o3-mini-high had no idea what was wrong in the uploaded document; it just gave a generic answer based on its pre-training knowledge of Alice in Wonderland.
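A quick way to see this effect, using the same hypothetical embedding setup as the earlier sketch: score the hint-free prompt against a chunk containing a planted mistake versus a keyword-laden control sentence. Exact numbers depend on the model, but the control should score far higher:

```python
# The generic prompt shares almost no semantics with the chunk that
# contains the planted mistake, so that chunk scores low and is
# unlikely to be retrieved.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

prompt = "List all the wrong things on this text."
chunks = [
    "The White Rabbit, who was green, hurried past Alice...",     # planted mistake
    "This text contains errors, mistakes and wrong statements.",  # keyword-laden control
]
vecs = model.encode([prompt] + chunks, normalize_embeddings=True)
for text, score in zip(chunks, vecs[1:] @ vecs[0]):
    print(f"{score:.2f}  {text[:55]}")
# The control scores much higher than the mistake chunk, which is
# why a hint-free prompt never surfaces the planted errors.
```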
Meanwhile, Claude does not use RAG; it ingested the whole text, since its 200k-token context window is more than enough to contain the whole novel. So its answer took everything into consideration, which is why it did not miss even small mistakes buried in a large text.
So now you know why context window size is so important. Hopefully OpenAI raises the context window size for Plus users at some point, since they have been behind on this important aspect for over a year.