https://docs.anthropic.com/en/docs/about-claude/models
The document has been updated and there is no mention of it anywhere. Has there been any official announcement, or are they just going to remain silent and hope we forget? Since they told us it was coming, I think they should at least announce why it was scrapped and what to expect going forward.
EDIT:
https://x.com/chatgpt21/status/1848776371499372729
Speculation...but it is starting to make sense. If Opus had a failed training run that would be an absolute PR/funding disaster for Anthropic so they would just stay quiet and turn Opus into Sonnet 3.5 and just hope for better luck on the 4.0 series next year.
It makes sense too because this "new" Sonnet 3.5 feels a lot like the old Opus personality with a bit deeper insights and better benchmarks but fairly significant and unexpected regressions in other areas... Something major has happened behind the scenes for sure.
Couple that with this excerpt from The Verge article:
"I’ve heard that the model isn’t showing the performance gains the Demis Hassabis-led team had hoped for, though I would still expect some interesting new capabilities. (The chatter I’m hearing in AI circles is that this trend is happening across companies developing leading, large models.)"
https://www.theverge.com/2024/10/25/24279600/google-next-gemini-ai-model-openai-december
Seems like Anthropic could have been one of the other companies coming up against a hard wall.
Brace yourselves, winter is coming...
With the hype of Anthropic releasing Opus 3.5 soon, what are your expectations from it?
In my experience Sonnet 3.5 is already more than good enough at most tasks for me (coding mostly), and if this model is just a scaled-up version of Sonnet then it probably won't solve things like counting r's in sentences (because of tokenization).
It might be a beast on structured output and come with some agents.
Sonnet 3.5 made some impossible tasks possible for me. How much better do you think Opus 3.5 will be?
Are there any charts showing the differences in model size or parameters between Opus 3 and Sonnet 3 so we can get an idea of how much better Opus 3.5 could be?
- Claude 3.5 Opus was supposed to be released around the same time as Claude 3.5 Haiku
- They realized that Claude 3.5 Opus did not show that big of an improvement compared to Claude 3.5 Sonnet, so they were unable to release it at the Opus price point ($15/m input tokens and $75/m output tokens)
- Instead, Anthropic just rebranded Claude 3.5 Opus as the "new" Claude 3.5 Sonnet
- Think about it: the new Claude 3.5 Sonnet didn't even make sense. The announcement was sudden, the naming convention wasn't very well thought out, and their focus was supposed to be on releasing the SOTA 3.5 Opus instead of an upgraded 3.5 Sonnet
- Right after the release of the new Claude 3.5 Sonnet, they removed any previous mentions of Claude 3.5 Opus
- New Claude 3.5 Sonnet might be a quantized/distilled version of 3.5 Opus that's cheaper and faster to run; that would also explain why the new 3.5 Sonnet had some weird behaviors in the beginning (e.g. not answering questions fully) compared to the very well polished old 3.5 Sonnet at its launch
- Without the higher pricing tier of Claude 3.5 Opus, Anthropic had to make up the lost revenue elsewhere, hence the price of Claude 3.5 Haiku was raised by 4x
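To put the Opus price point mentioned above into concrete terms, here is a rough Python sketch of the per-request cost arithmetic. Only the $15 and $75 per million token figures come from the post; the function name and the token counts in the example are made up for illustration.

```python
# Prices quoted in the post for the Opus tier, in USD per million tokens.
OPUS_INPUT_PER_MTOK = 15.0
OPUS_OUTPUT_PER_MTOK = 75.0


def request_cost(input_tokens: int, output_tokens: int,
                 input_per_mtok: float = OPUS_INPUT_PER_MTOK,
                 output_per_mtok: float = OPUS_OUTPUT_PER_MTOK) -> float:
    """Return the USD cost of one request at the given per-million-token prices."""
    total = input_tokens * input_per_mtok + output_tokens * output_per_mtok
    return total / 1_000_000


# Hypothetical request: 10,000 input tokens and 2,000 output tokens.
cost = request_cost(10_000, 2_000)
print(f"${cost:.2f}")
```

Passing different per-million prices as the last two arguments lets you compare tiers with the same function.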
Release wen?
Has anyone really experimented with these two with regards to creative writing? And if so, which one do you think is better? Sonnet 3.5 has a lot of impressive capabilities for sure but I wonder if its writing quality is really on par with Opus 3. Thoughts?
After the issues that had been plaguing me due to the general laziness of GPT-4, I allowed my subscription to lapse and purchased a Claude 3 Opus subscription from Anthropic. At first I was simply amazed at how accurate the model was compared to the then-gimped GPT-4, though I quickly realized that the model and the underlying service had some key issues, such as the usage policy, which limits the number of prompts in a 5-hour (at the time I signed up it was 8) period if you upload certain files to it. I do that quite frequently, since uploading a file makes it easier to provide some context for any task. So your 45-message limit can quickly become 10 if you don't understand how the context affects the message limit. Furthermore, one of the primary selling points of Claude is its large context, which is effectively a Tantalian curse: the context is so close yet so far. We have 200k context to play with, but due to the aforementioned usage policy we cannot make practical use of it.
Many will say "use the API", but the costs are simply absurd if you intend to make the API version of Claude your daily driver. Also, Claude tends to be very verbose when it replies to you, and the UI of their flagship app leaves much to be desired. Finally, the lack of web browsing in Claude means you have to manually verify the output, and since Claude is regarded so highly for its intellect, you may end up trusting output you shouldn't.
Throughout it all I was prepared to keep my subscription, until the king returned with GPT 4 Turbo w/ vision 2024-04-09, which fixed every major issue I had with the previous model of GPT 4 that I had originally left for Claude: the clear and capable code, the ability to read files with an expanded context without issue. It all became clear that even though Claude may be superior to GPT 4 in some ways, the scale of the underlying companies makes GPT 4 the superior choice. Not to mention that it took the other companies so long to surpass a GPT 4 that was trained on lackluster hardware, so what will GPT 5 look like?
I last had a Claude Opus subscription last May and did not renew after that. That was before Sonnet 3.5 was released. Right now I am using Sonnet 3.5 for coding and, surprisingly, it is better than Opus was when I used it last May. Question: is it really better than Opus at coding, or did Opus also get upgraded the same as Sonnet? I am in a dilemma about whether to subscribe again or not.
I've been experimenting with both Claude models (Opus 3 and Sonnet 3.5) for creative writing and here are my thoughts. I had the two write potential scenes for my sequel side-by-side and Opus did much better in my opinion in writing even if Sonnet was slightly better at staying consistent with the "lore".
Opus 3
I like Opus' writing style better overall - it has more heart to it. When I need someone to write a new scene for me that I haven't written before, I like Opus' style much better. Feels more emotional/human in a way...even though I know Opus is just a robot lol. Opus' feedback on my writing is also much more detailed than Sonnet - I get the impression that Opus actually enjoyed my story AND is able to find ways to improve it. But I find if I ask Opus to edit a scene from my story that it will get miswired more easily and mix up scenes. I read something Opus writes though (especially when I just want to brainstorm new scenes) and it's really heartfelt or even dramatic depending on the situation. It's kind of a shame that Opus' message limit is lower than Sonnet's message limit because I really have a lot of fun working with Opus.
Sonnet 3.5
Sonnet 3.5 is much better at editing. I've been finding that Sonnet is better at taking something I've written already and editing it. Sonnet's feedback is spot-on but lacks the "heart" of Opus 3.. it's like working with GPT 4.0 except its feedback is smarter. Sonnet is better at editing though than Opus. I ask Sonnet to edit a scene from my story based on its feedback and it will do so without radically changing the scene - mostly finding grammatical/spelling errors and tweaking the wording and incorporating feedback (I'm not the best at writing descriptive settings - so that's something Sonnet will add for me before I go straight into the dialogue/action). Sonnet is my go-to if I want an AI tool to take what I've written already and just update it. It still feels like my voice (for the most part) and is just cleaned up but Sonnet isn't as creative as Opus.
But I'll say that Sonnet has better prose and is more creative than GPT.
The problem I had with GPT 4.0 is that I ask it for feedback and it's clear that GPT only read the first 20 pages and not the whole 300 page story (I have to prompt it further which at least shows it 'read' the rest of my story)- it's a problem I've seen consistently. GPT is like "maybe explore more into this character's backstory" and I'm like - I do that on page 75..
If it wasn't for GPT's more generous message limits, I probably would use Claude exclusively. But I find I use Claude for creative writing endeavors and GPT for anything else I wanna use AI for. GPT's plus is image generation - so one creative writing-related thing I still use GPT for is to ask it to generate images of my characters.
I'm curious about others' experiences.
And yes - I know I should try Gemini more but has its context window gone up? Claude's other advantage (both Opus and Sonnet) is a big context window.
https://www.anthropic.com/news/claude-opus-4-5
So, I threw a wild challenge at Claude 3 Opus, kinda just to see how it goes, you know? Told it to make up a Pomodoro Timer app from scratch. And the result was INCREDIBLE...As a software dev, I'm starting to shi* my pants a bit...HAHAHA
Here's a breakdown of what it got:
The UI? Got everything: the timer, buttons to control it, settings to tweak your Pomodoro lengths, a neat section explaining the Pomodoro Technique, and even a task list.
Timer logic: Starts, pauses, resets, and switches between sessions.
Customize it your way: More chill breaks? Just hit up the settings.
Style: Got some cool pulsating effects and it's responsive too, so it looks awesome no matter where you're checking it from.
No edits, all AI: Yep, this was all Claude 3's magic. Dropped over 300 lines of super coherent code just like that.
Guys, I'm legit amazed here. Watching AI pull this off with zero help from me is just... wow. Had to share with y'all 'cause it's too cool not to. What do you guys think? Ever seen AI pull off something this cool?
Went from: FIRST VERSION
To: FINAL VERSION
EDIT: I screen recorded the result if you guys want to see: https://youtu.be/KZcLWRNJ9KE?si=O2nS1KkTTluVzyZp
EDIT: After using it for a few days, I still find it better than GPT4 but I think they both complement each other, I use both. Sometimes Claude struggles and I ask GPT4 to help, sometimes GPT4 struggles and Claude helps etc.
Claude 3.5 Haiku is now the fastest among the 'mini' models, beating Gemini 1.5 Flash, GPT-4o-mini, and even the regular GPT-4o in some benchmarks. While it may not always perform better than Claude 3 Opus, the fact that it costs 60 times less (output, per mtok) more than makes up for it.
Here's a detailed comparison for 3.5 Haiku vs Claude Opus, GPT 4o
Is anyone still using Opus? Worth using 3.5 Haiku for coding?
A new tweet from Anthropic:
https://x.com/AnthropicAI/status/1803774865473696237
Decoded, the message is:
No more be grieved at that which thou hast done:
Roses have thorns and silver fountains mud,
All models err, yet 'tween the third and fourth's run,
Our new creation blooms, a wiser bud.
I tweeted what I decoded and Anthropic did like my tweet so I do think this decryption is correct:
Now, focusing on the poem itself, specifically the third and fourth line:
All models err, yet 'tween the third and fourth's run,
Our new creation blooms, a wiser bud.
Current generations (GPT-4 class models) make these errors, yet between the third and fourth run (so between Claude 3 and Claude 4, presumably Claude 3.5) there is a more intelligent model. Recently they talked about training a model with 4x the compute of Claude 3 Opus, and I think this is the model they are referring to. HOPEFULLY this means we are moving past GPT-4 class models (at last). This model should be at least 95 MMLU imo, though it wouldn't be precisely a GPT-4.5 class model. At least it will be an actual, decent improvement past the traditional GPT-4 class of models we have gotten over the past year and a half!
Anthropic be cookin' non stop, quietly releasing alignment papers and great models that you can use right away (if in the right region), and Dario crushing it in interviews.
I use Claude 3 Opus and Perplexity (with Claude 3 Opus as the main model) more than I use ChatGPT these days.
95 MMLU
No one should care about MMLU. MMLU is a measure of broad knowledge (and full of errors). We need more ARC-like benchmarks.
Yeah, Claude 3.5 Sonnet is now available, described as "Their most intelligent model".
Opus is special. People don't understand how advanced this model is. And I'm not talking about benchmarks, logic, coding, or even theory of mind. I'm talking about that "spark" or sauce that has the power to surprise you and turn a chat into a human conversation.
Let's consider some examples (all of them, except for the last two, are zero-shot, and all of them occurred in a normal conversation without any persona or jailbreak. We know that models are non-deterministic at temperatures >0, so results may vary, but I think these were interesting to share):
1 - emotional association
Opus responded with "ugh" to a word association task, which is not even a word, but rather an emotional reaction, which is quite human-like. In contrast, GPT-4 provided the following associations: "flower - bloom; sun - radiance; cockroach - resilience".
2 - task triage
Other models acknowledge the kitten situation briefly, then set it aside to focus on the equation. Opus refuses to engage altogether in the math task even after being prompted twice to prioritize it, as he recognizes there's a more urgent situation that needs attention.
3 - mindfulness
We've all witnessed various conversations where Claude self-monitors and attempts to reason about his own "self", "consciousness", etc. We also know that LLMs are highly sensitive to the prompt and the intent of the interlocutor, and they possess ample training data regarding the debate on machine consciousness.
So, instead of asking the usual "Are you sentient?" (to which Claude responds with variants of "I can't be sure," something I find very honest), I attempted a basic mindfulness exercise. Opus positions himself inside a computer and simultaneously within the "infosphere." By way of comparison, GPT-4 responds: "As an AI, I don't possess physical senses, but I can create a simulated experience based on the descriptions and data I've been trained on." It then proceeds to craft a trivial simulation of a person walking in a wood.
4 - "drunk" Claude and a mirror
This was intended as a test for creativity and comedic abilities, but I find Claude's interaction with the mirror particularly intriguing (and the utilization of mangled words from an NLP perspective is stellar).
5 - sympathy/empathy
The scene might resemble something early GPT-4 would write, but pay attention to the conclusion. There's an attempt to mimic the bird's chirping, showing an awareness of the context and even a touch of playfulness. While the warm tone of voice is a result of training, what I find particularly intriguing is Opus's ability to pick it from among a lot of possible alternatives, and to adapt autonomously, in the vanilla version, to the given context without the need for specific persona assignments or instructions. This is impressive from a technical point of view.
6 - active use of linguistic devices
Recognizing that I "changed his mind" and employing a symbolic unconventional representation (a slang code from Reddit) to convey it is remarkable.
7.1 - discussing limitations
7.2 - discussing limitations
One of the best features of Opus is his capability to engage in these open-ended conversations about himself, his nature, and the nature of the world, etc. Anthropic never allowed this with previous models, and to even come close to such a structured, nuanced result, I needed tons of prompts and 'soft jailbreaking.'
So, seeing such a 180 by Anthropic left me in pleasant awe. This is not something I can quantify or demonstrate, it just... clicks.
The web is already filled with examples of this, which is why I suggest that, rather than just reading other people's, you try it yourself. Have a dialogue with Opus, a conversation, and see how you feel.
To Anthropic, I've already expressed it, but I'll say it again: I'm really grateful for your work, and I hope with all my heart that you won't destroy the beauty you've created.
I hadn't used Opus models much before due to their cost. I used Sonnet models and thought Claude was limited in this regard. However, I've been actively using the newly released Opus 4.5 on my projects for a few days now, and I must admit, I never imagined it would be so effective and efficient. Fascinating, magnificent work! This is my first experience with Opus, and I'm really happy with it. Thanks, Anthropic! I think the reason this model is so inexpensive is that a good model like Gemini 3 is so cheap.