Update: This is now out of date. Please use a dedicated cost calculator like llmpricecheck.com or https://docsbot.ai/tools/gpt-openai-api-pricing-calculator
I compiled a breakdown of cost vs. performance in a Google Sheet, and a couple of things struck me:
Mistral-medium is really impressive and sits neatly between GPT-3.5 and GPT-4. In my (limited) experience it's a great choice for anyone who can't get consistency or quality out of GPT-3.5.
Why would anyone choose Claude? No but seriously, what is their competitive advantage? Safety?
GPT-3.5 vs. Gemini Pro seems very close, but in my (limited) experience GPT-3.5 does perform better in practice. I'd be curious what other people's experiences are.
The Google Sheet is here. Please let me know if I bungled any of the numbers.
All prices are normalized to USD per 1M tokens (EUR -> USD converted at today's exchange rate).
I only used the LMSYS Chatbot Arena benchmark because all the rest have seemingly been gamed already.
The sheet includes an example estimate calculator; just copy the sheet, override the values, and try it.
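For anyone who would rather script it than copy the sheet, here is a rough Python sketch of the same two steps: normalizing a listed price to USD per 1M tokens, then estimating a bill from token counts. This is my guess at the arithmetic, not a copy of the sheet's formulas. The function names are made up for illustration, and every number in the example (prices, exchange rate, token counts) is a placeholder, not a real quote from any provider.

```python
def usd_per_million(price: float, per_tokens: int, fx_to_usd: float = 1.0) -> float:
    """Convert a price quoted per `per_tokens` tokens into USD per 1M tokens.

    fx_to_usd is the exchange rate from the quoted currency to USD
    (1.0 if the price is already in USD).
    """
    return price * fx_to_usd * (1_000_000 / per_tokens)


def estimate_cost(input_tokens: int, output_tokens: int,
                  input_price: float, output_price: float) -> float:
    """Estimated cost in USD, given prices in USD per 1M tokens."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# Placeholder values only: a price listed per 1K tokens in EUR,
# with a hypothetical EUR -> USD rate of 1.10.
input_price = usd_per_million(0.002, per_tokens=1_000, fx_to_usd=1.10)
output_price = usd_per_million(0.006, per_tokens=1_000, fx_to_usd=1.10)

# Hypothetical monthly usage: 500K input tokens, 100K output tokens.
monthly = estimate_cost(500_000, 100_000, input_price, output_price)
print(f"input: ${input_price:.2f}/1M, output: ${output_price:.2f}/1M, "
      f"estimate: ${monthly:.2f}")
```

Input and output tokens are priced separately because most providers bill them at different rates, so a single blended price can be misleading if your prompts are much longer than your completions.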
Edit: Added Mistral-7B-OpenOrca, Mixtral-8x7B-Instruct-v0.1 and Llama-2-70b-chat-hf running on Anyscale.