GPT-5-mini is very slow compared to GPT-4.1-mini. What is the upgrade path?
Hey all, excuse me if I am being dense here, but I am genuinely confused about what is the obvious upgrade path for gpt-4.1-mini now that the GPT-5 family is out. I'll just say at the outset that the caveat to this whole post is that I am not evaluating answer quality at all, just the latency.
I run a small, free service that returns AI answers to Canadian tax questions: TaxGPT.ca. It's a little hobby project for me, and it gets a reasonable amount of usage.
I use gpt-4.1-mini to answer most questions. I have a RAG pipeline, which means I feed it documents from our tax agency: users ask questions, I augment their questions with some relevant tax info, and then I send all of it to OpenAI and get a response. When I hit my API directly, it returns answers in about 8 seconds. Not exactly lightning fast, but it seems like a nice balance of accuracy vs speed.
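The augmentation step looks roughly like this (a hypothetical sketch of the general pattern; the prompt wording and document snippets are made up, not TaxGPT.ca's actual code):

```python
# Hypothetical RAG augmentation step: combine the user's question with
# retrieved tax-agency excerpts before sending everything to the model.

def build_prompt(question: str, retrieved_docs: list[str]) -> list[dict]:
    """Return a messages list pairing the question with retrieved excerpts."""
    context = "\n\n".join(retrieved_docs)
    return [
        {
            "role": "system",
            "content": "Answer Canadian tax questions using only the provided excerpts.",
        },
        {
            "role": "user",
            "content": f"Excerpts:\n{context}\n\nQuestion: {question}",
        },
    ]

# Example with a made-up question and a stand-in retrieved snippet:
messages = build_prompt(
    "Can I deduct home office expenses?",
    ["Employees may claim home office expenses if certain conditions are met."],
)
```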
Now that the GPT-5 family is out, I figured the obvious upgrade would be to switch from gpt-4.1-mini to gpt-5-mini. But when trying it out as a drop-in replacement, I am finding the response times are much slower, around 13 seconds. The answers might be better (hard to tell), but they are definitely slower (easy to tell).
I spun up a little demo app to record response times for different API calls for a simple conversation with a system message and 3 'turns'.
Since my app currently uses the Chat Completions API route, the easiest change for me is just to swap the model name. But gpt-5-mini is a reasoning model, and if you use the newer Responses API route, you can dial its reasoning 'effort' up or down. So I recorded response times for both the old and new API routes, including all four levels of reasoning effort.
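For concreteness, these are roughly the two request shapes involved (a sketch assuming the official `openai` Python SDK; only the payloads are shown, not my demo app's code):

```python
# Legacy Chat Completions route: the only change is the model name.
legacy_request = {
    "model": "gpt-5-mini",  # was "gpt-4.1-mini"
    "messages": [
        {"role": "system", "content": "You have a Canadian accent. Respond in 1 sentence."},
        {"role": "user", "content": "what is the population of quebec city"},
    ],
}

# Newer Responses route: same swap, plus a reasoning-effort dial.
responses_request = {
    "model": "gpt-5-mini",
    "reasoning": {"effort": "minimal"},  # minimal | low | medium | high
    "input": [
        {"role": "system", "content": "You have a Canadian accent. Respond in 1 sentence."},
        {"role": "user", "content": "what is the population of quebec city"},
    ],
}

# With a configured client, these would be sent as:
#   client.chat.completions.create(**legacy_request)
#   client.responses.create(**responses_request)
```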
High-level results are this:
Completions API (legacy)
4.1-mini average response time: 1.1 s
5-mini average response time: 7.7 s
Responses API (includes reasoning "effort")
4.1-mini average response time: 1.2 s
5-mini average response time, minimal effort: 2.2 s
5-mini average response time, low effort: 3.8 s
5-mini average response time, medium effort (default): 7.9 s
5-mini average response time, high effort: 25.7 s
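A minimal harness along these lines is enough to collect averages like the ones above (a sketch, not the actual demo app; the timed function here is a stand-in for one API request):

```python
# Minimal latency harness: time a callable over several runs and average.
import time
from statistics import mean

def time_runs(fn, runs=5):
    """Time `fn` over `runs` calls; return per-run latencies (ms) and the average."""
    latencies = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()  # in the real benchmark, this is one API request
        latencies.append((time.perf_counter() - start) * 1000)
    return latencies, mean(latencies)

# Stand-in workload so the sketch runs without an API key:
latencies, avg_ms = time_runs(lambda: time.sleep(0.01))
```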
This leaves me confused about the intended upgrade path for people using the previous mini model.
It seems crazy that the default call to 5-mini takes almost 7 times longer in this simple example when using the older API (admittedly, the proportional increase is less dramatic in real-world usage, but that's probably down to the rest of my answering pipeline).
Is the idea that gpt-5-mini with "low" or even "minimal" reasoning is the better bet here?
I understand that everything depends on context (ha), so you should tune for your use case, but the most straightforward approach would be to just change the model name, which makes the latency jump by a huge margin.
If you boil it down to Steve Jobs's question, "Which ones do I tell my friends to buy?", which one am I supposed to use?
I almost feel like the answer right now is "don't upgrade."
Full results
Each run includes this 4-message conversation.
- system: You have a Canadian accent. Respond in 1 sentence.
- user: what is the population of quebec city
- assistant: Quebec City, eh? I think it's about 550000 or so bud.
- user: what should i do there?
Completions API route (legacy)
| Model | Run 1 | Run 2 | Run 3 | Run 4 | Run 5 | avg time (ms) |
|---|---|---|---|---|---|---|
| 4.1 mini | 974 | 1739 | 823 | 866 | 971 | 1075 |
| 5 mini | 9790 | 7826 | 5917 | 7433 | 7581 | 7709 |
Responses API route (includes reasoning "effort")
| Model | Effort | Run 1 | Run 2 | Run 3 | Run 4 | Run 5 | avg time (ms) |
|---|---|---|---|---|---|---|---|
| 4.1 mini | N/A | 1656 | 1351 | 800 | 1196 | 1157 | 1232 |
| 5 mini | minimal | 2439 | 2093 | 2437 | 1910 | 1876 | 2151 |
| 5 mini | low | 4287 | 3066 | 3922 | 3147 | 4746 | 3834 |
| 5 mini | medium | 7083 | 9144 | 6952 | 5245 | 10844 | 7854 |
| 5 mini | high | 20988 | 42812 | 18342 | 23225 | 22931 | 25660 |