https://github.com/musistudio/claude-code-router
You really notice, though, that Claude 4 was trained within the harness of CC, but other models with strong instruction following do decently too.
I tested Gemini, GPT-4.1, and Kimi K2, and I actually liked 4.1 best, but YMMV.
I just hit CC limits twice in a row, so I decided to finally try out CCR with Gemini 2.5 Pro and Qwen Coder. So far, it has been a disaster. Has anyone had any real success with it? Any tips you can share?
TL;DR: I wired up claude-code with claude-code-router (ccr) and vLLM running Qwen/Qwen3-Coder-30B-A3B-Instruct. Chat works, but inside Claude Code it never executes anything (no tool calls), so it just says “Let me check files…” and stalls. Anyone got this combo working?
Setup
Host: Linux
Serving model (vLLM):
python -m vllm.entrypoints.openai.api_server \
  --host 0.0.0.0 --port 8000 \
  --model Qwen/Qwen3-Coder-30B-A3B-Instruct \
  --dtype bfloat16 --enforce-eager \
  --gpu-memory-utilization 0.95 \
  --api-key sk-sksksksk \
  --max-model-len 180000 \
  --enable-auto-tool-choice \
  --tool-call-parser hermes \
  --tensor-parallel-size 2
I can hit this endpoint directly and get normal chat responses without issues.
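One way to narrow down where tool calling breaks is to hit the endpoint directly with a `tools` array and check whether the model emits `tool_calls` at all. A minimal probe sketch (Python stdlib only; the URL, API key, and model name are taken from the setup above, and the `list_files` tool is a purely illustrative dummy):

```python
import json
import urllib.request


def extract_tool_calls(response: dict) -> list:
    """Return the tool_calls list from an OpenAI-style chat response, or []."""
    choices = response.get("choices", [])
    if not choices:
        return []
    message = choices[0].get("message", {})
    return message.get("tool_calls") or []


def probe(url: str, api_key: str, model: str) -> dict:
    """POST a chat request that should trigger a tool call."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": "List the files in the repo."}],
        "tools": [{
            "type": "function",
            "function": {
                "name": "list_files",  # dummy tool, just to see if tool_calls appear
                "description": "List files in a directory",
                "parameters": {
                    "type": "object",
                    "properties": {"path": {"type": "string"}},
                    "required": ["path"],
                },
            },
        }],
        "tool_choice": "auto",
    }
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json",
                 "Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


if __name__ == "__main__":
    resp = probe("https://myhost/v1/chat/completions", "sk-sksksksk",
                 "Qwen/Qwen3-Coder-30B-A3B-Instruct")
    print("tool_calls:", extract_tool_calls(resp))
```

If `tool_calls` comes back empty even for a prompt that obviously needs the tool, the problem is on the vLLM/parser side rather than in ccr or Claude Code.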
claude-code-router config.json:
{
"LOG": true,
"CLAUDE_PATH": "",
"HOST": "127.0.0.1",
"PORT": 3456,
"APIKEY": "",
"API_TIMEOUT_MS": "600000",
"PROXY_URL": "",
"transformers": [],
"Providers": [
{
"name": "runpod",
"api_base_url": "https://myhost/v1/chat/completions",
"api_key": "sk-sksksksksk",
"models": ["Qwen/Qwen3-Coder-30B-A3B-Instruct"]
}
],
"Router": {
"default": "runpod,Qwen/Qwen3-Coder-30B-A3B-Instruct",
"background": "",
"think": "",
"longContext": "",
"longContextThreshold": 60000,
"webSearch": ""
}
}

Client: ccr code
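For what it's worth, the Router values above are "provider,model" strings, where the provider name has to match an entry in Providers and the model has to appear in that provider's models list. Since model IDs like Qwen/Qwen3-Coder-30B-A3B-Instruct contain slashes, a sanity-check helper (my own sketch, not ccr code) should split on the first comma only:

```python
def parse_route(route: str) -> tuple[str, str]:
    """Split a 'provider,model' route string; model IDs may contain '/'."""
    provider, model = route.split(",", 1)
    return provider.strip(), model.strip()


def check_route(route: str, providers: list) -> bool:
    """True if the route points at a configured provider/model pair."""
    provider, model = parse_route(route)
    return any(p["name"] == provider and model in p["models"] for p in providers)


providers = [{"name": "runpod",
              "models": ["Qwen/Qwen3-Coder-30B-A3B-Instruct"]}]
print(check_route("runpod,Qwen/Qwen3-Coder-30B-A3B-Instruct", providers))  # True
```

A typo in either half of the route string silently fails this check, which is worth ruling out before digging into tool-calling behavior.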
On launch, Claude Code connects to http://127.0.0.1:3456, starts fine, runs /init, and says it will check the files… but then it never actually runs anything (no bash/dir/tool calls happen).
What works vs. what doesn’t
✅ Direct requests to vLLM: chat/completions returns normal assistant messages.
✅ Claude Code UI starts up, reads the repo, and “thinks”.
❌ It never triggers any tool calls (no file ops, no bash, no git, nothing), so it just stalls at the “checking files” step.
Things I’ve tried
Drop the Hermes parser: removed --enable-auto-tool-choice and --tool-call-parser hermes from vLLM so that only standard OpenAI tool calling is used. But then it won't answer any requests and throws an error.
Questions:
Has anyone run Claude Code → ccr → vLLM successfully with Qwen3 Coder 30B A3B? If yes, what exact vLLM flags (especially around tool calling) and chat template did you use?
Should I avoid --tool-call-parser hermes with Qwen? Is there a known parser that works better with Qwen3 for OpenAI tools?
ccr tips: any ccr flags/env vars to force tool_choice, or to log the raw upstream responses so I can confirm whether tool_calls are present or missing?
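On that last question: with "LOG": true set in the config above, one low-tech option is to scan ccr's log for mentions of tool_calls in the upstream responses. A sketch (the log path is an assumption on my part; check where your ccr install actually writes its log):

```python
from pathlib import Path


def tool_call_lines(lines):
    """Return (line_number, line) pairs that mention tool_calls."""
    return [(i, line.rstrip()) for i, line in enumerate(lines, 1)
            if "tool_calls" in line]


if __name__ == "__main__":
    # Assumed log location; the real path may differ per ccr version/config.
    log = Path.home() / ".claude-code-router" / "claude-code-router.log"
    hits = tool_call_lines(log.read_text().splitlines())
    print(f"{len(hits)} log lines mention tool_calls")
```

Zero hits while Claude Code is visibly “thinking” would confirm the upstream responses never contain tool calls.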
Logs / snippet
From Claude Code:
... Welcome to Claude Code ...
> /init is analyzing your codebase…
> ok
> Let me first check what files and directories we have...
# (stalls here; no tool execution happens)
If you’ve got this stack working, I’d love to see your vLLM command, ccr config, and (ideally) a single tool-call response as proof-of-life. Thanks!
Which model has the best tool calling with Claude code router?
Been experimenting with claude code router, seen here: https://github.com/musistudio/claude-code-router
I got Kimi-K2 to work with Groq, but the tool calling seems to cause issues.
Is anyone else having luck with Kimi K2 or any other models for claude code router (which is, of course, quite reliant on tool calling)? I've tried troubleshooting it quite a bit, but I'm wondering if this is a config issue.