So I have had FOMO about Claude Code, but I refuse to give them my prompts or pay $100-$200 a month. Two days ago I saw that Moonshot provides an Anthropic-compatible API for Kimi K2 so folks can use it with Claude Code. Well, many folks are already doing the same thing with local models. So if you don't know, now you know. This is how I did it on Linux; it should be easy to replicate on macOS, or on Windows with WSL.
Start your local LLM API
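For example, with llama.cpp's llama-server it looks something like this (the model path is a placeholder; any OpenAI-compatible backend like Ollama or vLLM works just as well, as long as the port matches what you point the proxy at later):
llama-server -m /models/your-model.gguf --host 0.0.0.0 --port 8083 -c 16384   # port 8083 matches the api_base used below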
Install Claude Code
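At the time of writing, the official install is through npm:
npm install -g @anthropic-ai/claude-code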
Install a proxy: https://github.com/1rgs/claude-code-proxy
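Getting it is just a clone; the repo's README covers installing dependencies (it uses uv, last I checked):
git clone https://github.com/1rgs/claude-code-proxy.git
cd claude-code-proxy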
Edit the proxy's server.py and point it at your OpenAI-compatible endpoint; that could be llama.cpp, Ollama, vLLM, whatever you are running.
Add the following line just above the load_dotenv() call:
litellm.api_base = "http://yokujin:8083/v1"  # use your own host name/IP/port
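For context, the top of server.py then looks roughly like this (the surrounding imports are illustrative and may differ in the actual file; the api_base line is the only addition):
import litellm
from dotenv import load_dotenv

litellm.api_base = "http://yokujin:8083/v1"  # your local OpenAI-compatible endpoint
load_dotenv()  # the new line goes right above this call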
Start the proxy according to the docs; it will run on localhost:8082.
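The run command from the README is something like this (double-check the repo in case it has changed):
uv run uvicorn server:app --host 0.0.0.0 --port 8082 --reload
Then point Claude Code at the proxy instead of Anthropic's API: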
export ANTHROPIC_BASE_URL=http://localhost:8082
export ANTHROPIC_AUTH_TOKEN="sk-localkey"
Run Claude Code.
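From the same shell where you exported those variables, launch it in whatever directory you want it to work on (the path below is just an example):
cd ~/my-project   # any project directory
claude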
I just generated my first bit of code with this setup and then decided to post this. I'm running the latest Mistral-Small-24B on that host. I'm going to be driving it with various models: Gemma3-27B, Qwen3-32B/235B, DeepSeek-V3, etc.
Is there some local LLM at the level of Claude.ai?
Hello, while researching content-creation automation with LLMs, I stumbled upon this video: https://www.youtube.com/watch?v=Qpgz1-Gjl_I
What caught my interest are the incredible capabilities of Claude.ai. I mean, it can create HTML documents, but I did the same with a local LLaMA 7B instruct model, so no biggie. Where things start to go awry with LLaMA is when I ask for an infographic using SVG icons, and even more so for an interactive timeline. There is no way LLaMA writes a working JS script on its own; you have to ask very persistently, and even then the script simply doesn't work.
It was also fun to see LLaMA write the whole document in HTML but add a reference section written in Markdown. I pointed it out to the model and it said it was sorry, then corrected the mistake and converted the Markdown to HTML. I wonder why it made such a mistake.
However, it looks like Claude.ai is capable of much more complex reasoning.
At this point I wonder whether it is because Claude is a tens-of-billions-parameter model while the LLaMA I am using is just a 7B one, whether there are fundamental differences in architecture and training, or whether the 200k-token context window plays a role. I am running LLaMA through Ollama, so I am using moderate settings.
I have even tried a couple of LLaMA-derived models, with similar results. I played with CodeQwen, and it shows it isn't made to write articles.
So, could anyone knowledgeable, with a bit of experience using the various LLMs, help me find the needle in this haystack?
P.S. I wonder if all the various open-source LLMs out there are based on LLaMA, or if there are non-LLaMA ones too!