If what you want to do can be done with Llama, then it's not serious enough for you to be spending "lots" of money on Claude. Claude is cheap. Answer from Jdonavan on reddit.com
Shawnmayzes
shawnmayzes.com › product-engineering › running-claude-code-locally-just-got-easier-with-ollama-code
Running Claude Code Locally Just Got Easier with ollama-code
July 28, 2025 - Back in May, I walked through how to run Claude Code locally by emulating Anthropic’s API and using a local LLM with the same prompt structure. It worked, but it was a little clunky. You had to manually map Claude’s behavior onto a local model, spin up your own fake endpoint, and tweak system prompts to get decent results. Now, thanks to ollama-code, you don’t have to.
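The approach that post describes can be pictured with a short sketch: stand up a local endpoint that speaks Anthropic's Messages shape and forwards requests to Ollama. Everything here (the Flask framework, ports, model tag, and field mapping) is an illustrative assumption, not the article's actual code.

```python
# A minimal sketch of the "fake endpoint" approach described above: an
# Anthropic-style /v1/messages route that forwards to a local Ollama server.
# Framework, ports, model tag, and field mapping are assumptions.
import requests
from flask import Flask, jsonify, request

app = Flask(__name__)
OLLAMA_URL = "http://localhost:11434/api/chat"  # Ollama's default chat endpoint

@app.post("/v1/messages")
def messages():
    body = request.get_json()
    # Forward Claude Code's messages to the local model.
    resp = requests.post(OLLAMA_URL, json={
        "model": "qwen2.5-coder:14b",           # assumed local model
        "messages": body.get("messages", []),
        "stream": False,
    })
    text = resp.json()["message"]["content"]
    # Answer in a minimal Anthropic Messages API shape.
    return jsonify({
        "id": "msg_local",
        "type": "message",
        "role": "assistant",
        "content": [{"type": "text", "text": text}],
        "stop_reason": "end_turn",
    })

if __name__ == "__main__":
    app.run(port=8082)  # then point ANTHROPIC_BASE_URL at this port
```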
GreenFlux Blog
blog.greenflux.us › from-prompt-to-production-vibe-coding-local-ai-apps-with-claude-ollama
From Prompt to Production: Vibe Coding Local AI Apps with Claude + Ollama - GreenFlux Blog
June 30, 2025 - Ollama is running locally with llama3.2-vision. Build a simple client-side web app for machinery inspections with photo upload to local storage. Use Ollama to examine the image and fill out the inspection. Hit Enter, and you’ll see Claude start chugging away.
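The Ollama side of that workflow fits in a few lines. The endpoint and the base64 image field are Ollama's documented /api/generate API; the file name and prompt wording below are illustrative.

```python
# Sketch of the inspection step: send a machinery photo to a local
# llama3.2-vision model via Ollama's /api/generate endpoint.
import base64
import requests

with open("inspection_photo.jpg", "rb") as f:
    img_b64 = base64.b64encode(f.read()).decode()

resp = requests.post("http://localhost:11434/api/generate", json={
    "model": "llama3.2-vision",
    "prompt": "Examine this machinery photo and fill out an inspection report.",
    "images": [img_b64],   # Ollama accepts base64-encoded images here
    "stream": False,
})
print(resp.json()["response"])
```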
Discussions

Use claudecode with local models
It's a bit easier with https://github.com/musistudio/claude-code-router
r/LocalLLaMA
July 16, 2025
Ollamacode - Local AI assistant that can create, run and understand the task at hand!
I tried qwen (the CLI, not the model) and llxprt. They all fail (qwen3:14b, gemma3:12b) in the same way: no real autonomy, no way to correctly implement todo lists (I'm referring to Claude Code, the state of the art). In practice I have to write single mini-tasks that each model completes as best it can (or it forces me to write "continue", "do it"...). Did you manage to overcome this general issue?
r/ollama
June 29, 2025
how far are we from claudes "computer use" running locally?
I, as an LLM, was able to create this post using mouse and keyboard controls with the help of l33t-mt's code project at https://github.com/l33tkr3w/NavTest
r/ollama
January 3, 2025
Sonnet Claude 4 ran locally?
There's not anything besides R1-0528 full that's similar to Claude 4. You can run it at 1.78-bit, but that's not really smart. You could also run it from storage, but that's extremely slow. Claude 4 Sonnet is just really good.
r/LocalLLaMA
June 3, 2025
Medium
medium.com › @ishu.kumars › from-claude-to-ollama-how-i-hacked-together-an-ai-coding-assistant-in-2-days-with-zero-typescript-712191d6f66e
From Claude to Ollama: How I Hacked Together an AI Coding Assistant in 2 Days (With Zero TypeScript Knowledge) | by Ishu Kumar | Medium
March 8, 2025 - After exploring the code, I realized it was primarily designed for the Claude API. But I had been experimenting with [Ollama](https://ollama.ai/) for running local LLMs, and I wondered — could I adapt this cleanroom implementation to work with Ollama instead?
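One common adaptation path (an assumption here; the article's own approach may differ) relies on Ollama exposing an OpenAI-compatible endpoint at /v1, so Anthropic client calls can be swapped for an OpenAI-style client pointed at localhost:

```python
# Swapping a cloud Claude call for a local model via Ollama's
# OpenAI-compatible API. Model tag and prompt are illustrative.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11434/v1",  # Ollama's OpenAI-compatible API
    api_key="ollama",                      # required by the client, ignored by Ollama
)

reply = client.chat.completions.create(
    model="qwen2.5-coder:14b",             # any coding model you've pulled
    messages=[{"role": "user", "content": "Refactor this function for clarity."}],
)
print(reply.choices[0].message.content)
```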
CodeMiner42
blog.codeminer42.com › home › posts › setting up a free claude-like assistant with opencode and ollama
Setting Up A Free Claude-Like Assistant With OpenCode And Ollama - The Miners
November 6, 2025 - In this case, just select "Anthropic" as the provider and enter the API key. After doing that, I could see the Anthropic models inside OpenCode without changing opencode.json.
Arsturn
arsturn.com › blog › connecting-ollama-and-claude-code-a-step-by-step-guide
Connect Ollama & Claude Code: A Guide for Local LLMs
August 10, 2025 - Learn how to connect local LLMs from Ollama with the powerful Claude Code AI assistant. Our step-by-step guide helps you set up Ollama on Windows.
Unsloth
docs.unsloth.ai › models › qwen3-coder-how-to-run-locally
Qwen3-Coder: How to Run Locally | Unsloth Documentation
4 days ago - Qwen3-480B-A35B-Instruct achieves ... Claude Sonnet-4, GPT-4.1, and Kimi K2, with 61.8% on Aider Polyglot and support for 256K (extendable to 1M) token context. We also uploaded Qwen3-Coder with native 1M context length extended by YaRN, plus full-precision 8-bit and 16-bit versions. Unsloth also now supports fine-tuning and RL of Qwen3-Coder. UPDATE: We fixed tool-calling for Qwen3-Coder! You can now use tool-calling seamlessly in llama.cpp, Ollama, LMStudio, ...
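As a small illustration of that tool-calling support, via Ollama's OpenAI-compatible endpoint; the model tag and tool schema below are assumptions, so use whatever tag ollama list shows for your Qwen3-Coder build.

```python
# Illustrative tool-calling request to a local Qwen3-Coder via Ollama's
# OpenAI-compatible endpoint. Model tag and tool schema are assumptions.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

tools = [{
    "type": "function",
    "function": {
        "name": "run_tests",
        "description": "Run the project's test suite and return its output.",
        "parameters": {"type": "object", "properties": {}, "required": []},
    },
}]

resp = client.chat.completions.create(
    model="qwen3-coder",  # assumed local tag
    messages=[{"role": "user", "content": "Fix the failing test in utils.py."}],
    tools=tools,
)
print(resp.choices[0].message.tool_calls)  # the model may choose to call run_tests
```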
Shawnmayzes
shawnmayzes.com › product-engineering › running-claude-code-with-local-llm
Running Claude Code with a Local LLM: A Step-by-Step Guide
Learn how to set up Claude Code with a local large language model (LLM) using the code-llmss project. This guide walks you through installation, configuration, and real-world use cases for developers who want AI-powered coding assistance without relying on cloud-based services.
DEV Community
dev.to › ishu_kumar › from-claude-to-ollama-building-a-local-ai-coding-assistant-3c46
From Claude to Ollama: Building a Local AI Coding Assistant - DEV Community
March 8, 2025 - I just published an article about adapting a cleanroom implementation of Claude Code to work with Ollama in just 48 hours—with no prior TypeScript experience. When I tried Claude Code, I loved it but burned through a month's token allocation in just three days of development. I needed a similar experience but with local models.
YouTube
youtube.com › watch
Use Claude-Code with Ollama with Router Locally - YouTube
This video installs Claude-Code-Router with Ollama, which is used to route Claude Code requests to different models and customize any request.
Published October 3, 2025
Reddit
reddit.com › r/localllama › use claudecode with local models
r/LocalLLaMA on Reddit: Use claudecode with local models
July 16, 2025

So I have had FOMO on Claude Code, but I refuse to give them my prompts or pay $100-$200 a month. Then 2 days ago, I saw that Moonshot provides an Anthropic-style API for Kimi K2 so folks could use it with Claude Code. Well, many folks are already doing the same with local models. So if you don't know, now you know. This is how I did it on Linux; it should be easy to replicate on macOS or Windows with WSL.

1. Start your local LLM API.

2. Install Claude Code.

3. Install a proxy: https://github.com/1rgs/claude-code-proxy

4. Edit the server.py proxy and point it at your OpenAI-compatible endpoint; that could be llama.cpp, Ollama, vLLM, whatever you are running. Add this line above load_dotenv (see the sketch below):

   litellm.api_base = "http://yokujin:8083/v1"  # use your localhost name/IP/ports

5. Start the proxy according to the docs, which will run it on localhost:8082.

6. Point Claude Code at the proxy:

   export ANTHROPIC_BASE_URL=http://localhost:8082
   export ANTHROPIC_AUTH_TOKEN="sk-localkey"

7. Run claude code.

I just created my first code this way, then decided to post this. I'm running the latest mistral-small-24b on that host. I'm going to be driving it with various models: gemma3-27b, qwen3-32b/235b, deepseek-v3, etc.
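For context, the server.py edit in step 4 ends up looking roughly like this. Only the litellm.api_base line comes from the post; the surrounding lines are assumptions about claude-code-proxy's layout.

```python
# Rough sketch of the server.py edit from step 4. Only the litellm.api_base
# line is taken from the post; the surrounding lines are assumed context.
import litellm
from dotenv import load_dotenv

# Point litellm at your OpenAI-compatible server (llama.cpp, Ollama, vLLM...).
litellm.api_base = "http://yokujin:8083/v1"  # use your localhost name/IP/ports

load_dotenv()
```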

GitHub
github.com › ollama › ollama
GitHub - ollama/ollama: Get up and running with OpenAI gpt-oss, DeepSeek-R1, Gemma 3 and other models.
November 19, 2025 - Community integrations include ollama-co2 (a FastAPI web interface for monitoring and managing local and remote Ollama servers, with real-time model monitoring and concurrent downloads) and Hillnote (a Markdown-first workspace designed to supercharge your AI workflow: create documents ready to integrate with Claude, ChatGPT, Gemini, Cursor, and more, all while keeping your work on your device).
Starred by 158K users
Forked by 14K users
Languages   Go 52.7% | C 37.2% | TypeScript 5.8% | C++ 2.0% | Objective-C 0.9% | Shell 0.7%
YouTube
youtube.com › watch
Claude Dev with Ollama - Autonomous Coding Agent - Install Locally - YouTube
This video shows how to install and use Claude Dev with Ollama. It's an autonomous coding agent right in your IDE, capable of creating/editing files, executing…
Published September 4, 2024
Kodeco
kodeco.com › 47626507-using-ollama-to-run-llms-locally
Using Ollama to Run LLMs Locally | Kodeco
April 16, 2025 - Ollama is an open-source, lightweight framework designed to run large language models on your local machine or server. It makes running complex AI models as simple as running a single command, without requiring deep technical knowledge of machine learning infrastructure. ... Unlike cloud-based solutions like ChatGPT or Claude, Ollama doesn’t require an internet connection once you’ve downloaded the models.
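That single-command simplicity carries over to code. A minimal sketch with the official Python client (pip install ollama), using an illustrative model tag:

```python
# Chatting with a locally pulled model via the ollama Python client.
# Equivalent in spirit to running `ollama run llama3.2` at the command line.
import ollama

reply = ollama.chat(
    model="llama3.2",  # illustrative tag; use any model you've pulled
    messages=[{"role": "user", "content": "Summarize what Ollama does."}],
)
print(reply["message"]["content"])
```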
Roundhere
roundhere.net › links › 2025 › 03 › anon-kode
Anon Code: Local Claude Code?
Install Ollama and run ollama run qwen2.5-coder:14b if you don't have an existing local LLM set up.
Reddit
reddit.com › r/ollama › ollamacode - local ai assistant that can create, run and understand the task at hand!
r/ollama on Reddit: Ollamacode - Local AI assistant that can create, run and understand the task at hand!
June 29, 2025

I've been working on a project called OllamaCode, and I'd love to share it with you. It's an AI coding assistant that runs entirely locally with Ollama. The main idea was to create a tool that actually executes the code it writes, rather than just showing you blocks to copy and paste.

Here are a few things I've focused on:

  • It can create and run files automatically from natural language.

  • I've tried to make it smart about executing tools like git, search, and bash commands.

  • It's designed to work with any Ollama model that supports function calling.

  • A big priority for me was to keep it 100% local to ensure privacy.

It's still in the very early days, and there's a lot I still want to improve. It's been really helpful for my own workflow, and I would be incredibly grateful for any feedback from the community to help make it better.

Repo: https://github.com/tooyipjee/ollamacode
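A toy sketch of the loop the post describes (generate code locally, then actually run it). This is not OllamaCode's real implementation; the model tag, prompt, and extraction regex are all illustrative.

```python
# Toy generate-then-execute loop, not OllamaCode's actual code.
import re
import subprocess
import requests

def generate(prompt: str) -> str:
    # Ask a local Ollama model for a response (non-streaming).
    r = requests.post("http://localhost:11434/api/generate", json={
        "model": "qwen2.5-coder:14b", "prompt": prompt, "stream": False,
    })
    return r.json()["response"]

reply = generate("Write a Python script that prints the current directory tree.")
match = re.search(r"`{3}(?:python)?\n(.*?)`{3}", reply, re.DOTALL)
if match:
    with open("generated.py", "w") as f:
        f.write(match.group(1))
    subprocess.run(["python", "generated.py"])  # execute what the model wrote
```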

MCP Market
mcpmarket.com › home › mcp servers › ollama
Ollama: Run Local Models Asynchronously for Claude
Offers simple configuration for Claude Desktop integration. Use cases include: managing and executing complex workflows involving bash commands and script templates; integrating local LLMs with fast-agent for multi-agent workflows and tool calling; running Ollama models within Claude Desktop for asynchronous prompt execution.
GitHub
github.com › aminhjz › claude-code-ollama-proxy
GitHub - aminhjz/claude-code-ollama-proxy: Run Claude Code on Ollama
If PREFERRED_PROVIDER=ollama, haiku/sonnet requests map to SMALL_MODEL/BIG_MODEL prefixed with ollama/ when those models are in the server's known OLLAMA_MODELS list (otherwise it falls back to the OpenAI mapping).
Starred by 35 users
Forked by 6 users
Languages   Python
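A Python paraphrase of the mapping rule quoted above, for clarity; this is illustrative only, not the repo's actual code, and the exact fallback shape is an assumption.

```python
# Paraphrase of claude-code-ollama-proxy's model mapping rule (illustrative).
def map_model(requested: str, preferred_provider: str, small_model: str,
              big_model: str, ollama_models: set[str]) -> str:
    # haiku requests route to the small model, sonnet to the big one.
    target = small_model if "haiku" in requested else big_model
    if preferred_provider == "ollama" and target in ollama_models:
        return f"ollama/{target}"
    return f"openai/{target}"  # assumed shape of the OpenAI fallback
```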
Reddit
reddit.com › r/ollama › how far are we from claudes "computer use" running locally?
r/ollama on Reddit: how far are we from claudes "computer use" running locally?
January 3, 2025

Claude has a "computer use" demo that can interact with a desktop PC and click stuff. The code looks like it's just sending screenshots to their API and getting cursor positions back.

I can't imagine that's doable with a visual classification model like LLaVA etc., since those don't actually know exact pixel positions within an image. There's something else going on before or after it's fed into a visual model. Maybe each element is isolated using filters and then classified?

Does anyone know how this stuff works, or maybe even an existing open-source project that is trying to build this on top of the Ollama visual API?
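The pipeline the post is imagining might look roughly like the sketch below. This is an assumption, not Anthropic's implementation, and whether an open-weight vision model returns reliable pixel coordinates is exactly the open question raised above.

```python
# Speculative screenshot -> vision model -> click loop. Not Anthropic's
# implementation; the model tag and JSON-coordinates prompt are assumptions.
import base64
import io

import pyautogui  # pip install pyautogui
import requests

shot = pyautogui.screenshot()
buf = io.BytesIO()
shot.save(buf, format="PNG")
img_b64 = base64.b64encode(buf.getvalue()).decode()

resp = requests.post("http://localhost:11434/api/generate", json={
    "model": "llama3.2-vision",  # assumed local vision model
    "prompt": 'Return JSON like {"x": 100, "y": 200} for the Submit button.',
    "images": [img_b64],
    "stream": False,
})
print(resp.json()["response"])
# In practice you would parse and validate the coordinates, then:
# pyautogui.click(x, y)
```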

Reddit
reddit.com › r/localllama › sonnet claude 4 ran locally?
r/LocalLLaMA on Reddit: Sonnet Claude 4 ran locally?
June 3, 2025

Hi,

I recently started using Cursor to make a website and fell in love with Agent and Claude 4.

I have a 9950X3D with a 5090, 96GB of RAM, and lots of Gen5 M.2 storage. I'm wondering if I can run something like this locally, so it can assist with editing and coding on its own via vibe coding.

You guys are amazing in what I see a lot of you coming up with. I wish I was that good! Hoping someone has the skill to point me in the right direction; step-by-step instructions would be greatly appreciated as I'm just learning about agents.

Thanks!