Local AI Models for Coding: How to Run Ollama and Keep Your Code Private

How to run local AI models for coding using Ollama. Hardware requirements, setup steps, connecting to VS Code via Continue.dev, and realistic quality expectations.


Running AI models locally means your code never leaves your machine. Here is how to set it up with Ollama.

Why Run Locally

Complete privacy: no API keys, no data transmission, no usage logs. No ongoing cost beyond the hardware. Works fully offline. You can even fine-tune on your own codebase. For developers at companies with strict data policies, this is often the only compliant option.

Hardware Requirements

Modern GPU (8+ GB VRAM recommended for quality models). Apple Silicon Macs work well thanks to Metal acceleration — an M2 Pro or M3 with 16GB unified memory runs 7B-13B parameter models smoothly. CPU-only is possible but slow. 32GB RAM or more recommended for the larger models.
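These numbers follow from a rough rule of thumb: a quantized model's weights take roughly params × bits ÷ 8 bytes, and the default Ollama tags for most models are 4-bit quantized. A back-of-envelope sketch (weights only; it ignores the KV cache and runtime overhead, which add a couple of gigabytes more):

```python
def approx_weight_gb(params_billion: float, bits_per_weight: int = 4) -> float:
    """Approximate memory for the model weights alone, in GB."""
    return params_billion * bits_per_weight / 8

print(approx_weight_gb(7))   # 3.5 -> a 4-bit 7B model fits comfortably in 8 GB VRAM
print(approx_weight_gb(13))  # 6.5 -> a 13B model is why 8+ GB is the floor, not the ceiling
```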

Installing and Running Ollama

Install via Homebrew: brew install ollama
Start the server: ollama serve
Pull a coding model: ollama pull deepseek-coder (or ollama pull codellama)
Run it in the terminal: ollama run deepseek-coder
The model downloads once and runs locally from then on.
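If you want to script these steps, for example to pre-pull models on a team machine, here is a minimal sketch using Python's subprocess module (the function names are my own, not part of Ollama):

```python
import subprocess

def pull_cmd(model: str) -> list[str]:
    """Build the argument list for `ollama pull <model>`."""
    return ["ollama", "pull", model]

def pull(model: str) -> None:
    """Download a model; Ollama caches it locally, so repeat calls are cheap."""
    subprocess.run(pull_cmd(model), check=True)

# The command that would be executed for a coding model:
print(pull_cmd("deepseek-coder"))
```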

Connecting Ollama to Your IDE

Continue.dev is the primary tool for this. Install the Continue.dev extension in VS Code, configure it to use your local Ollama endpoint (http://localhost:11434), and select your model. It provides inline completions and chat just like cloud tools.
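Continue.dev talks to the same HTTP endpoint you can call yourself. A minimal sketch against Ollama's /api/generate route using only the standard library; treat the model name and prompt as placeholders:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # default local Ollama endpoint

def build_payload(model: str, prompt: str) -> dict:
    """Assemble the JSON body for a single, non-streaming completion request."""
    return {"model": model, "prompt": prompt, "stream": False}

def generate(model: str, prompt: str, url: str = OLLAMA_URL) -> str:
    """POST to the local Ollama server and return the generated text."""
    req = urllib.request.Request(
        url,
        data=json.dumps(build_payload(model, prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# Requires `ollama serve` running; e.g. generate("deepseek-coder", "Write a hello world")
payload = build_payload("deepseek-coder", "Write a Python hello world")
print(payload["model"])
```

Because everything stays on localhost, no prompt or completion ever crosses the network boundary.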

Model Quality Comparison

DeepSeek Coder 6.7B is impressive for its size — competitive with earlier Copilot generations on Python and JavaScript. CodeLlama 13B offers better quality at the cost of slower inference. Locally-run models are roughly 30-50% less capable than current cloud models, but the quality gap is narrowing with each new model release.

Realistic Expectations

Do not expect cloud-model quality. Local models excel at autocomplete and short snippets. They struggle more with complex multi-file reasoning. The privacy and cost benefits are real — calibrate your expectations accordingly.
