CPU Inference

Running LLM inference on a CPU rather than a GPU. CPU inference is typically 5–20x slower than GPU inference, but it requires no specialized hardware. Tools like llama.cpp make CPU inference practical for smaller models on everyday laptops.
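
As a minimal sketch of what this looks like in practice with llama.cpp: the command below runs a quantized GGUF model entirely on CPU. The model filename and thread count are placeholders; substitute your own model and a thread count matching your physical cores.

```shell
# Run a prompt against a local GGUF model on CPU with llama.cpp.
# ./models/model.gguf is a placeholder path to a quantized model you have downloaded.
llama-cli \
  -m ./models/model.gguf \   # path to the quantized GGUF model file
  -p "Explain CPU inference in one sentence." \  # the prompt
  -n 128 \                   # maximum number of tokens to generate
  -t 8                       # number of CPU threads (match your physical cores)
```

Smaller quantizations (e.g. 4-bit) trade some quality for lower memory use and faster CPU decoding, which is usually the right trade-off on a laptop.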