Local AI

INT4 Quantization

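A quantization level that represents model weights as 4-bit integers instead of 16-bit floats, cutting the VRAM and RAM needed for weights to roughly a quarter. At INT4, the weights of a model like Llama 3 70B shrink from about 140 GB to about 35 GB, before the small overhead of scale factors and the memory used by the KV cache. That is still more than a single 24 GB consumer GPU holds, but it brings the model within reach of two such cards, or of one card with part of the model offloaded to system RAM, with modest quality trade-offs.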
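To make the idea concrete, here is a minimal sketch of one common way to do 4-bit quantization: symmetric, per-group scaling with two values packed into each byte. The function names, the group size, and the packing order are assumptions chosen for illustration; real formats (GPTQ, AWQ, GGUF K-quants, and others) differ in the details. The last helper reproduces the weight-memory arithmetic for a given parameter count.

```python
def quantize_int4(weights, group_size=4):
    """Symmetric per-group quantization to integers in [-8, 7].

    Each group shares one float scale, so storage is about 4 bits per
    weight plus a small overhead for the scales.
    """
    q, scales = [], []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        max_abs = max(abs(w) for w in group)
        scale = max_abs / 7 if max_abs > 0 else 1.0  # map largest |w| to 7
        scales.append(scale)
        q.extend(max(-8, min(7, round(w / scale))) for w in group)
    return q, scales


def pack_int4(q):
    """Pack two signed 4-bit values into each byte, low nibble first."""
    if len(q) % 2:
        q = q + [0]  # pad to an even count
    return bytes((q[i] & 0xF) | ((q[i + 1] & 0xF) << 4)
                 for i in range(0, len(q), 2))


def unpack_int4(packed, count):
    """Inverse of pack_int4: recover `count` signed 4-bit values."""
    out = []
    for byte in packed:
        for nibble in (byte & 0xF, byte >> 4):
            out.append(nibble - 16 if nibble >= 8 else nibble)
    return out[:count]


def dequantize_int4(q, scales, group_size=4):
    """Reconstruct approximate float weights from integers and scales."""
    return [v * scales[i // group_size] for i, v in enumerate(q)]


def weight_memory_gb(n_params, bits_per_weight):
    """Memory for the weights alone, in GB (10^9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9
```

For example, `weight_memory_gb(70e9, 16)` gives 140.0 and `weight_memory_gb(70e9, 4)` gives 35.0, which is the fourfold reduction described above. Running `quantize_int4` followed by `dequantize_int4` on a short list of floats shows the quality side of the trade-off: each reconstructed weight can differ from the original by up to half of its group's scale.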