Local AI
Quantization
The process of reducing model weight precision (e.g., from 16-bit floats to 4-bit integers) to shrink memory usage and speed up inference. Quantization makes large models runnable on consumer hardware with only a modest loss in output quality.
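To make the idea concrete, here is a minimal sketch of symmetric per-tensor 4-bit quantization in Python with NumPy. The function names and the single per-tensor scale are illustrative assumptions; production schemes (e.g., the formats used by llama.cpp or bitsandbytes) typically use per-group scales and packed storage.

```python
# Minimal sketch: symmetric 4-bit quantization of a weight tensor.
# Assumption: one scale per tensor; real schemes use finer granularity.
import numpy as np

def quantize_int4(weights: np.ndarray) -> tuple[np.ndarray, float]:
    """Map float weights to signed 4-bit integers in [-8, 7]."""
    scale = np.abs(weights).max() / 7.0  # largest weight maps to +/-7
    q = np.clip(np.round(weights / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize_int4(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float weights for inference."""
    return q.astype(np.float32) * scale

w = np.random.randn(4, 4).astype(np.float32)  # stand-in for a weight matrix
q, scale = quantize_int4(w)
w_hat = dequantize_int4(q, scale)
print("max abs error:", np.abs(w - w_hat).max())
```

Each weight now needs 4 bits instead of 16 (plus a shared scale), a roughly 4x memory reduction; the rounding step is where the small quality loss comes from.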