Local AI
GPU VRAM Requirements
The amount of video RAM needed to load a model's weights into a GPU for fast inference. As a rule of thumb, an unquantized model needs roughly 2 bytes of VRAM per parameter, so a 7B model requires ~14 GB at FP16. Note this estimate covers the weights alone; the KV cache and activations require additional headroom on top.
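As a quick sanity check on the rule of thumb, here is a minimal Python sketch that computes a weights-only VRAM estimate from parameter count and precision. The function name and the quantized byte widths shown are illustrative assumptions, not from the source.

```python
def estimate_vram_gib(params_billion: float, bytes_per_param: float = 2.0) -> float:
    """Rough weights-only VRAM estimate in GiB.

    bytes_per_param: 2.0 for FP16/BF16 (unquantized), 1.0 for INT8,
    0.5 for 4-bit quantization.
    """
    # params_billion * 1e9 parameters, each taking bytes_per_param bytes;
    # divide by 1024**3 to convert bytes to GiB.
    return params_billion * 1e9 * bytes_per_param / (1024 ** 3)


if __name__ == "__main__":
    # A 7B model at FP16: 7e9 params * 2 bytes = 14e9 bytes,
    # i.e. ~13 GiB, matching the "~14 GB" (decimal) rule of thumb above.
    print(f"7B @ FP16:  {estimate_vram_gib(7):.1f} GiB")
    print(f"7B @ 4-bit: {estimate_vram_gib(7, 0.5):.1f} GiB")
```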