Local AI
GPU VRAM Requirements
The amount of video RAM needed to load a model's weights into a GPU for fast inference. As a rule of thumb, an unquantized model needs roughly 2 bytes of VRAM per parameter, so a 7B model requires ~14 GB at FP16. Note this estimate covers the weights alone; the KV cache and activations require additional headroom on top.
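As a quick sanity check on the rule of thumb, here is a minimal Python sketch that computes a weights-only VRAM estimate from parameter count and precision. The function name and the quantized byte widths shown are illustrative assumptions, not from the source.

```python
def estimate_vram_gib(params_billion: float, bytes_per_param: float = 2.0) -> float:
    """Rough weights-only VRAM estimate in GiB.

    bytes_per_param: 2.0 for FP16/BF16 (unquantized), 1.0 for INT8,
    0.5 for 4-bit quantization.
    """
    # params_billion * 1e9 parameters, each taking bytes_per_param bytes;
    # divide by 1024**3 to convert bytes to GiB.
    return params_billion * 1e9 * bytes_per_param / (1024 ** 3)


if __name__ == "__main__":
    # A 7B model at FP16: 7e9 params * 2 bytes = 14e9 bytes,
    # i.e. ~13 GiB, matching the "~14 GB" (decimal) rule of thumb above.
    print(f"7B @ FP16:  {estimate_vram_gib(7):.1f} GiB")
    print(f"7B @ 4-bit: {estimate_vram_gib(7, 0.5):.1f} GiB")
```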