AI Fundamentals

Inference

The process of running a trained model on new input to generate predictions or completions. Inference speed, commonly measured in tokens per second, directly affects how responsive an AI coding tool feels in use.
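A minimal sketch of how a tokens-per-second figure can be measured: time a streaming generator and divide the number of tokens it yields by the elapsed wall-clock time. This assumes each yielded item counts as one token. `fake_generate` is a hypothetical stand-in for a real model that yields 50 tokens about 10 ms apart. It is not any particular tool's API.

```python
import time
from typing import Callable, Iterable


def tokens_per_second(generate: Callable[[str], Iterable[str]], prompt: str) -> float:
    """Time a streaming generator and return its throughput in tokens per second."""
    start = time.perf_counter()
    count = 0
    for _token in generate(prompt):  # assumption: each yielded item is one token
        count += 1
    elapsed = time.perf_counter() - start
    # Guard against a zero-length interval on very fast or empty runs.
    return count / elapsed if elapsed > 0 else float("inf")


def fake_generate(prompt: str) -> Iterable[str]:
    """Hypothetical model stand-in: yields 50 tokens with ~10 ms between them."""
    for i in range(50):
        time.sleep(0.01)  # simulated per-token decoding latency
        yield f"tok{i}"


if __name__ == "__main__":
    rate = tokens_per_second(fake_generate, "def add(a, b):")
    print(f"{rate:.1f} tokens/s")  # at most ~100 tokens/s, given the 10 ms delay
```

Because each simulated token takes at least 10 ms, the measured rate cannot exceed about 100 tokens per second. At that rate, a 500-token completion would take roughly 5 seconds to finish streaming.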