Performance

Context Length Limit

The hard upper bound on the number of tokens a single model request can contain, counting both the prompt and the generated output. Requests that would exceed this limit must first be shortened, typically by truncating or summarizing older context while preserving the most important information.
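One common truncation strategy is to keep the most recent messages that fit within a fixed token budget, dropping the oldest first. A minimal sketch (the `count_tokens` helper is a hypothetical word-count stand-in; real systems would use the model's actual tokenizer):

```python
def count_tokens(text: str) -> int:
    # Crude approximation: real systems use the model's tokenizer.
    return len(text.split())

def truncate_to_budget(messages: list[str], budget: int) -> list[str]:
    kept: list[str] = []
    used = 0
    # Walk from newest to oldest so recent context survives.
    for msg in reversed(messages):
        cost = count_tokens(msg)
        if used + cost > budget:
            break
        kept.append(msg)
        used += cost
    # Restore chronological order for the kept messages.
    return list(reversed(kept))

history = ["first message here", "second message", "latest user question"]
print(truncate_to_budget(history, 5))
# → ['second message', 'latest user question']
```

Summarization-based approaches instead compress the dropped messages into a short synopsis, trading fidelity for a smaller token footprint.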