Skip to content

Performance

Throughput

The total number of tokens a system can process per unit of time across all users or requests. High throughput matters for teams sharing a self-hosted AI service or enterprise deployments with many concurrent developers.