Performance
Throughput
The total number of tokens a system can process per unit of time across all users or requests. High throughput matters for teams sharing a self-hosted AI service or enterprise deployments with many concurrent developers.
Performance
The total number of tokens a system can process per unit of time across all users or requests. High throughput matters for teams sharing a self-hosted AI service or enterprise deployments with many concurrent developers.