As the AI industry matures, traditional metrics have become outdated, which is why NVIDIA suggests that AI TCO (total cost of ownership) should be evaluated on "Cost Per Token".
NVIDIA Wants Everyone To Rethink AI TCO With "Cost Per Token" Metric
Tokens are the single most important metric for AI. While yesterday's data centers were evaluated on their raw computing power, today's AI factories are evaluated on their token output. But what matters is not who produces the most tokens; efficiency and cost are still the values that matter most. That is why the way AI factories think about TCO needs to change.
NVIDIA emphasizes that enterprises still lean on relative numbers, chip specifications, compute cost, and FLOPS per dollar, and that this needs to change. It contrasts three metrics:
- Compute cost is what enterprises pay for AI infrastructure, whether rented from cloud providers or owned on premises.
- FLOPS per dollar is how much raw computing power an enterprise gets for every dollar spent, but raw compute and real-world token output are not the same thing.
- Cost per token is an enterprise’s all-in cost to produce each delivered token, usually represented as cost per million tokens.
NVIDIA explains some of the factors that can lower token cost, using an equation for calculating the cost per million tokens. The company says most AI enterprises focus only on the numerator, which is cost per GPU per hour, but that is just the tip of the iceberg. The denominator, the token output actually delivered, is what helps minimize token costs and maximize revenue:
- Minimize token cost: When an increase in token output is reflected through the cost equation, it drives down cost per token, which is what grows the profit margin on every interaction served.
- Maximize revenue: More tokens delivered per second also translates to more tokens per megawatt, which means more intelligence to use in AI-powered products and services, generating more revenue from the same infrastructure investment.
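The article does not reproduce NVIDIA's equation, so the following is only a minimal sketch of one plausible form: cost per GPU per hour (the numerator) divided by tokens produced per GPU per hour (the denominator), scaled to one million tokens. The function name and the input figures are hypothetical and chosen for illustration; NVIDIA's actual equation may include further terms.

```python
SECONDS_PER_HOUR = 3600


def cost_per_million_tokens(cost_per_gpu_hour: float,
                            tokens_per_second_per_gpu: float) -> float:
    """Assumed form of the equation: hourly GPU cost over hourly token output."""
    if tokens_per_second_per_gpu <= 0:
        raise ValueError("token throughput must be positive")
    # Denominator: tokens one GPU delivers in an hour.
    tokens_per_gpu_hour = tokens_per_second_per_gpu * SECONDS_PER_HOUR
    # Numerator over denominator, scaled to one million tokens.
    return cost_per_gpu_hour / tokens_per_gpu_hour * 1_000_000


# Hypothetical inputs: same hourly price, throughput doubled.
baseline = cost_per_million_tokens(2.00, 100)
doubled_throughput = cost_per_million_tokens(2.00, 200)

print(f"baseline:           ${baseline:.2f} per million tokens")
print(f"doubled throughput: ${doubled_throughput:.2f} per million tokens")
```

With the hourly price held fixed, doubling the throughput halves the cost per million tokens, which is the sense in which the denominator drives token cost.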
And why does all of this matter? The answer is simple: for AI enterprises, it is the cost per token that should matter, not the FLOPS per dollar.
To illustrate this, NVIDIA showcases a comparison between its Hopper and Blackwell GPUs. Hopper GPUs cost far less to operate than Blackwell, around 2x less per GPU per hour, and the FLOPS per dollar also shows just a 2x difference. So, going by these two metrics alone, Blackwell does not look like a dramatic leap: it costs about 2x more to run and delivers about 2x the FLOPS per dollar of the previous generation.
The actual difference lies in token throughput and cost per million tokens. Blackwell delivers up to 65x more tokens per second per GPU than Hopper, and its cost per million tokens is 35x lower. For reference, the data was evaluated on SemiAnalysis's InferenceX v2 benchmark.
| Metric | NVIDIA Hopper (HGX H200) | NVIDIA Blackwell (GB300 NVL72) | NVIDIA Blackwell Relative to Hopper |
|---|---|---|---|
| Cost per GPU per Hour ($) | $1.41 | $2.65 | 2x |
| FLOP per Dollar (PFLOPS) | 2.8 | 5.6 | 2x |
| Tokens per Second per GPU | 90 | 6,000 | 65x |
| Tokens per Second per MW | 54K | 2.8M | 50x |
| Cost per Million Tokens ($) | $4.20 | $0.12 | 35x lower |
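As a rough sanity check, the relative column can be recomputed from the table's own figures. The sketch below only divides the numbers quoted above. The dictionary keys and helper name are hypothetical. The cost-per-million-tokens recomputation assumes the simple form of hourly GPU cost divided by hourly token output, which may not match NVIDIA's exact equation.

```python
# (Hopper HGX H200, Blackwell GB300 NVL72) values copied from the table above.
rows = {
    "cost_per_gpu_hour_usd": (1.41, 2.65),
    "tokens_per_second_per_gpu": (90, 6_000),
    "tokens_per_second_per_mw": (54_000, 2_800_000),
    "cost_per_million_tokens_usd": (4.20, 0.12),
}


def blackwell_over_hopper(metric: str) -> float:
    """Ratio of the Blackwell value to the Hopper value for one table row."""
    hopper, blackwell = rows[metric]
    return blackwell / hopper


ratios = {name: blackwell_over_hopper(name) for name in rows}
# Cost per token falls, so express it as "how many times lower".
cost_reduction = 1 / ratios["cost_per_million_tokens_usd"]

# Assumed equation: hourly GPU cost / tokens per GPU-hour, per million tokens.
recomputed = tuple(
    cost / (tps * 3600) * 1_000_000
    for cost, tps in zip(rows["cost_per_gpu_hour_usd"],
                         rows["tokens_per_second_per_gpu"])
)

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}x")
print(f"cost per million tokens: {cost_reduction:.1f}x lower")
print(f"recomputed $/M tokens (Hopper, Blackwell): "
      f"{recomputed[0]:.2f}, {recomputed[1]:.3f}")
```

The ratios come out at roughly 1.9x, 67x, 52x and 35x, so the table's relative column appears to be rounded. The recomputed Hopper figure lands near $4.35 rather than $4.20, which suggests the published inputs are rounded as well.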
Now you can treat all of this as NVIDIA's iconic "CEO Math," but there is some actual reasoning behind why these numbers matter. NVIDIA has a very powerful suite of AI software stacks and has been topping benchmark charts by margins that others have not come close to.
NVIDIA's CEO has even challenged other firms to benchmark their own chips, since many claim to be ahead of NVIDIA without publishing any proof.
"Nobody can demonstrate to me that any single platform in the world today has better performance TCO ratio. Not one company... I encourage them to use inference max and demonstrate their incredible inference cost. It's really really hard.. no nobody wants to show up."
Jensen Huang - NVIDIA CEO
With this rethinking of AI TCO and AI in general, NVIDIA isn't just claiming a victory in benchmarks; it is also claiming the throne in the metrics that matter to AI enterprises.
