NVIDIA Wants Everyone To Rethink AI TCO, & Explains Why “Cost Per Token” Is The Only Metric That Matters

Apr 16, 2026 at 12:00pm EDT

As the AI industry matures, traditional evaluation metrics are becoming outdated, which is why NVIDIA suggests that AI TCO should now be evaluated based on "Cost Per Token".

NVIDIA Wants Everyone To Rethink AI TCO With "Cost Per Token" Metric

Tokens are the single most important metric for AI. While yesterday's data centers were evaluated on their raw computing power, today's AI factories are evaluated on their token output. But what matters is not who produces the most tokens; efficiency and cost are still the values that matter the most. That is why the way AI factories think about TCO needs to change.

NVIDIA emphasizes that enterprises still compare platforms on numbers such as chip specifications, compute cost, and FLOPS per dollar, and that this needs to change.

Compute cost is what enterprises pay for AI infrastructure, whether rented from cloud providers or owned on premises.
FLOPS per dollar is how much raw computing power an enterprise gets for every dollar spent, but raw compute and real-world token output are not the same thing.
Cost per token is an enterprise’s all-in cost to produce each delivered token, usually represented as cost per million tokens.

NVIDIA explains some of the factors that can lower token cost, using an equation for the cost per million tokens. The company says that most AI enterprises focus only on the numerator, which is the cost per GPU per hour, but that's only the tip of the iceberg. The denominator, the token output actually delivered, is what helps minimize token costs and maximize revenue.

Minimize token cost: When an increase in token output is reflected through the cost equation, it drives down cost per token, which is what grows the profit margin on every interaction served.
Maximize revenue: More tokens delivered per second also translates to more tokens per megawatt, which means more intelligence to use in AI-powered products and services, generating more revenue from the same infrastructure investment.
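
The article describes the equation only in words. One plausible way to write it down, assuming the denominator is the tokens a single GPU delivers in an hour (tokens per second per GPU times 3,600 seconds), is the following reconstruction; it is a sketch of the relationship, not NVIDIA's published formula:

```latex
\[
\text{Cost per million tokens}
  = \frac{\text{Cost per GPU per hour}}
         {\text{Tokens per second per GPU} \times 3600}
    \times 10^{6}
\]
```

Written this way, halving the hourly GPU cost halves the cost per token, while a 10x gain in delivered token throughput cuts it by 10x, which is why the denominator dominates.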

Why does all of this matter? The answer is simple: for AI enterprises, it is the cost per token that should matter, not the FLOPS per dollar.

To illustrate this, NVIDIA compares its Hopper and Blackwell GPUs. The cost of operating Hopper GPUs is lower than Blackwell, around 2x lower per GPU-hour, and FLOPS per dollar also shows just a 2x difference in Blackwell's favor. Judged on these two metrics alone, Blackwell looks like a modest step over the previous generation: it costs roughly 2x more per GPU-hour and delivers only about 2x the FLOPS per dollar.

The actual difference lies in token throughput and cost per million tokens. Blackwell delivers up to 65x more tokens per second per GPU than Hopper, and its cost per million tokens is 35x lower. For reference, the data was evaluated on SemiAnalysis's InferenceX v2 benchmark.

Metric	NVIDIA Hopper (HGX H200)	NVIDIA Blackwell (GB300 NVL72)	NVIDIA Blackwell Relative to Hopper
Cost per GPU per Hour ($)	$1.41	$2.65	2x
FLOPS per Dollar (PFLOPS)	2.8	5.6	2x
Tokens per Second per GPU	90	6,000	65x
Tokens per Second per MW	54K	2.8M	50x
Cost per Million Tokens ($)	$4.20	$0.12	35x lower
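
The table's bottom row can be roughly reproduced from its first and third rows. The following is a minimal sketch, assuming that cost per million tokens is the hourly GPU cost divided by the tokens one GPU produces in an hour, scaled to one million tokens. The function name and the dictionary layout are illustrative and do not come from NVIDIA; only the input figures are taken from the table.

```python
SECONDS_PER_HOUR = 3600


def cost_per_million_tokens(cost_per_gpu_hour: float,
                            tokens_per_sec_per_gpu: float) -> float:
    """Assumed formula: hourly GPU cost / tokens per GPU-hour, per 1M tokens."""
    tokens_per_gpu_hour = tokens_per_sec_per_gpu * SECONDS_PER_HOUR
    return cost_per_gpu_hour / tokens_per_gpu_hour * 1_000_000


# Input figures copied from the table above (cost and per-GPU throughput rows).
PLATFORMS = {
    "Hopper (HGX H200)": {
        "cost_per_gpu_hour": 1.41,
        "tokens_per_sec_per_gpu": 90,
    },
    "Blackwell (GB300 NVL72)": {
        "cost_per_gpu_hour": 2.65,
        "tokens_per_sec_per_gpu": 6000,
    },
}


def summarize() -> dict:
    """Return the computed cost per million tokens for each platform."""
    return {name: cost_per_million_tokens(**figures)
            for name, figures in PLATFORMS.items()}


if __name__ == "__main__":
    costs = summarize()
    for name, cost in costs.items():
        print(f"{name}: ${cost:.2f} per million tokens")

    ratio = costs["Hopper (HGX H200)"] / costs["Blackwell (GB300 NVL72)"]
    print(f"Hopper costs about {ratio:.1f}x more per token than Blackwell")
```

Under that assumption, Blackwell comes out at roughly $0.12 per million tokens and Hopper at roughly $4.35, slightly above the $4.20 in the table, presumably because the published inputs are rounded. The ratio works out to about 35x, in line with the table, even though the hourly GPU cost differs by less than 2x.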

Now, you can treat all of this as NVIDIA's iconic "CEO Math," but there is some actual reasoning behind why these numbers matter. NVIDIA has a very strong AI software stack, and it has been leading the charts across benchmarks, with competitors not even close.

NVIDIA's CEO has even challenged other firms to benchmark their own chips, since many claim to be ahead of NVIDIA without publishing any proof.

"Nobody can demonstrate to me that any single platform in the world today has better performance TCO ratio. Not one company... I encourage them to use inference max and demonstrate their incredible inference cost. It's really really hard.. no nobody wants to show up."

Jensen Huang - NVIDIA CEO

With this rethinking of AI TCO and AI in general, NVIDIA isn't just claiming a victory in benchmarks; it is also claiming the throne in the metrics that matter to AI enterprises.

About the author: A Software Engineer by training and a PC enthusiast by passion, Hassan Mujtaba serves as Wccftech's Senior Editor for the hardware section. With years of experience in the industry, he specializes in deep-dive technical analysis of next-generation CPU and GPU architectures, motherboards, and cooling solutions. His work involves not only breaking news on upcoming technologies but also extensive hands-on reviews and benchmarking.
