“I Produce The Lowest Cost Tokens In The World” Says NVIDIA CEO As He Highlights The Full-Stack Approach To AI

Apr 21, 2026 at 02:00pm EDT

NVIDIA's CEO has said that while his company produces expensive AI hardware, it also produces the lowest-cost tokens in the world.

NVIDIA's Leadership in AI Is Not Only Because of Hardware; Its Full-Stack Approach Makes the "Lowest-Cost Token" Possible

Speaking at Cadence Live 2026, NVIDIA's CEO stated that the company leads on token cost, with its systems producing the world's lowest-cost tokens. A token is the fundamental unit of AI: think of tokens as the alphabet of the language that AI models process to generate responses.


The speed at which tokens are generated depends on both the hardware and the software. Good hardware can generate a lot of tokens through brute force alone, but that's not an efficient way to do AI. You need a well-tuned software stack behind it; with that in place, the same hardware is properly utilized and can generate even more tokens.

That's where NVIDIA's CUDA stack comes in. The company has put years of engineering work into refining its CUDA ecosystem, to the point that its hardware is now widely regarded as the best for generating tokens.

"You imagine that, in fact, the future of the world is going to be full-stack. And in a lot of ways, you and I see it exactly the same way. Yeah, it's going to be — you have to understand the software stack on top, the systems that it goes into, the applications of the system beyond that. You have to be a full-stack company because nobody's going to go figure that out for you."

Jensen Huang - NVIDIA CEO

NVIDIA makes it very clear that the path forward is full-stack: the software, the hardware, and the applications all have to come together to deliver AI leadership. The next frontier is agentic AI, which has already taken the AI segment by storm.

Jensen also acknowledged that his AI machines are expensive, yet they produce the lowest-cost tokens in the world. How is that possible? Simply put, NVIDIA's AI systems, such as Blackwell or the upcoming Rubin platforms, cost several million dollars each and generate several billion dollars in revenue, so you might think there's no way they can produce a low-cost token. But the same machines can generate an unprecedented number of tokens. Spread over that output, the cost per token of each NVIDIA system is, by the company's account, the lowest. Factor in efficiency, and each system also offers the highest tokens per watt, meaning the least power spent per token.
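To make the arithmetic concrete, here is a minimal sketch of how a cost-per-token figure could be worked out. It is not NVIDIA's own formula: the function names, the inputs, and the simplification that total cost is just the purchase price plus electricity are assumptions made for illustration.

```python
def cost_per_million_tokens(system_cost_usd, lifetime_years, power_kw,
                            electricity_usd_per_kwh, tokens_per_second):
    """Spread purchase price plus electricity over every token generated.

    Assumes (for illustration only) that the system runs flat out for its
    whole lifetime and that no other costs apply.
    """
    hours = lifetime_years * 365 * 24
    energy_cost_usd = power_kw * hours * electricity_usd_per_kwh
    total_cost_usd = system_cost_usd + energy_cost_usd
    total_tokens = tokens_per_second * hours * 3600
    return total_cost_usd / total_tokens * 1_000_000


def tokens_per_joule(tokens_per_second, power_kw):
    """Efficiency: tokens per second per watt is the same as tokens per joule."""
    return tokens_per_second / (power_kw * 1000)


if __name__ == "__main__":
    # Hypothetical inputs, not measured figures for any real system.
    usd = cost_per_million_tokens(
        system_cost_usd=3_000_000, lifetime_years=4, power_kw=120,
        electricity_usd_per_kwh=0.10, tokens_per_second=1_000_000)
    print(f"USD per million tokens: {usd:.4f}")
    print(f"Tokens per joule: {tokens_per_joule(1_000_000, 120):.2f}")
```

The point of the sketch is that the purchase price sits in the numerator while lifetime token output sits in the denominator, so a very expensive machine can still come out cheapest per token if it generates enough of them.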

“We are the leaders of low token cost, by the way.

I produce the lowest cost tokens in the world. It’s an expensive system, I acknowledge that. It’s the lowest cost tokens in the world. And it’s getting better and better.

The more you buy, the more you save.”

Jensen Huang - NVIDIA CEO

This demonstrates the full-stack approach that Jensen is talking about. In fact, NVIDIA has devised a whole new way of looking at AI TCO (Total Cost of Ownership), and it revolves around cost per token. The basic principle of the new metric is that AI systems shouldn't be judged by the maximum throughput at which they can generate tokens; rather, the cost and the power each system needs to generate a token are the key values to consider.
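As a rough illustration of why the two yardsticks can disagree, the sketch below ranks two made-up systems first by raw throughput and then by cost per token. The system names, hourly costs, and token rates are invented for the example and do not describe any real product.

```python
from dataclasses import dataclass


@dataclass
class System:
    name: str
    hourly_cost_usd: float    # hypothetical all-in hourly cost (hardware plus power)
    tokens_per_second: float  # hypothetical sustained throughput

    def usd_per_million_tokens(self) -> float:
        tokens_per_hour = self.tokens_per_second * 3600
        return self.hourly_cost_usd / tokens_per_hour * 1_000_000


# Invented numbers: system_b is twice as fast but three times as costly to run.
systems = [
    System("system_a", hourly_cost_usd=100.0, tokens_per_second=50_000.0),
    System("system_b", hourly_cost_usd=300.0, tokens_per_second=100_000.0),
]

fastest = max(systems, key=lambda s: s.tokens_per_second)
cheapest = min(systems, key=lambda s: s.usd_per_million_tokens())

print("Highest throughput:", fastest.name)
print("Lowest cost per token:", cheapest.name)
```

With these invented inputs the faster system is not the cheaper one per token, which is the kind of gap a cost-per-token view is meant to expose.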

With the agentic AI era upon us, NVIDIA faces some big challenges ahead: competitors are lining up their own solutions to take on Vera Rubin, and supply constraints are getting harder to overcome. Still, in the many years since NVIDIA started laying out its AI strategy, we have only seen the company succeed, and that remains the case to this day.

About the author: A Software Engineer by training and a PC enthusiast by passion, Hassan Mujtaba serves as Wccftech's Senior Editor for the hardware section. With years of experience in the industry, he specializes in deep-dive technical analysis of next-generation CPU and GPU architectures, motherboards, and cooling solutions. His work involves not only breaking news on upcoming technologies but also extensive hands-on reviews and benchmarking.
