NVIDIA Has Managed to Reduce Token Costs by a Whopping 10x With Its Newest Blackwell Platform, Credited to Team Green’s “Extreme Codesign” Approach

Feb 12, 2026 at 11:38am EST

NVIDIA's Blackwell platform has brought new levels of token optimization to AI inference workloads, as the company reveals a massive milestone in the realm of tokenomics.

NVIDIA's GB200 NVL72 Achieves 10x Better Tokenomics Than Hopper, Credited to "Expert-Level" Parallelism

While NVIDIA has been racing to build new AI infrastructure, one of the company's biggest focuses has been improving the efficiency of the hardware it deploys. With Blackwell-trained frontier AI models arriving in the industry, we have seen how NVIDIA has progressed on token output and costs. Now, in a new blog post, the company has revealed that it has been working with businesses to scale up Blackwell performance, showing up to a ten-fold reduction in cost per token compared with the Hopper generation.


That’s why leading inference providers including Baseten, DeepInfra, Fireworks AI and Together AI are using the NVIDIA Blackwell platform, which helps them reduce cost per token by up to 10x compared with the NVIDIA Hopper platform. These providers host advanced open source models, which have now reached frontier-level intelligence.

By combining open source frontier intelligence, the extreme hardware-software codesign of NVIDIA Blackwell and their own optimized inference stacks, these providers are enabling dramatic token cost reductions for businesses across every industry.

- NVIDIA

While discussing tokenomics on Blackwell, NVIDIA has highlighted inference providers such as Baseten and DeepInfra, along with companies building on them such as Sully.ai and the gaming-focused Latitude. For each of them, NVIDIA says the Blackwell platform has delivered lower latency, lower inference costs, and more reliable responses, which helps explain why the stack has become a go-to option for mainstream AI companies. Even for multi-agent workflows and the deployment of specialized AI agents, a company called Sentient Labs has achieved "25-50% better cost efficiency" relative to Hopper.
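To make the cost-per-token framing concrete, here is a minimal Python sketch of the arithmetic, assuming cost is simply the hourly price of a GPU divided by the tokens it generates in that hour. The function names and every input number are hypothetical placeholders chosen for illustration; they are not figures from NVIDIA or any provider.

```python
def cost_per_million_tokens(gpu_hour_usd: float, tokens_per_second: float) -> float:
    """Dollar cost to generate one million tokens on a single GPU.

    Assumes the GPU is fully utilized and billed by the hour.
    """
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    tokens_per_hour = tokens_per_second * 3600
    return gpu_hour_usd / tokens_per_hour * 1_000_000


def cost_reduction_factor(old_cost: float, new_cost: float) -> float:
    """How many times cheaper the new cost is than the old one."""
    return old_cost / new_cost


# Hypothetical inputs only: an older GPU at $2/hour and 1,000 tokens/s,
# and a newer GPU at $4/hour and 20,000 tokens/s.
older = cost_per_million_tokens(gpu_hour_usd=2.0, tokens_per_second=1_000)
newer = cost_per_million_tokens(gpu_hour_usd=4.0, tokens_per_second=20_000)

print(f"older: ${older:.3f} per million tokens")
print(f"newer: ${newer:.3f} per million tokens")
print(f"reduction: {cost_reduction_factor(older, newer):.1f}x")
```

Under this simple model, a GPU that costs more per hour can still be far cheaper per token if its throughput rises faster than its price, which is the basic logic behind the tokenomics claims.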

NVIDIA's progress with the Blackwell AI architecture is driven by its "extreme codesign" approach, which is well-suited to today's mixture-of-experts (MoE) architectures. With GB200 NVL72, NVIDIA couples a 72-GPU configuration with 30TB of fast shared memory to take expert parallelism to a whole new level. In expert parallelism, token batches are constantly split and scattered across GPUs, and communication volume increases at a non-linear rate; the rack-scale NVLink domain is designed to handle that traffic efficiently. This is one of the reasons why Blackwell's tokenomics are NVIDIA's most efficient yet.
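The splitting and scattering of token batches across GPUs can be illustrated with a toy Python sketch. It assumes experts are spread evenly over the GPUs, tokens start on GPUs in round-robin order, and each token is sent to `top_k` experts. A real MoE model picks experts with a learned gating network; random selection is used here purely as a stand-in, and all names and sizes are hypothetical.

```python
import random
from collections import Counter


def route_tokens(num_tokens: int, num_experts: int, top_k: int, seed: int = 0):
    """Pick top_k distinct experts per token (random stand-in for a learned router)."""
    rng = random.Random(seed)
    return [rng.sample(range(num_experts), top_k) for _ in range(num_tokens)]


def dispatch_stats(assignments, num_experts: int, num_gpus: int):
    """Count token copies landing on each GPU and how many cross a GPU boundary."""
    if num_experts % num_gpus != 0:
        raise ValueError("num_experts must be divisible by num_gpus")
    experts_per_gpu = num_experts // num_gpus
    per_gpu = Counter()
    cross_gpu = 0
    for token_id, experts in enumerate(assignments):
        home_gpu = token_id % num_gpus  # assumed round-robin token placement
        for expert in experts:
            dest_gpu = expert // experts_per_gpu
            per_gpu[dest_gpu] += 1
            if dest_gpu != home_gpu:
                cross_gpu += 1  # this token copy must travel over the interconnect
    return per_gpu, cross_gpu


# Hypothetical sizes: 1,024 tokens, 144 experts over 72 GPUs, 8 experts per token.
assignments = route_tokens(num_tokens=1024, num_experts=144, top_k=8)
per_gpu, cross_gpu = dispatch_stats(assignments, num_experts=144, num_gpus=72)
total = sum(per_gpu.values())
print(f"{cross_gpu} of {total} token copies cross a GPU boundary")
```

In this toy setup nearly every token copy leaves its home GPU, which is why the bandwidth of the GPU-to-GPU interconnect matters so much for expert parallelism.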

With Vera Rubin, Team Green plans to take infrastructure efficiency to a whole new level, driven by architecture advancements, specialized mechanisms like CPX for prefill, and much more. The world of AI is evolving at an overwhelming pace, which is why we need to recognize that optimizing existing hardware is as important as developing new hardware.

About the author: Muhammad Zuhair is a hardware and technology reporter for Wccftech, specializing in the semiconductor industry and the complex interplay between technology, manufacturing, and geopolitics. His coverage focuses on the corporate strategies and technological roadmaps of industry giants like TSMC, NVIDIA, Samsung, and Intel. Zuhair's expertise lies in deconstructing complex topics such as fabrication nodes (e.g., 2nm process), the economic impact of policies like the CHIPS Act, and the strategic development of AI infrastructure from NVIDIA, AMD and Intel.
