NVIDIA's Blackwell platform has brought new levels of token optimization to AI inference workloads, as the company reveals a massive milestone in the realm of tokenomics.
NVIDIA's GB200 NVL72 Achieves Up to 10x Better Tokenomics Than Hopper, Credits Expert Parallelism
While NVIDIA has been racing to build new AI infrastructure, one of the company's biggest focuses has been improving the efficiency of the hardware it deploys. With Blackwell-trained frontier AI models now arriving, we have seen how NVIDIA has progressed on token output and costs. Now, in a new blog post, the company has revealed that it has been working with businesses to scale up Blackwell performance, showing a reduction in cost per token of up to ten-fold compared with the Hopper generation.
That’s why leading inference providers including Baseten, DeepInfra, Fireworks AI and Together AI are using the NVIDIA Blackwell platform, which helps them reduce cost per token by up to 10x compared with the NVIDIA Hopper platform. These providers host advanced open source models, which have now reached frontier-level intelligence.
By combining open source frontier intelligence, the extreme hardware-software codesign of NVIDIA Blackwell and their own optimized inference stacks, these providers are enabling dramatic token cost reductions for businesses across every industry.
- NVIDIA
While discussing tokenomics on Blackwell, NVIDIA has highlighted organizations like Baseten and Sully.ai, along with the gaming-focused DeepInfra and Latitude. For each company, the Blackwell architecture has enabled lower latency, optimized inference costs, and reliable responses, which is why the tech stack is the go-to option for mainstream AI companies today. Even for multi-agent workflows and the deployment of specialized AI agents, a company called Sentient Labs has achieved "25-50% better cost efficiency" relative to Hopper.
NVIDIA's progress with the Blackwell AI architecture is driven by its "extreme co-design" approach, which is well-suited to today's MoE (mixture-of-experts) architectures. Under expert parallelism, token batches are constantly split and scattered across GPUs, and communication volume increases at a non-linear rate as more GPUs are involved. With GB200 NVL72, NVIDIA uses a 72-chip configuration coupled with 30TB of fast shared memory to handle that traffic and take expert parallelism to a whole new level. This is one of the reasons why Blackwell delivers NVIDIA's most efficient tokenomics yet.
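To make the scattering of token batches across GPUs more concrete, here is a minimal Python sketch of MoE-style token dispatch. It is an illustration only, not NVIDIA's implementation: the router is random rather than learned, and the expert count (144), top-k value (8), batch size (1,024), and the placement functions are all assumptions chosen for the example. Only the 72-GPU figure comes from the article.

```python
import random
from collections import Counter

NUM_GPUS = 72        # GB200 NVL72 chip count, from the article
NUM_EXPERTS = 144    # hypothetical: two experts per GPU
TOP_K = 8            # hypothetical: experts activated per token
NUM_TOKENS = 1024    # hypothetical batch size


def route_tokens(num_tokens, num_experts, top_k, seed=0):
    """Pick top_k distinct experts per token (random stand-in for a learned router)."""
    rng = random.Random(seed)
    return [rng.sample(range(num_experts), top_k) for _ in range(num_tokens)]


def dispatch_stats(assignments, num_experts, num_gpus):
    """Count token copies received per GPU and how many copies cross GPUs.

    Assumes tokens start on GPU (token index % num_gpus) and experts are
    laid out contiguously, with num_experts divisible by num_gpus.
    """
    experts_per_gpu = num_experts // num_gpus
    load = Counter()
    cross_gpu = 0
    for token_idx, experts in enumerate(assignments):
        src = token_idx % num_gpus
        for expert in experts:
            dst = expert // experts_per_gpu
            load[dst] += 1
            if dst != src:
                cross_gpu += 1  # this copy must travel over the interconnect
    return load, cross_gpu


assignments = route_tokens(NUM_TOKENS, NUM_EXPERTS, TOP_K)
load, cross_gpu = dispatch_stats(assignments, NUM_EXPERTS, NUM_GPUS)
total_copies = NUM_TOKENS * TOP_K
print(f"token copies dispatched: {total_copies}")
print(f"copies sent to a different GPU: {cross_gpu}")
print(f"busiest GPU receives {max(load.values())} copies")
```

In this toy setup, every token is copied to several experts, and most of those copies land on a GPU other than the one the token started on. That all-to-all traffic is the kind of load a large, fast interconnect domain is meant to absorb.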
With Vera Rubin, Team Green plans to take infrastructure efficiency to a whole new level, driven by architecture advancements, specialized mechanisms like CPX for prefill, and much more. The world of AI is evolving at an overwhelming pace, which is why we need to recognize that optimizing existing hardware is as important as developing new hardware.
