
NVIDIA Slashes DeepSeek v4 Token Costs By Up To 5x Just One Month After Launch, Through Pure Blackwell Software Tuning

Hassan Mujtaba • Jun 30, 2026 at 03:30pm EDT

NVIDIA continues to optimize the software stack for its Blackwell GPUs, cutting the cost per token of DeepSeek V4 AI models by up to 5x.

NVIDIA's Cost-Per-Token Narrative Gets A Major Boost In DeepSeek V4, With Up To 5x Lower Token Costs On Blackwell GPUs Through Continued Optimizations

"Cost per token" is the fundamental metric for AI total cost of ownership (TCO), as NVIDIA highlighted a few months ago, and the company is now delivering its lowest-ever token cost on DeepSeek V4.

Today, NVIDIA announced that its full-stack inference software has brought further optimizations to Blackwell systems such as GB200 and GB300, improving their performance. With the latest optimizations, the Blackwell platform has cut token costs by up to 5x on DeepSeek V4, just one month after the model's release.
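
To make the metric concrete, cost per token can be approximated as the hourly cost of the hardware divided by the number of tokens it serves in that hour. The sketch below is a minimal illustration of that arithmetic; the hourly price and baseline throughput are hypothetical values chosen for the example, not figures from NVIDIA. It shows that, with hardware cost held fixed, an up-to-5x cut in cost per token corresponds to up to 5x more tokens from the same GPU.

```python
def cost_per_million_tokens(gpu_hour_usd: float, tokens_per_second: float) -> float:
    """Approximate serving cost in USD per one million tokens."""
    tokens_per_hour = tokens_per_second * 3600
    return gpu_hour_usd / tokens_per_hour * 1_000_000


# Hypothetical inputs, chosen only for illustration (not NVIDIA figures).
GPU_HOUR_USD = 3.00      # assumed hourly cost of one GPU
BASELINE_TPS = 1_000.0   # assumed launch-day tokens per second per GPU

# Same hardware and price; only the software-driven throughput changes.
baseline = cost_per_million_tokens(GPU_HOUR_USD, BASELINE_TPS)
tuned = cost_per_million_tokens(GPU_HOUR_USD, BASELINE_TPS * 5)

print(f"baseline:  ${baseline:.4f} per 1M tokens")
print(f"tuned:     ${tuned:.4f} per 1M tokens")
print(f"reduction: {baseline / tuned:.1f}x")
```

Because the hardware cost sits in the numerator and throughput in the denominator, software tuning alone can lower the metric without any change to the GPUs themselves.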

Leading companies and inference providers have already acknowledged these gains on their NVIDIA Blackwell-powered platforms:

Baseten used the NVIDIA TensorRT-LLM open source library to serve DeepSeek V4 Pro on Blackwell GPUs for reasoning, coding and long-context workloads, applying proprietary runtime optimizations to deliver up to 50% more tokens per second.
Cognition uses the NVIDIA Dynamo inference framework to manage inference GPUs, giving its team a ready-made path to scale reinforcement learning workloads without needing to build that infrastructure from scratch.
Deep Infra uses the NVIDIA inference software stack to serve frontier open-source models performantly on Blackwell from day zero, including DeepSeek V4.
Together AI used NVIDIA TensorRT-LLM on Blackwell to help Cursor accelerate the path from model optimizations to production endpoints for its real-time coding experience.

The lower token costs come from turning individual optimizations into system-level performance on NVIDIA GPUs. NVIDIA explains that its inference software stacks achieve these gains by connecting three layers:

Production Operation: Coordinates distributed serving, orchestration, autoscaling and memory management so inference can run across the right compute and storage resources.
Application Acceleration: Runs models with high performance while giving developers room to tune and customize, using runtime optimizations such as overlapping compute and communication and kernel fusion.
Infrastructure Access: Exposes NVIDIA GPU, networking, memory, and system capabilities without requiring developers to manage every device instruction set or data-transfer protocol directly.
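
As a rough illustration of why overlapping compute and communication matters, the following sketch models one inference step in two ways: running the two phases back to back, and running them concurrently. It is an idealized model that assumes perfect overlap, and the millisecond values are hypothetical, not measurements from any NVIDIA system.

```python
def step_time_serial(compute_ms: float, comm_ms: float) -> float:
    """Communication starts only after compute finishes."""
    return compute_ms + comm_ms


def step_time_overlapped(compute_ms: float, comm_ms: float) -> float:
    """Idealized overlap: the step takes as long as the slower phase."""
    return max(compute_ms, comm_ms)


# Hypothetical per-step timings, for illustration only.
COMPUTE_MS = 8.0
COMM_MS = 4.0

serial = step_time_serial(COMPUTE_MS, COMM_MS)
overlapped = step_time_overlapped(COMPUTE_MS, COMM_MS)

print(f"serial step:     {serial:.1f} ms")
print(f"overlapped step: {overlapped:.1f} ms")
print(f"speedup:         {serial / overlapped:.2f}x")
```

In practice the overlap is rarely perfect, so real gains would fall somewhere between the two figures.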

These layers come together in complete systems, where the individual optimizations compound. In addition, NVIDIA technologies such as NVLink, NVFP4, and Multi-Token Prediction contribute meaningful gains of their own, for a combined throughput increase of up to 20x.
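
The compounding effect can be sketched with simple arithmetic: when optimizations act on different parts of the system, their speedups multiply rather than add. The factors below are hypothetical placeholders, not NVIDIA's per-technology numbers, and the model assumes the gains are independent of one another.

```python
from math import prod


def combined_speedup(factors: list[float]) -> float:
    """Multiply independent speedup factors into one overall factor."""
    return prod(factors)


def naive_additive(factors: list[float]) -> float:
    """Incorrectly sum the individual gains, shown for comparison."""
    return 1.0 + sum(f - 1.0 for f in factors)


# Hypothetical speedups from three unrelated optimizations.
hypothetical_factors = [2.0, 1.5, 1.25]

compounded = combined_speedup(hypothetical_factors)
added = naive_additive(hypothetical_factors)

print(f"compounded: {compounded:.2f}x")
print(f"additive:   {added:.2f}x")
```

The gap between the two results widens as more optimizations are stacked, which is why system-level integration can yield larger totals than any single technique suggests.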

In summary, NVIDIA’s Blackwell GPUs, backed by continuous full-stack inference optimizations, have cut the cost per token for DeepSeek V4 by up to 5x just one month after the model's release, reinforcing cost per token as the key metric for AI total cost of ownership.

By integrating production operations, application acceleration, and infrastructure access with technologies like NVLink and NVFP4, Blackwell delivers compounded system-level gains of up to 20x higher throughput. Inference providers including Baseten, Cognition, Deep Infra, and Together AI are already using these advancements for reasoning, coding, and large-scale workloads, strengthening NVIDIA’s position in efficient AI inference.

About the author: A software engineer by training and a PC enthusiast by passion, Hassan Mujtaba serves as Wccftech's Senior Editor for the hardware section. With years of experience in the industry, he specializes in deep-dive technical analysis of next-generation CPU and GPU architectures, motherboards, and cooling solutions. His work involves not only breaking news on upcoming technologies but also extensive hands-on reviews and benchmarking.
