NVIDIA Is Among the First to Submit MLPerf Inference v6.0 Benchmarks With Blackwell Ultra, and It’s Total Domination Over Competitors

Apr 1, 2026 at 02:46pm EDT

NVIDIA is among the first to submit results for the 'extensive' MLPerf Inference v6.0 benchmarks, and the company says its showing beats "all competitors" combined.

NVIDIA's Blackwell Ultra, Combined With Extreme Co-Design, Manages to Dominate the MLPerf Inference v6.0 Benchmarks

When it comes to benchmark submissions and showcasing the 'prowess' of its computing platforms, NVIDIA has been at the forefront, particularly with MLPerf, where the firm is one of the few entities to complete a rigorous round of benchmarks. In its latest blog post, NVIDIA discusses its MLPerf Inference v6.0 submission, noting that, with Blackwell Ultra and extreme co-design, the firm has delivered the "highest AI factory throughput and lowest token cost". According to the company, Team Green's tally of MLPerf training and inference wins is nine times that of the nearest competitor, which points to the infrastructure lead it holds.


With the new Inference v6.0, the MLCommons team has added support for newer reasoning and MoE models, including DeepSeek-R1, GPT-OSS-120B, and Mixtral 8x7B. The new version also covers dense LLMs, generative recommenders, and vision-language models, indicating that the benchmark targets a wider range of the workloads enterprises run today. This is one of the reasons why NVIDIA CEO Jensen Huang has called MLPerf one of the most "intense" benchmarking suites, which, interestingly, NVIDIA dominates. Here are the results achieved by NVIDIA, and there's an interesting aspect to them.

Benchmark | GB300 NVL72 (v5.1) | GB300 NVL72 (v6.0) | Speedup
DeepSeek-R1 (Server) | 2,907 tokens/sec/GPU | 8,064 tokens/sec/GPU | 2.77x
DeepSeek-R1 (Offline) | 5,842 tokens/sec/GPU | 9,821 tokens/sec/GPU | 1.68x
Llama 3.1 405B (Server) | 170 tokens/sec/GPU | 259 tokens/sec/GPU | 1.52x
Llama 3.1 405B (Offline) | 224 tokens/sec/GPU | 271 tokens/sec/GPU | 1.21x
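
As a quick sanity check, the speedup column can be reproduced by dividing each v6.0 per-GPU throughput figure by its v5.1 counterpart. The short Python sketch below does that using the numbers from the table; the names results, speedup, and speedups are purely illustrative and do not come from NVIDIA or MLCommons.

```python
# Per-GPU throughput in tokens/sec, copied from the table above.
# Each entry is (v5.1 result, v6.0 result) on the same GB300 NVL72 system.
results = {
    "DeepSeek-R1 (Server)": (2907, 8064),
    "DeepSeek-R1 (Offline)": (5842, 9821),
    "Llama 3.1 405B (Server)": (170, 259),
    "Llama 3.1 405B (Offline)": (224, 271),
}


def speedup(old: float, new: float) -> float:
    """Return the ratio of new throughput to old throughput."""
    return new / old


# Round to two decimals to match the precision used in the table.
speedups = {name: round(speedup(old, new), 2) for name, (old, new) in results.items()}

for name, ratio in speedups.items():
    print(f"{name}: {ratio:.2f}x")
```

The ratios line up with the published speedup column, ranging from 1.21x to 2.77x.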

The results not only indicate a strong showing in tokens/sec/GPU figures, but also show that NVIDIA's advantage is driven by a round of software optimizations. Since its first DeepSeek-R1 submission in the v5.1 round, NVIDIA has seen up to 2.77x higher token throughput on the same GB300 NVL72 system, without any hardware changes. The gains hold across every benchmark listed, ranging from 1.21x to 2.77x, which suggests the improvements are consistent and visible even in a suite as aggressive as MLPerf v6.0.

NVIDIA claims that it was the only company to submit DeepSeek-R1 results to MLPerf Inference last year, and with the newer version, which adds further scrutiny to the hardware, the lead with Blackwell Ultra is maintained.

Delivered inference throughput takes extreme co-design across many chips, system architecture, data center design, and software. The latest MLPerf Inference v6.0 results show that NVIDIA yields unmatched inference throughput across the broadest range of workloads, from massive LLMs to advanced vision language models, to generative recommender systems and more, on industry-standard benchmarks. 

NVIDIA's approach of being transparent about what it has done with its hardware is one of the reasons the firm is admired within the developer community as well. MLPerf, in general, is an aggressive testing suite, which is why several ASIC manufacturers, and even AMD, haven't taken part in the benchmarking process as extensively as NVIDIA has. At the same time, the Inference v6.0 results also support NVIDIA's narrative of providing the best possible hardware to its customers, as reflected in tokens-per-dollar figures and total cost of ownership (TCO) for large-scale deployments.

About the author: Muhammad Zuhair is a hardware and technology reporter for Wccftech, specializing in the semiconductor industry and the complex interplay between technology, manufacturing, and geopolitics. His coverage focuses on the corporate strategies and technological roadmaps of industry giants like TSMC, NVIDIA, Samsung, and Intel. Zuhair's expertise lies in deconstructing complex topics such as fabrication nodes (e.g., 2nm process), the economic impact of policies like the CHIPS Act, and the strategic development of AI infrastructure from NVIDIA, AMD and Intel.
