NVIDIA has become one of the first to submit results for the 'extensive' MLPerf Inference v6.0 benchmarks, delivering what it says is the highest performance, ahead of "all competitors" combined.
NVIDIA's Blackwell Ultra, Combined With Extreme Co-Design Laws, Manages to Dominate With MLPerf v6.0 Benchmarks
When it comes to benchmark submissions and showcasing the 'prowess' of its computing platforms, NVIDIA has been at the forefront, particularly with MLPerf, where the firm is one of the few entities to complete a rigorous round of benchmarks. In its latest blog post, the company discusses its submission to MLPerf Inference v6.0, noting that, with Blackwell Ultra and extreme co-design laws, it has delivered the "highest AI factory throughput and lowest token cost". Team Green's tally of MLPerf inference and training wins is nine times higher than the nearest competitor's, underlining the company's infrastructure lead.
With the new Inference v6.0, the MLCommons team has added support for newer reasoning and MoE models, including DeepSeek-R1, GPT-OSS-120B, and Mixtral 8x7B. At the same time, the suite also covers dense LLMs, generative recommenders, and vision-language models, indicating that the benchmark targets a wider range of workloads common to today's enterprise requirements. This is one of the reasons why NVIDIA CEO Jensen Huang has called MLPerf one of the most "intense" benchmarking suites, which, interestingly, NVIDIA dominates. Here are the results achieved by NVIDIA, and there's an interesting aspect to them.
| Benchmark | GB300 NVL72 (v5.1) | GB300 NVL72 (v6.0) | Speedup |
|---|---|---|---|
| DeepSeek-R1 (Server) | 2,907 tokens/sec/GPU | 8,064 tokens/sec/GPU | 2.77x |
| DeepSeek-R1 (Offline) | 5,842 tokens/sec/GPU | 9,821 tokens/sec/GPU | 1.68x |
| Llama 3.1 405B (Server) | 170 tokens/sec/GPU | 259 tokens/sec/GPU | 1.52x |
| Llama 3.1 405B (Offline) | 224 tokens/sec/GPU | 271 tokens/sec/GPU | 1.21x |
The results not only indicate an extensive lead in tokens/sec/GPU figures, but also show that NVIDIA's advantage is driven by a series of software optimizations. Since its earlier DeepSeek-R1 submission in v5.1, NVIDIA has seen roughly 2.7x higher token throughput (2.77x in the Server scenario) on the same GB300 NVL72, without any hardware changes. And, on a hardware level, when compared against GB200 NVL72, NVIDIA is said to score up to a 2.77x speedup with v6.0, which suggests that the generational upgrades are consistent and visible across benchmarks as aggressive as MLPerf v6.0.
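For readers who want to check the Speedup column themselves, the arithmetic is simply the v6.0 figure divided by the v5.1 figure. The snippet below is a minimal sketch using the per-GPU numbers from the table above; the names `RESULTS`, `speedup`, and `speedup_table` are our own illustrative choices and not part of any MLPerf or NVIDIA tooling.

```python
# Per-GPU throughput (tokens/sec/GPU) copied from the table above.
# Each entry is (v5.1 result, v6.0 result) for GB300 NVL72.
# The variable and function names here are hypothetical, for illustration only.
RESULTS = {
    "DeepSeek-R1 (Server)": (2907, 8064),
    "DeepSeek-R1 (Offline)": (5842, 9821),
    "Llama 3.1 405B (Server)": (170, 259),
    "Llama 3.1 405B (Offline)": (224, 271),
}


def speedup(before: float, after: float) -> float:
    """Return the ratio of the newer throughput to the older one."""
    return after / before


def speedup_table(results: dict) -> dict:
    """Compute the speedup for every benchmark, rounded to two decimals."""
    return {
        name: round(speedup(before, after), 2)
        for name, (before, after) in results.items()
    }


if __name__ == "__main__":
    for name, ratio in speedup_table(RESULTS).items():
        print(f"{name}: {ratio:.2f}x")
```

Running this reproduces the 2.77x, 1.68x, 1.52x, and 1.21x figures listed in the table, which confirms that the speedups are plain ratios of the two rounds' per-GPU throughput.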
NVIDIA claims that it was the only company to submit DeepSeek-R1 results to MLPerf Inference last year, and with the newer version, which adds further scrutiny to the hardware, the lead with Blackwell Ultra is maintained.
Delivered inference throughput takes extreme co-design across many chips, system architecture, data center design, and software. The latest MLPerf Inference v6.0 results show that NVIDIA yields unmatched inference throughput across the broadest range of workloads, from massive LLMs to advanced vision language models, to generative recommender systems and more, on industry-standard benchmarks.
NVIDIA's approach of being transparent about what it has done with its hardware is one of the reasons the firm is admired within the developer community as well. MLPerf, in general, is an aggressive testing suite, which is why several ASIC manufacturers, and even AMD, haven't taken part in the benchmarking process as extensively as NVIDIA has. At the same time, Inference v6.0 testing also supports NVIDIA's narrative of providing the best possible hardware to its customers, as evidenced by tokens-per-dollar figures and TCO for large-scale deployments.
