NVIDIA Blackwell Sweeps Every MLPerf 6.0 Benchmark With No Competition In Sight, While GB300 Systems Run Up to 60% Faster Than GB200

Jun 17, 2026 at 09:40am EDT

The latest MLPerf Training 6.0 benchmarks are in & NVIDIA has once again secured performance records with its Blackwell GPUs.

Blackwell GPUs Make Competition Go Into Hiding at MLPerf 6.0 As NVIDIA Tops Benchmark Charts

MLCommons has shared the latest MLPerf Training v6.0 benchmark results. This round adds two new MoE tests for large-scale and entry-level AI deployments: DeepSeek V3 (671B parameters) and GPT-OSS 20B (21B parameters). As an open-source, peer-reviewed benchmark suite, MLPerf allows all vendors to list the results of their latest and greatest hardware. NVIDIA has been dominating the suite for a while, and that trend continues.

While NVIDIA is getting ready to launch its AI-Supercharged Vera Rubin platform in the coming months, the current-generation Blackwell architectures, especially GB300 NVL72 systems, are showcasing immense potential with no competition in sight. In the latest results, NVIDIA shows:

Fastest time to train on every benchmark
Largest-scale training across 8,192 GPUs using NVIDIA Blackwell NVL72 systems
The only platform with submissions across all seven benchmarks in the suite

Coming to the benchmark results, NVIDIA was the fastest at each one of them and was also the only one to submit results across all benchmarks in MLPerf 6.0.

Model	NVIDIA Blackwell NVL72	Nearest Alternative
DeepSeek-v3 671B (New)	2.02 mins	No submission
GPT-OSS 20B (New)	7.43 mins	No submission
Llama 3.1 405B	7.07 mins	No submission
Llama 2 70B LoRA	0.40 mins	8.27 mins
Llama 3.1 8B	4.46 mins	58.63 mins
FLUX.1	17.1 mins	74.44 mins
DLRM-dcnv2	0.67 mins	No submission

For reference, NVIDIA's Blackwell platforms achieved stellar speeds. On Llama 3.1 8B, NVIDIA reached the quality target in 4.46 minutes, while the nearest alternative took 58.63 minutes, a 13.1x gap in time to train. And for the newest benchmarks, the competition didn't submit any results at all.
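To make the arithmetic behind the 13.1x figure explicit, here is a minimal Python sketch that divides the nearest alternative's time by NVIDIA's time for each row of the table above. The `results` dictionary and the `time_ratio` helper are illustrative names, not part of any MLPerf tooling. Since the table does not list accelerator counts, the output is a plain time ratio rather than a per-accelerator comparison.

```python
# Times to train in minutes, copied from the table above.
# None marks a row where the nearest alternative made no submission.
results = {
    "DeepSeek-v3 671B": (2.02, None),
    "GPT-OSS 20B": (7.43, None),
    "Llama 3.1 405B": (7.07, None),
    "Llama 2 70B LoRA": (0.40, 8.27),
    "Llama 3.1 8B": (4.46, 58.63),
    "FLUX.1": (17.1, 74.44),
    "DLRM-dcnv2": (0.67, None),
}


def time_ratio(nvidia_mins, alternative_mins):
    """Return alternative time / NVIDIA time, or None if there is no submission."""
    if alternative_mins is None:
        return None
    return alternative_mins / nvidia_mins


ratios = {model: time_ratio(*times) for model, times in results.items()}

for model, ratio in ratios.items():
    label = "no comparison" if ratio is None else f"{ratio:.1f}x"
    print(f"{model}: {label}")
```

The Llama 3.1 8B row works out to 58.63 / 4.46, which rounds to the 13.1x quoted above. The same calculation gives roughly 20.7x for Llama 2 70B LoRA and 4.4x for FLUX.1.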

Meanwhile, NVIDIA continues to uplift the performance of its existing architectures through further optimizations. Blackwell GB200 is already much faster than it was at launch, but the GB300 systems are up to 60% faster in the same NVL72 configuration thanks to their higher AI compute density with NVFP4.

The Blackwell architecture also scaled to deliver the largest cluster in MLPerf Training, comprising 8,192 GPUs running within Microsoft Azure on Llama 3.1 405B. The system reached the quality target in 7.07 minutes, the fastest time to train for this benchmark.

Microsoft Azure scaled Llama 3.1 405B training to 8,192 GPUs using GB200 NVL72 systems, and reached the reference quality target in 7.07 minutes, the fastest time to train for this benchmark.
CoreWeave delivered the fastest time to train for DeepSeek-V3 671B, reaching the quality target in 2.02 minutes at 8,192-GPU scale using GB300 NVL72 systems connected with Spectrum-X Ethernet networking.

And lastly, we wanted to share the full results comparing NVIDIA Blackwell GPUs against AMD's latest MI300 series offerings up to the MI355X.

Chart: MLPerf Training 6.0 DeepSeek v3 671B, latency in minutes. Configurations shown (GPU count in parentheses): GB300 (8192), GB300 (4096), GB200 (8192), GB200 (4096), GB300 (2048), GB200 (2048), GB300 (512), GB200 (512), GB300 (256), GB200 (256).

In DeepSeek v3 671b, NVIDIA is the single dominating force, with the competition not even submitting a single benchmark result.

Chart: MLPerf Training 6.0 Flux1, latency in minutes. Configurations shown (accelerator count in parentheses): GB300 (512), GB300 (72), GB300 (32), MI300X (512), MI320X (64).

In Flux1, 32 NVIDIA GB300 GPUs end up faster than 512 MI300X and 64 MI320X accelerators. No submission for the newer MI350 series was made.

Chart: MLPerf Training 6.0 Llama 2 70B LoRA, latency in minutes. Configurations shown (accelerator count in parentheses): GB300 (512), GB300 (72), GB300 (64), GB300 (32), GB200 (32), GB300 (16), GB200 (16), GB300 (8), GB200 (8), MI355X (8), MI350X (16), MI350X (8), GB300 (4), MI300X (8).

In Llama 2 70b, NVIDIA's GB300 and GB200 8-accelerator systems outpace the competition.

Chart: MLPerf Training 6.0 Llama 3.1 8B, latency in minutes. Configurations shown (accelerator count in parentheses): GB200 (1024), GB300 (512), GB300 (72), GB300 (64), GB200 (64), GB300 (32), GB300 (16), GB200 (32), GB200 (16), MI350X (16), GB300 (8), GB200 (8), MI355X (8), MI350X (8), GB300 (4), MI325X (8).

Lastly, we have Llama 3.1 8b, where NVIDIA continues to offer more performance at the same number of accelerators, and pushes things beyond that with scale-up configurations.

Whether at massive scale or modest configurations, NVIDIA consistently outperformed the competition, often delivering results that rivals couldn’t even submit. With continued software optimizations and the upcoming Vera Rubin platform on the horizon, NVIDIA’s leadership in AI training remains stronger than ever.

About the author: A Software Engineer by training and a PC enthusiast by passion, Hassan Mujtaba serves as Wccftech's Senior Editor for the hardware section. With years of experience in the industry, he specializes in deep-dive technical analysis of next-generation CPU and GPU architectures, motherboards, and cooling solutions. His work involves not only breaking news on upcoming technologies but also extensive hands-on reviews and benchmarking.
