NVIDIA Blackwell Sweeps Every MLPerf 6.0 Benchmark With No Competition In Sight, While GB300 Systems Run Up to 60% Faster Than GB200

Jun 17, 2026 at 09:40am EDT

The latest MLPerf Training 6.0 benchmarks are in, and NVIDIA has once again secured performance records with its Blackwell GPUs.

Blackwell GPUs Make Competition Go Into Hiding at MLPerf 6.0 As NVIDIA Tops Benchmark Charts

MLCommons has shared the latest MLPerf Training v6.0 benchmark results. This round adds two new MoE tests for large-scale and entry-level AI deployments: DeepSeek V3 (671B) and GPT-OSS 20B (21B). As an open-source, peer-reviewed benchmark suite, MLPerf lets any vendor submit results for its latest hardware. NVIDIA has dominated the suite for a while, and that trend continues.


While NVIDIA is getting ready to launch its AI-supercharged Vera Rubin platform in the coming months, the current-generation Blackwell architecture, especially in GB300 NVL72 systems, is showing immense potential with no competition in sight.

NVIDIA was the fastest in every benchmark and was also the only vendor to submit results across all benchmarks in MLPerf 6.0.

Model                     NVIDIA Blackwell NVL72    Nearest Alternative
DeepSeek-v3 671B (New)    2.02 mins                 No submission
GPT-OSS 20B (New)         7.43 mins                 No submission
Llama 3.1 405B            7.07 mins                 No submission
Llama 2 70B LoRA          0.40 mins                 8.27 mins
Llama 3.1 8B              4.46 mins                 58.63 mins
FLUX.1                    17.1 mins                 74.44 mins
DLRM-dcnv2                0.67 mins                 No submission

For reference, on Llama 3.1 8B, NVIDIA's Blackwell platform reached the quality target in 4.46 minutes, while the nearest alternative needed 58.63 minutes, a 13.1x gap. For the newest benchmarks, the competition did not submit any results at all.
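The 13.1x figure is the ratio of the two time-to-train values. As a minimal sketch, the same arithmetic can be applied to every table row where a competing result exists. The dictionary and function names here are illustrative, the numbers are copied from the table, and each ratio compares the fastest submission on each side, which the per-benchmark charts show were not always run on the same number of accelerators.

```python
# Time-to-train in minutes, copied from the summary table:
# (NVIDIA Blackwell NVL72, nearest alternative).
results = {
    "Llama 2 70B LoRA": (0.40, 8.27),
    "Llama 3.1 8B": (4.46, 58.63),
    "FLUX.1": (17.1, 74.44),
}


def time_ratio(nvidia_mins, other_mins):
    """Return how many times longer the nearest alternative took."""
    return other_mins / nvidia_mins


ratios = {model: time_ratio(*times) for model, times in results.items()}

for model, ratio in ratios.items():
    print(f"{model}: nearest alternative took {ratio:.1f}x as long")
```

Rows marked "No submission" are left out because there is no second value to divide by.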

Meanwhile, NVIDIA continues to improve the performance of its existing architectures through further optimizations. Blackwell GB200 is already much faster than it was at launch, but GB300 systems are up to 60% faster in the same NVL72 configuration thanks to their higher AI compute density with NVFP4.

The Blackwell architecture also scaled to a cluster of 8,192 GPUs running within Microsoft Azure on Llama 3.1 405B. The system reached the quality target in 7.07 minutes, the fastest time-to-train within this benchmark.

Finally, here are the full results comparing NVIDIA Blackwell GPUs against AMD's latest MI300 series offerings, up to the MI355X.

Chart: MLPerf Training 6.0, DeepSeek v3 671B. Latency in minutes; GPU count in parentheses.

System (GPUs)     Minutes
GB300 (8192)      2.021
GB300 (4096)      3.092
GB200 (8192)      3.340
GB200 (4096)      4.384
GB300 (2048)      5.535
GB200 (2048)      7.844
GB300 (512)       17.517
GB200 (512)       27.612
GB300 (256)       33.430
GB200 (256)       49.438

In DeepSeek v3 671b, NVIDIA is the single dominating force, with the competition not even submitting a single benchmark result.
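Since every DeepSeek v3 entry comes from NVIDIA, the more informative comparison in this chart is GB300 against GB200 at the same GPU count. The sketch below computes that ratio for each matched pair. The variable and function names are illustrative, the values are copied from the chart data above, and the result covers this one benchmark only, so it should not be read as the basis for the article's overall "up to 60%" figure.

```python
# DeepSeek v3 671B time-to-train in minutes, keyed by GPU count,
# copied from the chart data.
gb300 = {8192: 2.021, 4096: 3.092, 2048: 5.535, 512: 17.517, 256: 33.430}
gb200 = {8192: 3.340, 4096: 4.384, 2048: 7.844, 512: 27.612, 256: 49.438}


def speedup_by_gpu_count(new, old):
    """Return old_time / new_time for every GPU count present in both dicts."""
    shared_counts = sorted(new.keys() & old.keys())
    return {count: old[count] / new[count] for count in shared_counts}


speedups = speedup_by_gpu_count(gb300, gb200)

for gpus, ratio in speedups.items():
    print(f"{gpus:>5} GPUs: GB200 time / GB300 time = {ratio:.2f}")
```

A ratio of 1.50 would mean the GB200 run took 50% longer than the GB300 run at that GPU count.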

Chart: MLPerf Training 6.0, Flux1. Latency in minutes; accelerator count in parentheses.

System (accelerators)    Minutes
GB300 (512)              17.11
GB300 (72)               36.53
GB300 (32)               65.97
MI300X (512)             74.43
MI320X (64)              92.36

In Flux1, 32 NVIDIA GB300 GPUs end up faster than 512 MI300X and 64 MI320X accelerators. No submission for the newer MI350 series was made.
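One rough way to put runs with different accelerator counts on a common footing is to multiply the accelerator count by the wall-clock minutes. The sketch below does that for the Flux1 entries. Note that "accelerator-minutes" is an illustrative quantity introduced here, not an MLPerf metric, and it ignores cost, power, and the fact that the accelerators are different products. The names are hypothetical and the values are copied from the chart data above.

```python
# Flux1 entries copied from the chart data: (system, accelerators, minutes).
flux_results = [
    ("GB300", 512, 17.11),
    ("GB300", 72, 36.53),
    ("GB300", 32, 65.97),
    ("MI300X", 512, 74.43),
    ("MI320X", 64, 92.36),
]


def accelerator_minutes(accelerators, minutes):
    """Accelerator count times wall-clock minutes (illustrative, not an MLPerf metric)."""
    return accelerators * minutes


# Sort from the fewest accelerator-minutes to the most.
ranked = sorted(flux_results, key=lambda row: accelerator_minutes(row[1], row[2]))

for system, accelerators, minutes in ranked:
    total = accelerator_minutes(accelerators, minutes)
    print(f"{system} ({accelerators}): {minutes} min, {total:.2f} accelerator-minutes")
```

Smaller runs usually look better on this measure because scaling out is rarely perfectly efficient, so the ordering says more about resource usage than about raw speed.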

Chart: MLPerf Training 6.0, Llama 2 70B LoRA. Latency in minutes; accelerator count in parentheses.

System (accelerators)    Minutes
GB300 (512)              0.400
GB300 (72)               1.166
GB300 (64)               1.263
GB300 (32)               2.470
GB200 (32)               2.851
GB300 (16)               4.508
GB200 (16)               5.345
GB300 (8)                5.613
GB200 (8)                7.856
MI355X (8)               8.271
MI350X (16)              8.522
MI350X (8)               10.093
GB300 (4)                19.301
MI300X (8)               28.648

In Llama 2 70B LoRA, NVIDIA's 8-accelerator GB300 and GB200 systems outpace every competing submission.

Chart: MLPerf Training 6.0, Llama 3.1 8B. Latency in minutes; accelerator count in parentheses.

System (accelerators)    Minutes
GB200 (1024)             4.459
GB300 (512)              4.636
GB300 (72)               11.586
GB300 (64)               12.447
GB200 (64)               16.536
GB300 (32)               20.200
GB300 (16)               33.391
GB200 (32)               39.014
GB200 (16)               49.047
MI350X (16)              58.629
GB300 (8)                63.516
GB200 (8)                82.213
MI355X (8)               86.845
MI350X (8)               108.965
GB300 (4)                123.732
MI325X (8)               238.073

Lastly, we have Llama 3.1 8b, where NVIDIA continues to offer more performance at the same number of accelerators, and pushes things beyond that with scale-up configurations.
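The "same number of accelerators" comparison can be checked by grouping the Llama 3.1 8B entries by accelerator count and sorting each group by time. This is a minimal sketch with illustrative names, using the values copied from the chart data above.

```python
from collections import defaultdict

# Llama 3.1 8B entries copied from the chart data: (system, accelerators, minutes).
llama31_8b = [
    ("GB200", 1024, 4.459), ("GB300", 512, 4.636), ("GB300", 72, 11.586),
    ("GB300", 64, 12.447), ("GB200", 64, 16.536), ("GB300", 32, 20.200),
    ("GB300", 16, 33.391), ("GB200", 32, 39.014), ("GB200", 16, 49.047),
    ("MI350X", 16, 58.629), ("GB300", 8, 63.516), ("GB200", 8, 82.213),
    ("MI355X", 8, 86.845), ("MI350X", 8, 108.965), ("GB300", 4, 123.732),
    ("MI325X", 8, 238.073),
]


def rank_by_accelerator_count(entries):
    """Group entries by accelerator count and sort each group fastest-first."""
    groups = defaultdict(list)
    for system, accelerators, minutes in entries:
        groups[accelerators].append((minutes, system))
    return {count: sorted(group) for count, group in sorted(groups.items())}


rankings = rank_by_accelerator_count(llama31_8b)

for count, group in rankings.items():
    if len(group) > 1:  # only counts where a head-to-head comparison exists
        line = ", ".join(f"{system} {minutes:.3f}" for minutes, system in group)
        print(f"{count} accelerators: {line}")
```

Only the 8- and 16-accelerator groups contain entries from both vendors; the 32- and 64-accelerator groups compare GB300 with GB200 only.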

Whether at massive scale or modest configurations, NVIDIA consistently outperformed the competition, often delivering results that rivals couldn’t even submit. With continued software optimizations and the upcoming Vera Rubin platform on the horizon, NVIDIA’s leadership in AI training remains stronger than ever.

About the author: A Software Engineer by training and a PC enthusiast by passion, Hassan Mujtaba serves as Wccftech's Senior Editor for the hardware section. With years of experience in the industry, he specializes in deep-dive technical analysis of next-generation CPU and GPU architectures, motherboards, and cooling solutions. His work involves not only breaking news on upcoming technologies but also extensive hands-on reviews and benchmarking.
