NVIDIA Blackwell Sweeps Every MLPerf 6.0 Benchmark With No Competition In Sight, While GB300 Systems Run Up to 60% Faster Than GB200

Hassan Mujtaba • Jun 17, 2026 at 09:40am EDT

The latest MLPerf Training 6.0 results are in, and NVIDIA has once again secured performance records with its Blackwell GPUs.

Blackwell GPUs Make Competition Go Into Hiding at MLPerf 6.0 As NVIDIA Tops Benchmark Charts

MLCommons has published the latest MLPerf Training v6.0 benchmark results. This round adds two new MoE tests covering large-scale and entry-level AI deployments: DeepSeek V3 (671B parameters) and GPT-OSS 20B (21B parameters). As an open-source, peer-reviewed benchmark suite, MLPerf allows all vendors to submit results for their latest and greatest hardware. NVIDIA has been dominating the suite for a while, and that trend continues.

While NVIDIA is getting ready to launch its AI-Supercharged Vera Rubin platform in the coming months, the current-generation Blackwell architectures, especially GB300 NVL72 systems, are showcasing immense potential with no competition in sight. In the latest results, NVIDIA shows:

Fastest time to train on every benchmark
Largest-scale training across 8,192 GPUs using NVIDIA Blackwell NVL72 systems
The only platform with submissions across all seven benchmarks in the suite

Coming to the benchmark results, NVIDIA was the fastest in every one of them and was also the only vendor to submit results across all benchmarks in MLPerf 6.0.

Model	NVIDIA Blackwell NVL72	Nearest Alternative
DeepSeek-v3 671B (New)	2.02 mins	No submission
GPT-OSS 20B (New)	7.43 mins	No submission
Llama 3.1 405B	7.07 mins	No submission
Llama 2 70B LoRA	0.40 mins	8.27 mins
Llama 3.1 8B	4.46 mins	58.63 mins
FLUX.1	17.1 mins	74.44 mins
DLRM-dcnv2	0.67 mins	No submission

For reference, NVIDIA's Blackwell platforms achieved stellar speeds. In Llama 3.1 8B, NVIDIA reached the target in 4.46 mins while the nearest alternative needed 58.63 mins, a 13.1x gap in time to train. And for the newest benchmarks, the competition didn't submit any results at all.
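The ratios behind that comparison can be reproduced from the table above. The snippet below is only a quick sketch: the `results` dictionary and the `speedup` helper are names made up for illustration, and the numbers are simply the listed times to train in minutes. Note that the table does not state how many accelerators each submission used, so these ratios compare the best listed time per vendor, not per-GPU performance.

```python
# Time-to-train figures (minutes) copied from the table above.
# Each entry is (NVIDIA Blackwell NVL72, nearest alternative);
# None marks benchmarks where no alternative submission was listed.
results = {
    "DeepSeek-v3 671B": (2.02, None),
    "GPT-OSS 20B": (7.43, None),
    "Llama 3.1 405B": (7.07, None),
    "Llama 2 70B LoRA": (0.40, 8.27),
    "Llama 3.1 8B": (4.46, 58.63),
    "FLUX.1": (17.1, 74.44),
    "DLRM-dcnv2": (0.67, None),
}


def speedup(nvidia_minutes, alternative_minutes):
    """Return alternative time divided by NVIDIA time, or None if no submission."""
    if alternative_minutes is None:
        return None
    return alternative_minutes / nvidia_minutes


speedups = {model: speedup(nv, alt) for model, (nv, alt) in results.items()}

for model, ratio in speedups.items():
    label = "no submission" if ratio is None else f"{ratio:.1f}x"
    print(f"{model}: {label}")
```

For Llama 3.1 8B this gives roughly 13.1x, matching the figure quoted above; only three of the seven benchmarks have an alternative submission to compare against.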

Meanwhile, NVIDIA continues to uplift the performance of its existing architectures through further optimizations. Blackwell GB200 is already much faster than it was at launch, but the GB300 systems are up to 60% faster in the same NVL72 configuration thanks to their higher AI compute density with NVFP4.

The Blackwell architecture also scaled to deliver the largest cluster in MLPerf Training, comprising 8,192 GPUs running within Microsoft Azure on Llama 3.1 405B. The system reached the quality target in 7.07 minutes, the fastest time to train within this benchmark.

Microsoft Azure scaled Llama 3.1 405B training to 8,192 GPUs using GB200 NVL72 systems, and reached the reference quality target in 7.07 minutes, the fastest time to train for this benchmark.
CoreWeave delivered the fastest time to train for DeepSeek-V3 671B, reaching the quality target in 2.02 minutes at 8,192-GPU scale using GB300 NVL72 systems connected with Spectrum-X Ethernet networking.
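To put the two 8,192-GPU runs in perspective, one can multiply GPU count by wall-clock time to get aggregate accelerator time. This is a rough sketch rather than an MLPerf metric: "GPU-minutes" is a convenience measure introduced here, and the `runs` dictionary and `gpu_minutes` helper are illustrative names. The inputs are the GPU counts and times quoted above.

```python
# Figures quoted above: (GPU count, time to train in minutes).
runs = {
    "Llama 3.1 405B (Azure, GB200 NVL72)": (8192, 7.07),
    "DeepSeek-V3 671B (CoreWeave, GB300 NVL72)": (8192, 2.02),
}


def gpu_minutes(gpu_count, minutes):
    """Aggregate accelerator time: number of GPUs times wall-clock minutes."""
    return gpu_count * minutes


totals = {name: gpu_minutes(n, t) for name, (n, t) in runs.items()}

for name, total in totals.items():
    print(f"{name}: {total:,.0f} GPU-minutes ({total / 60:,.0f} GPU-hours)")
```

The two runs use different models and different systems, so the totals are not a like-for-like efficiency comparison; they only show how much hardware time sits behind each headline number.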

And lastly, we wanted to share the full results comparing NVIDIA Blackwell GPUs against AMD's latest MI300 series offerings up to the MI355X.

MLPerf Training 6.0 Deepseek v3 671b

[Chart: latency (in minutes) by system and GPU count, in the order shown: GB300 (8192), GB300 (4096), GB200 (8192), GB200 (4096), GB300 (2048), GB200 (2048), GB300 (512), GB200 (512), GB300 (256), GB200 (256)]

In DeepSeek v3 671b, NVIDIA is the single dominating force, with the competition not even submitting a single benchmark result.

MLPerf Training 6.0 Flux1

[Chart: latency (in minutes) by system and accelerator count, in the order shown: GB300 (512), GB300 (72), GB300 (32), MI300X (512), MI320X (64)]

In Flux1, 32 NVIDIA GB300 GPUs end up faster than 512 MI300X and 64 MI320X accelerators. No submission for the newer MI350 series was made.

MLPerf Training 6.0 Llama2 70B Lora

[Chart: latency (in minutes) by system and accelerator count, in the order shown: GB300 (512), GB300 (72), GB300 (64), GB300 (32), GB200 (32), GB300 (16), GB200 (16), GB300 (8), GB200 (8), MI355X (8), MI350X (16), MI350X (8), GB300 (4), MI300X (8)]

In Llama 2 70b, NVIDIA's GB300 and GB200 8-accelerator systems outpace the competition.

MLPerf Training 6.0 Llama3.1 8b

[Chart: latency (in minutes) by system and accelerator count, in the order shown: GB200 (1024), GB300 (512), GB300 (72), GB300 (64), GB200 (64), GB300 (32), GB300 (16), GB200 (32), GB200 (16), MI350X (16), GB300 (8), GB200 (8), MI355X (8), MI350X (8), GB300 (4), MI325X (8)]

Lastly, we have Llama 3.1 8b, where NVIDIA continues to offer more performance at the same number of accelerators, and pushes things beyond that with scale-up configurations.

Whether at massive scale or modest configurations, NVIDIA consistently outperformed the competition, often delivering results that rivals couldn’t even submit. With continued software optimizations and the upcoming Vera Rubin platform on the horizon, NVIDIA’s leadership in AI training remains stronger than ever.

About the author: A Software Engineer by training and a PC enthusiast by passion, Hassan Mujtaba serves as Wccftech's Senior Editor for the hardware section. With years of experience in the industry, he specializes in deep-dive technical analysis of next-generation CPU and GPU architectures, motherboards, and cooling solutions. His work involves not only breaking news on upcoming technologies but also extensive hands-on reviews and benchmarking.
