NVIDIA Blackwell Ultra Secures Win Across All Seven MLPerf AI Training Benchmarks, GB200 NVL72 Sets Record 10 Minutes Training Time For Llama 405B


Nov 12, 2025 at 03:20pm EST

By securing wins across all seven MLPerf Training benchmarks, NVIDIA is touting the leading AI training performance of its Blackwell Ultra-based GB300 NVL72 platform.

NVIDIA Showcases its GB300 NVL72 "Blackwell Ultra" Results in MLPerf AI Training Tests; Up To Five Times the Performance vs Hopper-Based Platform

When it comes to leading AI performance, NVIDIA GPUs have consistently been at the forefront. Blackwell-based data center GPUs have already demonstrated their potential several times, and the latest GB300 NVL72 platform is no exception.

Today, NVIDIA announced that its Blackwell Ultra-powered AI GPUs took first place in every MLPerf Training benchmark, which the company presents as proof that its GB300 NVL72 rack-scale system remains the best choice for intensive AI workloads.

In its blog post, NVIDIA claims that it is the only company to have submitted results on every MLPerf Training test and that it has widened the performance gap over its rivals. A graph it shared shows that NVIDIA's GB200 and GB300 platforms have scored numerous MLPerf Training and Inference wins this year. The latest training results (time to train) are as follows:

Llama 3.1 405B: 10 min
Llama 2 70B LoRA: 0.4 min
Llama 3.1 8B: 5.2 min
FLUX.1: 12.5 min
DLRM-dcnv2: 0.71 min
R-GAT: 1.1 min
RetinaNet: 1.4 min

The benchmark results show that NVIDIA achieved significantly better results using the same number of Blackwell Ultra GPUs as Hopper-based systems. In Llama 3.1 405B pretraining, GB300 GPUs deliver over 4X the performance of H100 and nearly 2X that of Blackwell GB200. Similarly, in Llama 2 70B fine-tuning, eight GB300 GPUs delivered 5X the performance of H100.

NVIDIA also highlighted its CUDA ecosystem, which gives it significant leverage over its competitors. The CUDA software stack is a major strength, but the rack-scale hardware, paired with 800 Gb/s Quantum-X800 InfiniBand networking, is also unmatched. The GB300 NVL72 provides 279 GB of HBM3e memory per GPU and an impressive 40 TB of combined GPU and CPU memory. This massive memory configuration speeds up AI workloads, but using FP4 precision for training is also key to the excellent performance.
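To put the memory and networking figures in perspective, the arithmetic can be sketched in a few lines of Python. This is an illustrative back-of-the-envelope calculation, not an NVIDIA tool: it assumes 72 GPUs per NVL72 rack and decimal units (1 TB = 1,000 GB), and it treats the quoted 40 TB as a rounded total, so the CPU-memory remainder is only an implied estimate. It also converts an 800 Gb/s link rate to bytes, since network speeds are quoted in gigabits per second.

```python
# Back-of-the-envelope arithmetic for the GB300 NVL72 figures quoted in the article.
GPUS_PER_RACK = 72            # "NVL72" denotes 72 GPUs per rack
HBM_PER_GPU_GB = 279          # per-GPU HBM3e capacity quoted by NVIDIA
TOTAL_FAST_MEMORY_TB = 40     # combined GPU + CPU memory quoted by NVIDIA (rounded)


def rack_hbm_tb(gpus: int, hbm_per_gpu_gb: float) -> float:
    """Aggregate HBM3e capacity in TB (decimal units: 1 TB = 1000 GB)."""
    return gpus * hbm_per_gpu_gb / 1000


def gbit_to_gbyte(gbit_per_s: float) -> float:
    """Convert a link rate in gigabits/s to gigabytes/s (8 bits per byte)."""
    return gbit_per_s / 8


hbm_tb = rack_hbm_tb(GPUS_PER_RACK, HBM_PER_GPU_GB)
print(f"HBM3e across the rack: {hbm_tb:.2f} TB")
# Assumption: whatever is left of the rounded 40 TB total is CPU memory.
print(f"Implied CPU-memory remainder: {TOTAL_FAST_MEMORY_TB - hbm_tb:.2f} TB")
print(f"800 Gb/s link = {gbit_to_gbyte(800):.0f} GB/s")
```

Roughly half of the 40 TB figure is GPU HBM3e (about 20 TB); an 800 Gb/s link moves about 100 GB/s.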

NVIDIA says it has adopted FP4 precision for LLM training at every layer, doubling calculation speed compared to FP8, and Blackwell Ultra raises that advantage to 3X. This is why NVIDIA was able to outpace its competitors and deliver drastically better performance without increasing the GPU count. Improving on its June submission, NVIDIA achieved the Llama 3.1 405B result using 5,120 Blackwell GB200 GPUs, which completed the benchmark in just 10 minutes.
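The precision ratios NVIDIA quotes translate directly into time savings only for the math itself. The Python sketch below is a hypothetical, idealized model: it assumes a purely compute-bound step with an arbitrary 60-second FP8 baseline and ignores memory traffic and communication, so real training speedups will be smaller. It also converts the Llama 3.1 405B run into aggregate GPU-hours using the 5,120-GPU, 10-minute figures from the article.

```python
# Idealized scaling from the FP4-vs-FP8 ratios NVIDIA quotes (FP8 = 1.0x baseline).
FP4_SPEEDUP_VS_FP8 = {
    "Blackwell": 2.0,        # FP4 doubles calculation speed vs FP8
    "Blackwell Ultra": 3.0,  # Blackwell Ultra raises that to 3x
}


def idealized_math_time(fp8_time: float, speedup: float) -> float:
    """Time for a purely compute-bound workload after a throughput speedup.

    Assumes the math is the only bottleneck; real training also spends time
    on memory traffic and communication, so actual gains are smaller.
    """
    return fp8_time / speedup


def gpu_hours(num_gpus: int, minutes: float) -> float:
    """Aggregate GPU time consumed by a run, in GPU-hours."""
    return num_gpus * minutes / 60


# Hypothetical 60 s of FP8 math, rescaled by each quoted FP4 ratio.
for arch, s in FP4_SPEEDUP_VS_FP8.items():
    print(f"{arch}: 60 s of FP8 math -> {idealized_math_time(60.0, s):.0f} s in FP4")

print(f"Llama 3.1 405B run: {gpu_hours(5120, 10):.1f} GPU-hours")
```

Under these assumptions, 60 seconds of FP8 math shrinks to 30 seconds on Blackwell and 20 seconds on Blackwell Ultra, and the 10-minute 405B run corresponds to roughly 853 GPU-hours.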

Update: The Llama 3.1 405B benchmark was conducted using GB200 NVL72 and not GB300 NVL72.

News Source: NVIDIA

About the author: Sarfraz Khan is a hardware reporter with a focus on PC components and the builder community. With years of experience writing about PC hardware and laptops, his work has been featured on several reputable technology publications. Sarfraz's hands-on experience is demonstrated through his first-person accounts of using and comparing different hardware configurations, providing practical and relatable insights for everyday users. His technical analysis is respected by peers in the enthusiast community and has been cited by specialized hardware sites such as Germany's Igor's Lab.
