MLPerf v5.1 AI Inference Benchmark Showdown: NVIDIA Blackwell Ultra GB300 & AMD Instinct MI355X In The Spotlight


Sep 9, 2025 at 04:00pm EDT


NVIDIA's Blackwell Ultra GB300 & AMD's Instinct MI355X have finally appeared in the latest MLPerf v5.1 AI inference benchmarks.

NVIDIA Blackwell Ultra GB300 & AMD Instinct MI355X Rock MLPerf v5.1 Benchmarks With Unmatched AI Performance

Today, MLCommons published its latest round of MLPerf v5.1 AI Inference benchmarks, and a few submissions stole the spotlight: NVIDIA's Blackwell Ultra GB300, AMD's Instinct MI355X, and the Intel Arc Pro B60 (which we've covered separately). The GB300 and MI355X are the fastest AI accelerators from their respective vendors, so they are drawing a lot of attention. Let's see what these chips have to offer in MLPerf.

Starting with DeepSeek R1 (Offline), NVIDIA's GB300 simply crushes it, offering a 45% gain over GB200 in the 72-GPU comparison and a 44% uplift in the 8-GPU comparison. That is roughly in line with what NVIDIA had promised (a 50% gain with Blackwell Ultra).

Chart: MLPerf v5.1 (DeepSeek R1 Offline), samples/s: GB300 (x72), GB200 (x72), GB300 (x8), GB200 (x8)

Next up is the DeepSeek R1 (Server) comparison, which shows a 25% gain in the 72-GPU submission and a 21% gain in the 8-GPU submission.

Chart: MLPerf v5.1 (DeepSeek R1 Server), queries/s: GB300 (x72), GB200 (x72), GB300 (x8), GB200 (x8)

Moving over to Llama 3.1 405B (Offline), we get our first AMD Instinct MI355X comparison. It is made against the GB200 rather than the GB300, since no 8-GPU GB300 configuration was submitted for this benchmark. The new AMD Instinct platform brings a solid 27% increase here.

Chart: MLPerf v5.1 (Llama 3.1 405B Offline), tokens/s: GB300 (x72), GB200 (x72), MI355X (x8), GB200 (x8)

Then we have the Llama 2 70B (Offline, Open Division) comparison, where the Instinct MI355X delivers up to 648,248 tokens per second in a 64-chip configuration, 350,820 tokens per second with 32 chips, and 65,770 tokens per second with 8 chips. The 8-chip result is a massive 2.09x increase over the NVIDIA GB200 (x8) configuration. The Arc Pro B60 scores 3,009 tokens/s here in a four-card setup, but its value proposition is much better than that of pure datacenter/HPC chips such as Blackwell Ultra and the AMD Instinct series.
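Because the three MI355X results use different chip counts, dividing by the number of accelerators makes them easier to compare. The sketch below uses only the Llama 2 70B figures quoted above; the "scaling efficiency" ratio (achieved speedup divided by ideal linear speedup relative to the 8-chip run) is our own illustrative calculation, not an official MLPerf metric, and the function names are hypothetical.

```python
# Per-chip normalization of the MI355X Llama 2 70B (Offline, Open Division)
# results quoted in the article. The scaling-efficiency ratio is an
# illustrative calculation, not an MLPerf metric.

results = {  # chip count -> total tokens/s (from the article)
    8: 65_770,
    32: 350_820,
    64: 648_248,
}


def per_chip(chips: int, total: float) -> float:
    """Average throughput per accelerator in tokens/s."""
    return total / chips


def scaling_efficiency(base_chips: int, base_total: float,
                       chips: int, total: float) -> float:
    """Achieved speedup over the base config divided by ideal linear speedup."""
    ideal = chips / base_chips
    achieved = total / base_total
    return achieved / ideal


for chips, total in sorted(results.items()):
    eff = scaling_efficiency(8, results[8], chips, total)
    print(f"{chips:>2} chips: {per_chip(chips, total):,.0f} tokens/s/chip, "
          f"{eff:.2f}x of linear scaling vs. 8 chips")
```

Values above 1.0 mean the 32- and 64-chip submissions delivered more throughput per chip than the 8-chip run, so chip count alone does not explain the differences between these results.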

Chart: MLPerf v5.1 (Llama2-70B-99.9 Offline, Open Division), tokens/s: MI355X (x64), MI355X (x32), MI355X (x8), B200 (x8), H200 (x8), MI300X (x16), Maxsun Arc Pro B60 (x4)

NVIDIA has also shared the full spectrum of benchmarks and the various records it achieved with its new Blackwell Ultra GB300 platform.

Its full record table follows:

MLPerf Inference Per-Accelerator Records
Benchmark	Offline	Server	Interactive
DeepSeek-R1	5,842 tokens/second/GPU	2,907 tokens/second/GPU	**
Llama 3.1 405B	224 tokens/second/GPU	170 tokens/second/GPU	138 tokens/second/GPU
Llama 2 70B 99.9%	12,934 tokens/second/GPU	12,701 tokens/second/GPU	7,856 tokens/second/GPU
Llama 2 70B 99%	13,015 tokens/second/GPU	12,701 tokens/second/GPU	7,856 tokens/second/GPU
Llama 3.1 8B	18,370 tokens/second/GPU	16,099 tokens/second/GPU	15,284 tokens/second/GPU
Stable Diffusion XL	4.07 samples/second/GPU	3.59 queries/second/GPU	**
Mixtral 8x7B	16,099 tokens/second/GPU	16,131 tokens/second/GPU	**
DLRMv2 99%	87,228 samples/second/GPU	80,515 samples/second/GPU	**
DLRMv2 99.9%	48,666 samples/second/GPU	46,259 queries/second/GPU	**
Whisper	5,667 tokens/second/GPU	**	**
R-GAT	81,404 samples/second/GPU	**	**
Retinanet	1,875 samples/second/GPU	1,801 queries/second/GPU	**

NVIDIA's Blackwell Ultra also sets reasoning records in its MLPerf debut. On DeepSeek-R1, the Blackwell Ultra GPUs hold a 4.7x lead in the offline scenario and a 5.2x lead in the server scenario over the Hopper platform.

DeepSeek-R1 Performance
Architecture	Offline	Server
Hopper	1,253 tokens/second/GPU	556 tokens/second/GPU
Blackwell Ultra	5,842 tokens/second/GPU	2,907 tokens/second/GPU
Blackwell Ultra Advantage	4.7x	5.2x
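The "Blackwell Ultra Advantage" row is simply the ratio of the two per-GPU throughputs, and the percentage gains quoted earlier in the article follow the same pattern ((new / old − 1) × 100). A minimal sketch that reproduces the row from the table's own numbers; the helper names are hypothetical.

```python
# Reproduce the "Blackwell Ultra Advantage" row of the DeepSeek-R1 table.
hopper = {"Offline": 1_253, "Server": 556}              # tokens/s/GPU
blackwell_ultra = {"Offline": 5_842, "Server": 2_907}   # tokens/s/GPU


def advantage(new: float, old: float) -> float:
    """Speedup expressed as a multiplier (e.g. 4.7x)."""
    return new / old


def percent_gain(new: float, old: float) -> float:
    """The same speedup expressed as a percentage increase."""
    return (new / old - 1) * 100


for scenario in ("Offline", "Server"):
    new, old = blackwell_ultra[scenario], hopper[scenario]
    print(f"{scenario}: {advantage(new, old):.1f}x "
          f"(+{percent_gain(new, old):.0f}%)")
```

Rounded to one decimal place, the ratios match the 4.7x and 5.2x figures in the table.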

In the next round of MLPerf submissions, we can expect NVIDIA, AMD, and Intel to further optimize their existing platforms, resulting in higher scores in these benchmarks.

About the author: A Software Engineer by training and a PC enthusiast by passion, Hassan Mujtaba serves as Wccftech's Senior Editor for the hardware section. With years of experience in the industry, he specializes in deep-dive technical analysis of next-generation CPU and GPU architectures, motherboards, and cooling solutions. His work involves not only breaking news on upcoming technologies but also extensive hands-on reviews and benchmarking.
