MLPerf v5.1 AI Inference Benchmark Showdown: NVIDIA Blackwell Ultra GB300 & AMD Instinct MI355X In The Spotlight

Sep 9, 2025 at 04:00pm EDT

NVIDIA's Blackwell Ultra GB300 & AMD's Instinct MI355X have finally appeared in the latest MLPerf v5.1 AI inference benchmarks.

NVIDIA Blackwell Ultra GB300 & AMD Instinct MI355X Rock MLPerf v5.1 Benchmarks With Unmatched AI Performance

Today, MLCommons published its latest round of MLPerf v5.1 AI Inference benchmarks, and a few submissions stole the spotlight: NVIDIA's Blackwell Ultra GB300, AMD's Instinct MI355X, and Intel's Arc Pro B60 (covered in more detail in a separate article). The GB300 and MI355X are the fastest AI accelerators from their respective vendors, so they draw most of the attention. Let's see what these chips deliver in MLPerf.


Starting with DeepSeek R1 (Offline), NVIDIA's GB300 simply crushes it, offering a 45% gain over GB200 in the 72-GPU comparison, while the 8-GPU comparison shows a 44% uplift. That is close to the 50% gain NVIDIA had promised for Blackwell Ultra.

MLPerf v5.1 (DeepSeek R1 Offline), samples/s:
GB300 (x72): 420569
GB200 (x72): 289712
GB300 (x8): 48047
GB200 (x8): 33379
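The percentage gains quoted above follow from the chart values with simple arithmetic. A minimal Python sketch, assuming each gain is computed as (new / old - 1) x 100 on the raw throughput figures; the dictionary and the `uplift_pct` helper are illustrative and not part of any MLPerf tooling:

```python
# DeepSeek R1 (Offline) throughput from the MLPerf v5.1 chart above (samples/s).
results = {
    "GB300 (x72)": 420569,
    "GB200 (x72)": 289712,
    "GB300 (x8)": 48047,
    "GB200 (x8)": 33379,
}


def uplift_pct(new: float, old: float) -> float:
    """Hypothetical helper: percentage gain of `new` over `old` (1.45x -> 45.0)."""
    return (new / old - 1.0) * 100.0


# Compare each GB300 system against the GB200 system of the same size.
gain_72 = uplift_pct(results["GB300 (x72)"], results["GB200 (x72)"])
gain_8 = uplift_pct(results["GB300 (x8)"], results["GB200 (x8)"])
print(f"72-GPU uplift: {gain_72:.1f}%")
print(f"8-GPU uplift: {gain_8:.1f}%")
```

Rounded to whole percentages, these give the 45% and 44% figures quoted in the text.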

Next up is the DeepSeek R1 (Server) comparison, where GB300 leads GB200 by 25% in the 72-GPU submissions and by 21% in the 8-GPU submissions.

MLPerf v5.1 (DeepSeek R1 Server), queries/s:
GB300 (x72): 209328
GB200 (x72): 167578
GB300 (x8): 22545
GB200 (x8): 18592
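The same ratio check applies to the Server results. This Python sketch loops over illustrative (new, old) label pairs so both system sizes are handled in one pass; the pairing is an assumption about which bars the text compares:

```python
# DeepSeek R1 (Server) throughput from the MLPerf v5.1 chart above (queries/s).
server = {
    "GB300 (x72)": 209328,
    "GB200 (x72)": 167578,
    "GB300 (x8)": 22545,
    "GB200 (x8)": 18592,
}

# Illustrative pairing: each GB300 system against the GB200 system of the same size.
pairs = [("GB300 (x72)", "GB200 (x72)"), ("GB300 (x8)", "GB200 (x8)")]

gains = {}
for new, old in pairs:
    # Percentage gain of the newer system over the older one.
    gains[new] = (server[new] / server[old] - 1.0) * 100.0
    print(f"{new} vs {old}: {gains[new]:+.1f}%")
```

These round to 25% and 21%, smaller than the Offline gains for the same system sizes.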

Moving over to Llama 3.1 405B (Offline), we get our first AMD Instinct MI355X comparison. It is pitted against GB200 rather than GB300, since no 8-GPU GB300 submission was made in this benchmark. The new AMD Instinct platform delivers a solid 27% increase here.

MLPerf v5.1 (Llama 3.1 405B Offline), tokens/s:
GB300 (x72): 16104
GB200 (x72): 14774
MI355X (8x): 2109
GB200 (8x): 1660
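Because the chart writes system sizes inconsistently, as "(x72)" for some bars and "(8x)" for others, a like-for-like comparison should first confirm that both systems use the same number of accelerators. A hedged Python sketch of that check; `accelerator_count` and `compare` are hypothetical helpers written for this example:

```python
import re

# Llama 3.1 405B (Offline) throughput from the chart above (tokens/s),
# with labels kept exactly as the chart writes them.
results = {
    "GB300 (x72)": 16104,
    "GB200 (x72)": 14774,
    "MI355X (8x)": 2109,
    "GB200 (8x)": 1660,
}


def accelerator_count(label: str) -> int:
    """Hypothetical helper: read the count from '(x72)'- or '(8x)'-style labels."""
    match = re.search(r"\((?:x(\d+)|(\d+)x)\)", label)
    if match is None:
        raise ValueError(f"no accelerator count in {label!r}")
    return int(match.group(1) or match.group(2))


def compare(new: str, old: str) -> float:
    """Hypothetical helper: percentage gain of `new` over `old`, same-size systems only."""
    if accelerator_count(new) != accelerator_count(old):
        raise ValueError(f"{new} and {old} use different accelerator counts")
    return (results[new] / results[old] - 1.0) * 100.0


print(f"MI355X (8x) vs GB200 (8x): {compare('MI355X (8x)', 'GB200 (8x)'):.1f}%")
```

For the 8-accelerator pair this gives about 27%, matching the text. Comparing a 72-GPU system against an 8-GPU one raises an error instead of producing a misleading ratio.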

Then we have the Llama 2 70B (Offline) comparison, where the Instinct MI355X delivers up to 648248 tokens/s with a 64-chip configuration, 350820 tokens/s with a 32-chip configuration, and 93045 tokens/s with an 8-chip configuration. The 8-chip result is roughly 1.41x the NVIDIA B200 (x8) configuration's 65770 tokens/s. The Maxsun Arc Pro B60 (x4) scores 3009 tokens/s here, but its value proposition is much better than that of pure datacenter/HPC chips such as Blackwell Ultra and the AMD Instinct series.

MLPerf v5.1 (llama2-70B-99.9 Offline, Open Division), tokens/s:
MI355X (64x): 648248
MI355X (32x): 350820
MI355X (x8): 93045
B200 (x8): 65770
H200 (x8): 31383
MI300X (x16): 27185
Maxsun Arc Pro B60 (x4): 3009
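Two quick derivations from this chart, sketched in Python: the MI355X (x8) result relative to B200 (x8), and how closely the 32- and 64-chip MI355X results track linear scaling from the 8-chip baseline. "Scaling efficiency" here is an assumed definition (per-chip throughput divided by the 8-chip per-chip throughput), not an official MLPerf metric:

```python
# Llama 2 70B (Offline, Open Division) results from the chart above, tokens/s.
mi355x = {8: 93045, 32: 350820, 64: 648248}  # chip count -> system throughput
b200_x8 = 65770

# Relative throughput of the two 8-accelerator systems.
ratio_vs_b200 = mi355x[8] / b200_x8
print(f"MI355X (x8) vs B200 (x8): {ratio_vs_b200:.2f}x")

# Assumed scaling efficiency: per-chip throughput at N chips vs. the 8-chip baseline.
baseline_per_chip = mi355x[8] / 8
efficiency = {n: (tps / n) / baseline_per_chip for n, tps in mi355x.items()}
for n, eff in sorted(efficiency.items()):
    print(f"{n:>2} chips: {eff:.1%} of linear scaling")
```

With these figures the 8-chip MI355X system comes out at roughly 1.41x the B200 (x8) result, and the 64-chip system keeps about 87% of the 8-chip per-chip throughput.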

NVIDIA has also shared its full set of benchmark results, along with the per-accelerator records it set with the new Blackwell Ultra GB300 platform.

Its full record table follows:

MLPerf Inference Per-Accelerator Records
Benchmark | Offline | Server | Interactive
DeepSeek-R1 | 5,842 tokens/second/GPU | 2,907 tokens/second/GPU | **
Llama 3.1 405B | 224 tokens/second/GPU | 170 tokens/second/GPU | 138 tokens/second/GPU
Llama 2 70B 99.9% | 12,934 tokens/second/GPU | 12,701 tokens/second/GPU | 7,856 tokens/second/GPU
Llama 2 70B 99% | 13,015 tokens/second/GPU | 12,701 tokens/second/GPU | 7,856 tokens/second/GPU
Llama 3.1 8B | 18,370 tokens/second/GPU | 16,099 tokens/second/GPU | 15,284 tokens/second/GPU
Stable Diffusion XL | 4.07 samples/second/GPU | 3.59 queries/second/GPU | **
Mixtral 8x7B | 16,099 tokens/second/GPU | 16,131 tokens/second/GPU | **
DLRMv2 99% | 87,228 samples/second/GPU | 80,515 samples/second/GPU | **
DLRMv2 99.9% | 48,666 samples/second/GPU | 46,259 queries/second/GPU | **
Whisper | 5,667 tokens/second/GPU | ** | **
R-GAT | 81,404 samples/second/GPU | ** | **
Retinanet | 1,875 samples/second/GPU | 1,801 queries/second/GPU | **
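The per-accelerator records appear consistent with dividing a system's total throughput by its GPU count. As a sanity check, this Python sketch divides the GB300 (x72) chart totals by 72 and compares the results with the table. The mapping between chart and table entries is an assumption, and note that the DeepSeek R1 charts are labeled in samples/s and queries/s while the table reports tokens/second/GPU:

```python
# GB300 (x72) system totals taken from the charts above.
system_totals = {
    "DeepSeek-R1 Offline": 420569,
    "DeepSeek-R1 Server": 209328,
    "Llama 3.1 405B Offline": 16104,
}

# Matching per-GPU entries from NVIDIA's record table (tokens/second/GPU).
table_per_gpu = {
    "DeepSeek-R1 Offline": 5842,
    "DeepSeek-R1 Server": 2907,
    "Llama 3.1 405B Offline": 224,
}

GPUS = 72  # GB300 (x72) system size

# Assumed relation: per-GPU figure = system throughput / GPU count.
derived = {name: total / GPUS for name, total in system_totals.items()}
for name, per_gpu in derived.items():
    print(f"{name}: {per_gpu:.1f} derived vs {table_per_gpu[name]} in table")
```

All three land within one unit of the table values (for example, 420569 / 72 is about 5841.2 against 5,842), which suggests the small gaps are rounding.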

NVIDIA's Blackwell Ultra also sets reasoning records in its MLPerf debut. On DeepSeek-R1, Blackwell Ultra GPUs deliver 4.7x the per-GPU throughput of the Hopper platform in the Offline scenario and 5.2x in the Server scenario.

DeepSeek-R1 Performance
Architecture | Offline | Server
Hopper | 1,253 tokens/second/GPU | 556 tokens/second/GPU
Blackwell Ultra | 5,842 tokens/second/GPU | 2,907 tokens/second/GPU
Blackwell Ultra Advantage | 4.7x | 5.2x
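The "Blackwell Ultra Advantage" row is the ratio of the two per-GPU figures above it. A minimal Python check using the table values as given:

```python
# Per-GPU DeepSeek-R1 throughput (tokens/second/GPU) from the table above.
hopper = {"Offline": 1253, "Server": 556}
blackwell_ultra = {"Offline": 5842, "Server": 2907}

# Speedup of Blackwell Ultra over Hopper in each scenario.
advantage = {s: blackwell_ultra[s] / hopper[s] for s in hopper}
for scenario, x in advantage.items():
    print(f"{scenario}: {x:.1f}x")
```

This reproduces the 4.7x and 5.2x figures after rounding to one decimal place.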

In the next round of MLPerf submissions, we can expect NVIDIA, AMD, and Intel to further optimize their existing platforms, resulting in higher scores in these benchmarks.

About the author: A Software Engineer by training and a PC enthusiast by passion, Hassan Mujtaba serves as Wccftech's Senior Editor for the hardware section. With years of experience in the industry, he specializes in deep-dive technical analysis of next-generation CPU and GPU architectures, motherboards, and cooling solutions. His work involves not only breaking news on upcoming technologies but also extensive hands-on reviews and benchmarking.
