NVIDIA's Blackwell Ultra GB300 & AMD's Instinct MI355X have finally appeared in the latest MLPerf v5.1 AI inference benchmarks.
NVIDIA Blackwell Ultra GB300 & AMD Instinct MI355X Rock MLPerf v5.1 Benchmarks With Unmatched AI Performance
Today, MLCommons published its latest round of MLPerf v5.1 AI Inference benchmarks, and a few submissions stole the spotlight: NVIDIA's Blackwell Ultra GB300, AMD's Instinct MI355X, and the Intel Arc Pro B60 (which we've covered separately). The GB300 and MI355X are the fastest AI options from their respective vendors, so they draw most of the attention. Let's see what these chips have to offer in MLPerf.
Starting with DeepSeek R1 (Offline), NVIDIA's GB300 simply crushes it, offering a 45% gain over GB200 in the 72-GPU comparison and a 44% uplift in the 8-GPU comparison. That is close to the 50% gain NVIDIA had promised for Blackwell Ultra.
MLPerf v5.1 (DeepSeek R1 Offline)
Next up is the DeepSeek R1 (Server) comparison, where GB300 shows a 25% gain in the 72-GPU submission and a 21% gain in the 8-GPU submission.
MLPerf v5.1 (DeepSeek R1 Server)
Moving on to Llama 3.1 405B (Offline), we get our first AMD Instinct MI355X comparison. It is made against the GB200 rather than the GB300, since no 8-GPU GB300 submission was made in this benchmark. The new AMD Instinct platform appears to bring a solid 27% increase here.
MLPerf v5.1 (Llama 3.1 405B Offline)
Then we have the Llama 2 70B (Offline) comparison, where the Instinct MI355X generates up to 648,248 tokens per second in a 64-chip configuration, 350,820 tokens per second with 32 chips, and 65,770 tokens per second with 8 chips. The 8-chip result is a massive 2.09x increase over the NVIDIA GB200 (x8) configuration. The Arc Pro B60 scores 3,009 tokens/s here, but its value proposition is much better than that of pure datacenter/HPC chips such as Blackwell Ultra and the AMD Instinct series.
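Because the MI355X results come from differently sized systems, dividing aggregate throughput by chip count gives a rough per-chip figure for comparison. The sketch below does only that arithmetic on the numbers quoted in the paragraph above; the names `MI355X_RESULTS` and `per_chip` are illustrative, and it assumes plain division is a fair normalization, which ignores interconnect, batching, and system-level differences between configurations.

```python
# Aggregate Llama 2 70B (Offline) throughput for AMD Instinct MI355X, as quoted
# in the article: chip count -> tokens per second.
# Assumption: naive division by chip count is used as a rough normalization.
MI355X_RESULTS = {64: 648_248, 32: 350_820, 8: 65_770}


def per_chip(tokens_per_second: float, chips: int) -> float:
    """Naive per-chip throughput: aggregate tokens/s divided by chip count."""
    return tokens_per_second / chips


if __name__ == "__main__":
    for chips, total in sorted(MI355X_RESULTS.items()):
        print(f"{chips:>2} chips: {per_chip(total, chips):,.0f} tokens/s per chip")
```

This works out to roughly 8,221, 10,963, and 10,129 tokens/s per chip for the 8-, 32-, and 64-chip systems respectively.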
MLPerf v5.1 (Llama 2 70B 99.9% Offline, Open Division)
NVIDIA has also shared the full spectrum of benchmarks and the per-accelerator records it achieved with its new Blackwell Ultra GB300 platform. Here is the full record table:
MLPerf Inference Per-Accelerator Records

| Benchmark | Offline | Server | Interactive |
|---|---|---|---|
| DeepSeek-R1 | 5,842 tokens/second/GPU | 2,907 tokens/second/GPU | N/A |
| Llama 3.1 405B | 224 tokens/second/GPU | 170 tokens/second/GPU | 138 tokens/second/GPU |
| Llama 2 70B 99.9% | 12,934 tokens/second/GPU | 12,701 tokens/second/GPU | 7,856 tokens/second/GPU |
| Llama 2 70B 99% | 13,015 tokens/second/GPU | 12,701 tokens/second/GPU | 7,856 tokens/second/GPU |
| Llama 3.1 8B | 18,370 tokens/second/GPU | 16,099 tokens/second/GPU | 15,284 tokens/second/GPU |
| Stable Diffusion XL | 4.07 samples/second/GPU | 3.59 queries/second/GPU | N/A |
| Mixtral 8x7B | 16,099 tokens/second/GPU | 16,131 tokens/second/GPU | N/A |
| DLRMv2 99% | 87,228 samples/second/GPU | 80,515 samples/second/GPU | N/A |
| DLRMv2 99.9% | 48,666 samples/second/GPU | 46,259 queries/second/GPU | N/A |
| Whisper | 5,667 tokens/second/GPU | N/A | N/A |
| R-GAT | 81,404 samples/second/GPU | N/A | N/A |
| Retinanet | 1,875 samples/second/GPU | 1,801 queries/second/GPU | N/A |
Not only that, but NVIDIA's Blackwell Ultra also set reasoning records in its MLPerf debut. On DeepSeek-R1, Blackwell Ultra GPUs deliver a 4.7x per-GPU lead in the Offline scenario and a 5.2x lead in the Server scenario over the Hopper platform.
DeepSeek-R1 Performance

| Architecture | Offline | Server |
|---|---|---|
| Hopper | 1,253 tokens/second/GPU | 556 tokens/second/GPU |
| Blackwell Ultra | 5,842 tokens/second/GPU | 2,907 tokens/second/GPU |
| Blackwell Ultra Advantage | 4.7x | 5.2x |
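The 4.7x and 5.2x advantage figures follow directly from the per-GPU DeepSeek-R1 numbers for Hopper and Blackwell Ultra. A minimal Python sketch of that arithmetic, using only the values quoted in the comparison (the names `HOPPER`, `BLACKWELL_ULTRA`, and `speedup` are illustrative):

```python
# Per-GPU DeepSeek-R1 throughput (tokens/second/GPU) as quoted in the article.
HOPPER = {"Offline": 1_253, "Server": 556}
BLACKWELL_ULTRA = {"Offline": 5_842, "Server": 2_907}


def speedup(new: float, old: float) -> float:
    """Ratio of new to old throughput (2.0 means twice as fast)."""
    return new / old


if __name__ == "__main__":
    for scenario in ("Offline", "Server"):
        ratio = speedup(BLACKWELL_ULTRA[scenario], HOPPER[scenario])
        # Rounded to one decimal place, matching the table's presentation.
        print(f"{scenario}: {ratio:.1f}x")
```

Running it prints `Offline: 4.7x` and `Server: 5.2x`, matching the advantage row.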
In the next round of MLPerf submissions, we can expect NVIDIA, AMD, and Intel to further optimize their existing platforms, resulting in higher scores in these benchmarks.
