AMD has shared more details about its Radeon AI PRO R9700 GPU and how it compares in AI tasks against the existing Radeon PRO W7800.
AMD Offers 4x More AI TOPS & 2x AI Performance Uplift To Consumers With Its RDNA 4-Based Radeon AI PRO R9700 GPU
AMD recently expanded its software suite with ROCm 7, and its current AI accelerator strategy scales across three main categories: the Ryzen AI MAX APUs, aimed at small-to-medium LLMs; the Radeon AI PRO GPUs, aimed at multi-GPU edge inference and small-to-medium LLMs; and the Instinct AI accelerators, aimed at large LLMs for rack-scale inference and training. While AMD has detailed its MI350 series, the company also revealed some more AI statistics for its Radeon AI PRO lineup.
The AMD Radeon AI PRO R9700 uses the Navi 48 GPU, which comes with 64 compute units, or 4096 stream processors. The GPU is loaded with 128 AI accelerators and has a TBP of up to 300W. In terms of memory, the AMD Radeon AI PRO R9700 is equipped with 32 GB of GDDR6 memory running across a 256-bit bus, which essentially doubles the VRAM featured on the Radeon RX 9070 XT. Other performance figures shared by AMD include 96 TFLOPs of FP16 compute and 1531 TOPS of INT4 (sparse).

The goal of the AMD Radeon AI PRO R9700 GPU is to run high-quality AI models efficiently. That's why it has been equipped with 32 GB of VRAM, which AMD positions as an optimal amount for most advanced local AI workloads, such as DeepSeek R1 Distill Qwen 32B Q6, Mistral Small 3.1 24B Instruct 2503 Q8, FLUX.1 Schnell, and SD 3.5 Medium.

As for performance, AMD states that the Radeon AI PRO R9700 is twice as fast as the Radeon PRO W7800 32 GB GPU in DeepSeek R1. The company also shows a few measurements against the RTX 5080, which features 16 GB of VRAM. That buffer may be too small for AI models that require more memory, which is why the R9700 is shown to be up to 5x faster in those tests.
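
To illustrate why the VRAM buffer matters for the models named above, here is a minimal sketch that estimates weight-only memory from parameter count and bit width. It assumes nominal bit widths (6 bits for Q6, 8 bits for Q8); real quantized files carry extra metadata, and inference also needs room for the KV cache and activations, so actual usage will be higher. The function name `weight_gb` is hypothetical and not taken from any AMD tool.

```python
# Rough weight-only estimate: parameters x bits per weight / 8.
# Assumption: nominal bit widths; KV cache and activations are ignored,
# so real-world memory use is higher than these numbers.

def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the weights in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8


# Two of the LLMs AMD lists for the R9700: (billions of parameters, bits).
models = {
    "DeepSeek R1 Distill Qwen 32B Q6": (32, 6),
    "Mistral Small 3.1 24B Instruct 2503 Q8": (24, 8),
}

for name, (params, bits) in models.items():
    size = weight_gb(params, bits)
    for vram_gb in (16, 32):
        verdict = "fits" if size < vram_gb else "does not fit"
        print(f"{name}: ~{size:.0f} GB of weights {verdict} in {vram_gb} GB")
```

Under these assumptions, both models need more than 16 GB for the weights alone but stay under 32 GB, which is consistent with the gap AMD shows against a 16 GB card.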

Diving into the compute metrics, the R9700 will offer 47.8 TFLOPs of FP32, 191.4 TFLOPs of FP16 / BF16, 382.7 TFLOPs of FP8, 382.7 TOPS of INT8, and 765.5 TOPS of INT4 performance. The GPU will also support Wave Matrix Multiply Accumulate (WMMA) instructions and Structured Sparsity. With sparsity, the total INT4 TOPS will reach 1531. The FP16 figures are a 2x increase over the Radeon PRO W7800, while the INT8/INT4 figures see a 4x uplift.
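
The figures above follow a simple halving pattern, which can be checked with a few lines of arithmetic. This is only an observation about the published numbers, not AMD's stated method for deriving them, and the variable names are illustrative. Results match the quoted figures to within rounding.

```python
# Start from the headline figure quoted in the article and step down.
int4_sparse = 1531.0           # TOPS, INT4 with structured sparsity

int4_dense = int4_sparse / 2   # sparsity doubles the dense rate
int8 = int4_dense / 2          # INT8 runs at half the INT4 rate
fp8 = int8                     # FP8 is listed at the same rate as INT8
fp16 = int8 / 2                # FP16 / BF16 run at half the 8-bit rate
fp32 = fp16 / 4                # the published FP32 figure is a quarter of FP16

ladder = {
    "INT4 (dense)": int4_dense,
    "INT8": int8,
    "FP8": fp8,
    "FP16 / BF16": fp16,
    "FP32": fp32,
}

for label, value in ladder.items():
    print(f"{label}: {value:.2f}")
```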

AMD also highlights why having support for larger models is essential for obtaining better results. In Text-To-Image, an 8B model run on FP16 will produce far better results than a 1B model. As for reasoning, a 32B 6-bit model will produce higher accuracy than an 8B 6-bit model.
AMD also showcases FP16 AI performance in a single-GPU comparison between the R9700 and the W7800, with the new card offering an uplift of over 2x in DeepSeek R1 Distill Llama (8B).

But it doesn't end here; the AMD Radeon AI PRO R9700 can also be scaled in 4-way multi-GPU configurations on a modern PCIe 5.0 platform. This gives users a combined 128 GB of VRAM, which can handle larger models such as Mistral 123B and DeepSeek R1 70B. These models can consume 112-116 GB of VRAM.
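
The multi-GPU numbers can be sanity-checked with a short calculation. The sketch below assumes the model is split evenly across the four cards, which is a simplification; real frameworks may shard unevenly and add per-GPU overhead.

```python
GPUS = 4
VRAM_PER_GPU_GB = 32

pool_gb = GPUS * VRAM_PER_GPU_GB  # combined VRAM across all four cards

# Usage range quoted for the largest models in AMD's example.
for usage_gb in (112, 116):
    per_gpu_gb = usage_gb / GPUS        # even split (assumption)
    headroom_gb = pool_gb - usage_gb    # VRAM left across the pool
    print(
        f"{usage_gb} GB total: {per_gpu_gb:.0f} GB per GPU, "
        f"{headroom_gb} GB of headroom in a {pool_gb} GB pool"
    )
```

With an even split, each card would hold 28-29 GB of its 32 GB, leaving 12-16 GB free across the whole pool.
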
Lastly, for availability, the AMD Radeon AI PRO R9700 GPU will be available in July this year through leading partners such as ASUS, ASRock, Gigabyte, PowerColor, Sapphire, XFX, and Yeston. The card is going to be a dual-slot design with a blower cooler.

AMD Radeon Pro Workstation Graphics Lineup:
| Graphics Card Name | Radeon AI PRO R9700 | Radeon Pro W7900 | Radeon Pro W7800 | Radeon Pro W6900X | Radeon Pro W6800 | Radeon Pro VII | Radeon Pro W5700X | Radeon Pro W5700 | Radeon Pro WX 9100 | Radeon Pro WX 8200 | Radeon Pro WX 7100 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| GPU | Navi 48 | Navi 31 | Navi 31 | Navi 21 | Navi 21 | Vega 20 | Navi 10 | Navi 10 | Vega 10 | Vega 10 | Polaris 10 |
| Process Node | 4nm | 5nm+6nm | 5nm+6nm | 7nm | 7nm | 7nm | 7nm | 7nm | 14nm | 14nm | 14nm |
| Compute Units | 64 | 96 | 70 | 80 | 60 | 60 | 40 | 36 | 64 | 56 | 36 |
| Stream Processors | 4096 | 6144 | 4480 | 5120 | 3840 | 3840 | 2560 | 2304 | 4096 | 3584 | 2304 |
| Clock Speed (Peak) | TBD | ~2.5 GHz | ~2.5 GHz | 2171 MHz | 2320 MHz | 1700 MHz | 2040 MHz | 1930 MHz | 1500 MHz | 1500 MHz | 1243 MHz |
| VRAM | 32 GB GDDR6 | 48 GB GDDR6 | 32 GB GDDR6 | 32 GB GDDR6 | 32 GB GDDR6 | 16 GB HBM2 | 16 GB GDDR6 | 8 GB GDDR6 | 16 GB HBM2 | 8 GB HBM2 | 8 GB GDDR5 |
| Memory Bandwidth | 640 GB/s | 864 GB/s | 576 GB/s | 512 GB/s | 512 GB/s | 1024 GB/s | 448 GB/s | 448 GB/s | 512 GB/s | 484 GB/s | 224 GB/s |
| Memory Bus | 256-bit | 384-bit | 256-bit | 256-bit | 256-bit | 4096-bit | 256-bit | 256-bit | 2048-bit | 2048-bit | 256-bit |
| Compute Rate (FP32) | 48 TFLOPs | 61.3 TFLOPs | 45.2 TFLOPs | 22.23 TFLOPs | 17.82 TFLOPs | 13.1 TFLOPs | 9.5 TFLOPs | 8.89 TFLOPs | 12.3 TFLOPs | 10.8 TFLOPs | 5.7 TFLOPs |
| TDP | 300W | 295W | 260W | 300W | 250W | 250W | 240W | 205W | 250W | 230W | 150W |
| Price | TBD | $3999 US | $2499 US | $5999 US | $2249 US | $1899 US | $999 US | $799 US | $2199 US | $999 US | $799 US |
| Launch | 2025 | 2023 | 2023 | 2021 | 2021 | 2020 | 2019 | 2019 | 2017 | 2018 | 2016 |