AMD Radeon Instinct MI25 Accelerator With 16 GB HBM2 Specifications Detailed – Launches Today Along With Instinct MI8 and Instinct MI6

Hassan Mujtaba

AMD has officially launched its Vega GPU based Instinct MI25 accelerator for large-scale machine intelligence and deep learning data center applications. The new accelerator uses AMD's latest Radeon technologies to boost performance and deliver much higher compute throughput in AI training tasks.

AMD Radeon Instinct Lineup Launched - Vega Based Instinct MI25, Fiji Based Instinct MI8 and Polaris Based Instinct MI6 Graphics Accelerators

AMD is launching three new graphics accelerators today as part of the Radeon Instinct lineup: the Vega 10 based Radeon Instinct MI25, the Fiji XT based Radeon Instinct MI8 and the Polaris 10 based Radeon Instinct MI6. The "MI" in the Instinct branding stands for "Machine Intelligence", while the number that follows is the card's peak half precision (FP16) compute output in TFLOPs, rounded to the nearest whole number.
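The naming rule can be checked against the FP16 figures quoted later in this article. This is a small sketch that assumes the model suffix is simply the peak FP16 TFLOPs rounded to the nearest integer; the `peak_fp16_tflops` dictionary and `model_number` helper are illustrative names, not anything AMD publishes.

```python
# Sketch: check the reading that the number in each "MI" model name is the
# card's peak FP16 throughput in TFLOPs, rounded to the nearest integer.
# The figures are the peak FP16 values quoted in this article.
peak_fp16_tflops = {
    "MI6": 5.7,
    "MI8": 8.2,
    "MI25": 24.6,
}

def model_number(fp16_tflops: float) -> int:
    """Round peak FP16 TFLOPs to the nearest whole number."""
    return round(fp16_tflops)

for name, tflops in peak_fp16_tflops.items():
    suffix = int(name[2:])  # strip the "MI" prefix
    print(f"{name}: {tflops} TFLOPs FP16 -> {model_number(tflops)} (suffix {suffix})")
```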


Through our Radeon Instinct server accelerator products and open ecosystem approach, we’re able to offer our customers cost-effective machine and deep learning training, edge-training and inference solutions, where workloads can take the most advantage of the GPU’s highly parallel computing capabilities.

We’ve also designed the three initial Radeon Instinct accelerators to address a wide range of machine intelligence applications, which includes data-centric HPC-class systems in academics, government labs, energy, life science, financial, automotive and other industries via Radeon

AMD Radeon Instinct MI25 Accelerator With Vega GPU (24.6 TFLOPs FP16) and 16 GB HBM2

The AMD Radeon Instinct MI25 accelerator is the fastest of the Instinct lineup. It features the Vega 10 graphics core with 4096 stream processors that are clocked at 1500 MHz. With these clock rates, the card delivers 24.6 TFLOPs of FP16, 12.3 TFLOPs of FP32 and 768 GFLOPs of FP64 compute that is aimed at deep learning tasks. The card also packs 16 GB of ECC HBM2 memory which delivers a total of 484 GB/s bandwidth.
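The headline numbers follow directly from the shader count and clock. The sketch below assumes each stream processor completes one fused multiply-add (2 FLOPs) per clock in FP32, that packed FP16 doubles that rate, and that FP64 runs at 1/16 of FP32 (the same ratio the article cites for the MI8). The constant and function names are illustrative only.

```python
# Sketch: derive the MI25's peak throughput from its shader count and clock.
# Assumptions: one FMA (2 FLOPs) per stream processor per clock in FP32,
# packed FP16 at twice the FP32 rate, FP64 at 1/16 of the FP32 rate.
STREAM_PROCESSORS = 4096
CLOCK_GHZ = 1.5  # 1500 MHz

def peak_tflops(sps: int, clock_ghz: float, flops_per_clock: int = 2) -> float:
    """Peak TFLOPs = SPs x FLOPs per SP per clock x clock (GHz) / 1000."""
    return sps * flops_per_clock * clock_ghz / 1000

fp32 = peak_tflops(STREAM_PROCESSORS, CLOCK_GHZ)
fp16 = fp32 * 2    # packed math: two FP16 operations per FP32 lane
fp64 = fp32 / 16   # 1/16 rate

print(f"FP16: {fp16:.1f} TFLOPs")         # 24.6
print(f"FP32: {fp32:.1f} TFLOPs")         # 12.3
print(f"FP64: {fp64 * 1000:.0f} GFLOPs")  # 768
```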

It should be noted that the MI25 is clocked slightly lower than the Vega Frontier Edition, which has a 1600 MHz clock rate and delivers 13 TFLOPs of FP32 and 25 TFLOPs of FP16 compute. AMD says the MI25 delivers up to 82 GFLOPs/Watt FP16 and 41 GFLOPs/Watt FP32 peak GPU compute performance.


  • Industry Leading Performance for Deep Learning
  • Next-Gen “Vega” Architecture
  • Advanced Memory Engine
  • Large BAR Support for Multi-GPU Peer to Peer
  • ROCm Open Software Platform for Rack Scale
  • Optimized MIOpen Libraries for Deep Learning
  • MxGPU Hardware Virtualization

The Radeon Instinct MI25 accelerator, based on the new “Vega” GPU architecture with a 14nm FinFET process, will be the world’s ultimate training accelerator for large-scale machine intelligence and deep learning datacenter applications. The MI25 will deliver superior FP16 and FP32 performance in a passively-cooled single GPU server card with 24.6 TFLOPS of FP16 or 12.3 TFLOPS of FP32 peak performance through its 64 compute units (4,096 stream processors). With 16GB of ultra–high bandwidth HBM2 ECC GPU memory and up to 484 GB/s of memory bandwidth, the Radeon Instinct MI25’s design is optimized for massively parallel applications with large datasets for Machine Intelligence and HPC-class systems. via AMD

Beyond the core specifications, the card comes in a dual slot, full height form factor. It draws power through two 8-pin connectors and its TDP is rated at 300W. The card is passively cooled, so it relies on the airflow inside large server racks. It ships with a three year limited warranty.
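The efficiency figures AMD quotes line up with the 300W TDP, and the two 8-pin connectors leave headroom above it. This sketch assumes the standard PCI Express limits of 75W from the slot and 150W per 8-pin connector; those limits are not stated in the article, and the variable names are illustrative.

```python
# Sketch: efficiency and power-delivery arithmetic for the MI25.
# Assumption: the PCIe slot supplies up to 75 W and each 8-pin connector up
# to 150 W (PCI Express limits); the article only gives the 300 W TDP.
TDP_W = 300
PEAK_GFLOPS = {"FP16": 24_600, "FP32": 12_300}

SLOT_W = 75
EIGHT_PIN_W = 150

# Peak GFLOPs divided by TDP gives GFLOPs per watt.
per_watt = {p: g / TDP_W for p, g in PEAK_GFLOPS.items()}
for precision, value in per_watt.items():
    print(f"{precision}: {value:.0f} GFLOPs/W")  # 82 and 41

# Maximum power the slot plus two 8-pin connectors can deliver.
available_w = SLOT_W + 2 * EIGHT_PIN_W
print(f"Connector budget: {available_w} W vs {TDP_W} W TDP")  # 375 W vs 300 W
```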

AMD Radeon Instinct MI8 Accelerator With Fiji GPU (8.20 TFLOPs FP16) and 4 GB HBM1

AMD is also launching the Radeon Instinct MI8 accelerator, which is designed as an inference card. The Instinct MI8 is powered by the Fiji XT GPU, built on a 28nm process. The GPU houses the same number of stream processors as the Instinct MI25, 4096 in total, but they are based on an older GCN revision and clocked much lower.


  • 8.2 TFLOPS FP16 or FP32 Performance
  • Up To 47 GFLOPS Per Watt FP16 or FP32 Performance
  • 4GB HBM1 on 512-bit Memory Interface
  • Passively Cooled Server Accelerator
  • Large BAR Support for Multi GPU Peer to Peer
  • ROCm Open Platform for HPC-Class Rack Scale
  • Optimized MIOpen Libraries for Deep Learning
  • MxGPU SR-IOV Hardware Virtualization

The Radeon Instinct MI8 accelerator, harnessing the high-performance, energy-efficiency of the “Fiji” GPU architecture, is a small form factor HPC and inference accelerator with 8.2 TFLOPS of peak FP16|FP32 performance at less than 175W board power and 4GB of High-Bandwidth Memory (HBM) on a 512-bit memory interface. The MI8 is well suited for machine learning inference and HPC applications. via AMD

In terms of specifications, the card features 4096 stream processors clocked at 1000 MHz. This delivers a rated compute output of 8.2 TFLOPs (FP16 / FP32) and 512 GFLOPs of FP64 compute at 1/16th rate. The card also features 4 GB of HBM1 memory which delivers 512 GB/s of bandwidth. That is slightly more than the 484 GB/s of the Vega based Instinct MI25, but it requires two more memory stacks (four HBM1 versus two HBM2) and is more power hungry. AMD rates the compute efficiency of this card at 47 GFLOPs/Watt for FP16 and FP32, while FP64 is rated at 2.9 GFLOPs/Watt.
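The bandwidth comparison comes down to bus width times per-pin data rate. In this sketch each HBM stack is treated as a 1024-bit interface; the per-pin rates (1.0 Gbps for the MI8's HBM1 at 500 MHz double data rate, roughly 1.89 Gbps for the MI25's HBM2) are assumptions back-derived from the quoted bandwidths rather than figures from the article.

```python
# Sketch: peak memory bandwidth = bus width (bits) x per-pin data rate (Gbps) / 8.
# Each HBM stack exposes a 1024-bit interface. The per-pin rates are
# assumptions: 1.0 Gbps for the MI8's HBM1, about 1.89 Gbps for the MI25's HBM2.
BITS_PER_STACK = 1024

def bandwidth_gbs(stacks: int, gbps_per_pin: float) -> float:
    """Return peak bandwidth in GB/s for a given stack count and pin rate."""
    bus_width_bits = stacks * BITS_PER_STACK
    return bus_width_bits * gbps_per_pin / 8

mi8 = bandwidth_gbs(stacks=4, gbps_per_pin=1.0)    # 4096-bit HBM1
mi25 = bandwidth_gbs(stacks=2, gbps_per_pin=1.89)  # 2048-bit HBM2
print(f"MI8:  {mi8:.0f} GB/s")   # 512
print(f"MI25: {mi25:.0f} GB/s")  # 484
```

The MI8 reaches its slightly higher figure by doubling the bus width with twice as many stacks, while the MI25 gets close with half the stacks by running each pin almost twice as fast.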

The card comes in the same compact, dual slot package as the Radeon R9 Nano. It has a rated TDP of 175W, with power provided through a single 8-pin connector. Since it is aimed at servers, the card is passively cooled.

AMD Radeon Instinct MI6 Accelerator With Polaris GPU (5.70 TFLOPs FP16) and 16 GB GDDR5

Lastly, we have the AMD Radeon Instinct MI6 graphics accelerator. This card uses the Polaris 10 core and is aimed at both deep learning and inference workloads. The chip has its full complement of 2304 stream processors enabled, all clocked at 1237 MHz. At the rated clock speed, it delivers 5.7 TFLOPs of FP16 / FP32 compute and 358 GFLOPs of double precision compute.

AMD rates the single and half precision efficiency of this card at 38 GFLOPs/Watt, while double precision efficiency is rated at 2.4 GFLOPs/Watt.
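Dividing the MI6's peak throughput by AMD's 150W board power figure reproduces the per-watt ratings. The sketch below uses only numbers from the article; the dictionary names are illustrative.

```python
# Sketch: performance per watt for the MI6, from its peak throughput and
# 150 W board power (both taken from the article).
BOARD_POWER_W = 150
peak_gflops = {"FP16/FP32": 5_700, "FP64": 358}

efficiency = {p: g / BOARD_POWER_W for p, g in peak_gflops.items()}
for precision, gflops_per_w in efficiency.items():
    print(f"{precision}: {gflops_per_w:.1f} GFLOPs/W")
```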


  • 5.7 TFLOPS FP16 or FP32 Performance
  • Up To 38 GFLOPS Per Watt Peak FP16 or FP32 Performance
  • 16GB Ultra-Fast GDDR5 Memory on 256-bit Memory Interface
  • Passively Cooled Server Accelerator
  • Large BAR Support for Multi-GPU Peer to Peer
  • ROCm Open Platform for HPC-Class Scale Out
  • Optimized MIOpen Libraries for Deep Learning
  • MxGPU SR-IOV Hardware Virtualization

The Radeon Instinct MI6 accelerator, based on the acclaimed “Polaris” GPU architecture, is a passively cooled inference accelerator with 5.7 TFLOPS of peak FP16|FP32 performance at 150W board power and 16GB of ultra-fast GDDR5 GPU memory on a 256-bit memory interface. The MI6 is a versatile accelerator ideal for HPC and machine learning inference and edge-training deployments. via AMD

The card also comes with 16 GB of GDDR5 memory clocked at 7000 MHz effective (1750 MHz actual) on a 256-bit wide bus interface, which delivers up to 224 GB/s of bandwidth. The card comes in a single slot, full length form factor and is passively cooled by the airflow in large server racks. TDP is set at 150W, so power is provided by a single 6-pin connector.
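Both the bandwidth and the single 6-pin connector can be sanity-checked with simple arithmetic. This sketch assumes GDDR5 transfers four bits per pin per memory clock (so 1750 MHz corresponds to 7 Gbps), and that the PCIe slot and one 6-pin connector each supply up to 75W; the power limits come from the PCI Express specification, not from the article.

```python
# Sketch: GDDR5 bandwidth and power budget for the MI6.
# Assumption: GDDR5 moves four bits per pin per memory clock, so 1750 MHz
# gives the 7 Gbps ("7000 MHz effective") data rate.
# Assumption: 75 W from the PCIe slot plus 75 W from one 6-pin connector.
MEMORY_CLOCK_MHZ = 1750
BUS_WIDTH_BITS = 256

effective_gbps = MEMORY_CLOCK_MHZ * 4 / 1000         # 7.0 Gbps per pin
bandwidth_gbs = BUS_WIDTH_BITS * effective_gbps / 8  # 224 GB/s

SLOT_W, SIX_PIN_W, TDP_W = 75, 75, 150
power_budget_w = SLOT_W + SIX_PIN_W

print(f"Effective data rate: {effective_gbps} Gbps")
print(f"Bandwidth: {bandwidth_gbs:.0f} GB/s")
print(f"Power budget: {power_budget_w} W for a {TDP_W} W TDP")
```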

AMD Radeon Instinct Accelerators:

| Accelerator Name | AMD Radeon Instinct MI6 | AMD Radeon Instinct MI8 | AMD Radeon Instinct MI25 | AMD Radeon Instinct MI50 | AMD Radeon Instinct MI60 |
|---|---|---|---|---|---|
| GPU Architecture | Polaris 10 | Fiji XT | Vega 10 | Vega 20 | Vega 20 |
| GPU Process Node | 14nm FinFET | 28nm | 14nm FinFET | 7nm FinFET | 7nm FinFET |
| GPU Cores | 2304 | 4096 | 4096 | 3840 | 4096 |
| GPU Clock Speed | 1237 MHz | 1000 MHz | 1500 MHz | 1746 MHz | 1800 MHz |
| FP16 Compute | 5.7 TFLOPs | 8.2 TFLOPs | 24.6 TFLOPs | 26.8 TFLOPs | 29.6 TFLOPs |
| FP32 Compute | 5.7 TFLOPs | 8.2 TFLOPs | 12.3 TFLOPs | 13.4 TFLOPs | 14.8 TFLOPs |
| FP64 Compute | 358 GFLOPs | 512 GFLOPs | 768 GFLOPs | 6.7 TFLOPs | 7.4 TFLOPs |
| Memory Clock | 1750 MHz | 500 MHz | 472 MHz | 500 MHz | 500 MHz |
| Memory Bus | 256-bit | 4096-bit | 2048-bit | 4096-bit | 4096-bit |
| Memory Bandwidth | 224 GB/s | 512 GB/s | 484 GB/s | 1 TB/s | 1 TB/s |
| Form Factor | Single Slot, Full Length | Dual Slot, Half Length | Dual Slot, Full Length | Dual Slot, Full Length | Dual Slot, Full Length |
| Cooling | Passive | Passive | Passive | Passive | Passive |
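As a consistency check on the spec table, the FP32 row can be recomputed from the core count and GPU clock of each column. This sketch assumes 2 FLOPs (one fused multiply-add) per stream processor per clock and allows 1% slack for rounding in the listed figures; the Vega 20 columns are labeled by core count rather than model name, and `fp32_tflops` is an illustrative helper.

```python
import math

# Sketch: recompute peak FP32 throughput for each column of the table from
# its stream-processor ("GPU Cores") count and GPU clock, assuming 2 FLOPs
# (one fused multiply-add) per stream processor per clock.
cards = [
    # (label, stream processors, clock in MHz, listed FP32 TFLOPs)
    ("MI6 (Polaris 10)", 2304, 1237, 5.7),
    ("MI8 (Fiji XT)", 4096, 1000, 8.2),
    ("MI25 (Vega 10)", 4096, 1500, 12.3),
    ("Vega 20, 3840 cores", 3840, 1746, 13.4),
    ("Vega 20, 4096 cores", 4096, 1800, 14.8),
]

def fp32_tflops(stream_processors: int, clock_mhz: int) -> float:
    """SPs x 2 FLOPs per clock x clock (MHz), converted to TFLOPs."""
    return stream_processors * 2 * clock_mhz / 1e6

for label, sps, mhz, listed in cards:
    computed = fp32_tflops(sps, mhz)
    # Allow 1% slack because the listed figures are rounded.
    status = "ok" if math.isclose(computed, listed, rel_tol=0.01) else "mismatch"
    print(f"{label}: computed {computed:.2f}, listed {listed} TFLOPs ({status})")
```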

Planned for a June 29th rollout, the ROCm 1.6 software platform brings performance improvements and adds support for MIOpen 1.0. It is scalable and fully open source, providing a flexible, powerful heterogeneous compute solution for a new class of hybrid Hyperscale and HPC-class systems.

Built around an open-source Linux driver optimized for scalable multi-GPU computing, the ROCm software platform provides multiple programming models, the HIP CUDA conversion tool, and support for GPU acceleration using the Heterogeneous Computing Compiler (HCC). AMD also showcased several server racks from its partners that pair the new EPYC 7000 series processors with Instinct MI25 accelerators.
