AMD Unveils Instinct MI100 CDNA GPU Accelerator, The World’s Fastest HPC GPU With Highest Double-Precision Horsepower

Hassan Mujtaba

AMD has officially announced its next-generation CDNA GPU-based Instinct MI100 accelerator which it calls the fastest HPC GPU in the world. The Instinct MI100 is designed to offer world's fastest double-precision compute capabilities and also deliver insane amounts of GPU performance for AI Deep Learning workloads.

AMD Instinct MI100 32 GB HPC Accelerator Official, The World's Fastest HPC GPU Based on 1st Gen CDNA Architecture

AMD's Instinct MI100 will be utilizing the CDNA architecture which is entirely different than the RDNA architecture that gamers will have access to later this month. The CDNA architecture has been designed specifically for the HPC segment and will be pitted against NVIDIA's Ampere A100 & similar accelerator cards.

Related StoryJason R. Wilson
AMD RDNA 3, Freesync & More Fixes Added To Driver Updates In DRM-Next For Linux 6.3 Landing Next Month

Some of the key highlights of the AMD Instinct MI100 GPU accelerator include:

  • All-New AMD CDNA Architecture- Engineered to power AMD GPUs for the exascale era and at the heart of the MI100 accelerator, the AMD CDNA architecture offers exceptional performance and power efficiency
  • Leading FP64 and FP32 Performance for HPC Workloads – Delivers industry-leading 11.5 TFLOPS peak FP64 performance and 23.1 TFLOPS peak FP32 performance, enabling scientists and researchers across the globe to accelerate discoveries in industries including life sciences, energy, finance, academics, government, defense, and more.
  • All-New Matrix Core Technology for HPC and AI – Supercharged performance for a full range of single and mixed-precision matrix operations, such as FP32, FP16, bFloat16, Int8, and Int4, engineered to boost the convergence of HPC and AI.
  • 2nd Gen AMD Infinity Fabric Technology – Instinct MI100 provides ~2x the peer-to-peer (P2P) peak I/O bandwidth over PCIe 4.0 with up to 340 GB/s of aggregate bandwidth per card with three AMD Infinity Fabric Links. In a server, MI100 GPUs can be configured with up to two fully-connected quad GPU hives, each providing up to 552 GB/s of P2P I/O bandwidth for fast data sharing.
  • Ultra-Fast HBM2 Memory– Features 32GB High-bandwidth HBM2 memory at a clock rate of 1.2 GHz and delivers an ultra-high 1.23 TB/s of memory bandwidth to support large data sets and help eliminate bottlenecks in moving data in and out of memory.
  • Support for Industry’s Latest PCIe Gen 4.0 – Designed with the latest PCIe Gen 4.0 technology support providing up to 64GB/s peak theoretical transport data bandwidth from CPU to GPU.

AMD Instinct MI100 'CDNA GPU' Specifications - 7680 Cores & 32 GB HBM2 VRAM

The AMD Instinct MI100 features the 7nm CDNA GPU which packs a total of 120 Compute Units or 7680 stream processors. Internally referred to as Arcturus, the CDNA GPU powering the instinct MI100 is said to measure at around 720mm2. The GPU has a clock speed configured around 1500 MHz and delivers a peak performance throughput of 11.5 TFLOPs in FP64, 23.1 TFLOPs in FP32, and a massive 185 TFLOPs in FP16 compute workloads. The accelerator will feature a total power draw of 300 Watts.

For memory, AMD will be equipping its Radeon Instinct MI100 HPC accelerator with a total of 32 GB HBM2 memory. The graphics card will be configured in 4 and 8 GPU configurations & communicate through the new Infinity Fabric X16 interconnect which has a rated bandwidth of 276 GB/s. AMD is using HBM2 chip rated to provide an effective bandwidth of 1.23 GB/s while NVIDIA's A100 features HBM2e memory dies with 1.536 GB/s bandwidth.

AMD Radeon Instinct Accelerators 2020

Accelerator NameAMD Instinct MI300AMD Instinct MI250XAMD Instinct MI250AMD Instinct MI210AMD Instinct MI100AMD Radeon Instinct MI60AMD Radeon Instinct MI50AMD Radeon Instinct MI25AMD Radeon Instinct MI8AMD Radeon Instinct MI6
CPU ArchitectureZen 4 (Exascale APU)N/AN/AN/AN/AN/AN/AN/AN/AN/A
GPU ArchitectureTBA (CDNA 3)Aldebaran (CDNA 2)Aldebaran (CDNA 2)Aldebaran (CDNA 2)Arcturus (CDNA 1)Vega 20Vega 20Vega 10Fiji XTPolaris 10
GPU Process Node5nm+6nm6nm6nm6nm7nm FinFET7nm FinFET7nm FinFET14nm FinFET28nm14nm FinFET
GPU Chiplets4 (MCM / 3D Stacked)
1 (Per Die)
2 (MCM)
1 (Per Die)
2 (MCM)
1 (Per Die)
2 (MCM)
1 (Per Die)
1 (Monolithic)1 (Monolithic)1 (Monolithic)1 (Monolithic)1 (Monolithic)1 (Monolithic)
GPU Cores28,160?14,08013,3126656768040963840409640962304
GPU Clock SpeedTBA1700 MHz1700 MHz1700 MHz1500 MHz1800 MHz1725 MHz1500 MHz1000 MHz1237 MHz
FP16 ComputeTBA383 TOPs362 TOPs181 TOPs185 TFLOPs29.5 TFLOPs26.5 TFLOPs24.6 TFLOPs8.2 TFLOPs5.7 TFLOPs
FP32 ComputeTBA95.7 TFLOPs90.5 TFLOPs45.3 TFLOPs23.1 TFLOPs14.7 TFLOPs13.3 TFLOPs12.3 TFLOPs8.2 TFLOPs5.7 TFLOPs
FP64 ComputeTBA47.9 TFLOPs45.3 TFLOPs22.6 TFLOPs11.5 TFLOPs7.4 TFLOPs6.6 TFLOPs768 GFLOPs512 GFLOPs384 GFLOPs
VRAM192 GB HBM3?128 GB HBM2e128 GB HBM2e64 GB HBM2e32 GB HBM232 GB HBM216 GB HBM216 GB HBM24 GB HBM116 GB GDDR5
Memory ClockTBA3.2 Gbps3.2 Gbps3.2 Gbps1200 MHz1000 MHz1000 MHz945 MHz500 MHz1750 MHz
Memory Bus8192-bit8192-bit8192-bit4096-bit4096-bit bus4096-bit bus4096-bit bus2048-bit bus4096-bit bus256-bit bus
Memory BandwidthTBA3.2 TB/s3.2 TB/s1.6 TB/s1.23 TB/s1 TB/s1 TB/s484 GB/s512 GB/s224 GB/s
Form FactorOAMOAMOAMDual Slot CardDual Slot, Full LengthDual Slot, Full LengthDual Slot, Full LengthDual Slot, Full LengthDual Slot, Half LengthSingle Slot, Full Length
CoolingPassive CoolingPassive CoolingPassive CoolingPassive CoolingPassive CoolingPassive CoolingPassive CoolingPassive CoolingPassive CoolingPassive Cooling

AMD's Instinct MI100 'CDNA GPU' Performance Numbers, An FP32 & FP64 Compute Powerhouse

In terms of performance, the AMD Instinct MI100 was compared to the NVIDIA Volta V100 and the NVIDIA Ampere A100 GPU accelerators. Comparing the numbers, the Instinct MI100 offers a 19.5% uplift in FP64 and an 18.5% uplift in FP32 performance. In FP16 performance, the NVIDIA A100 has a 69% advantage over the Instinct MI100.


Surprisingly, NVIDIA's numbers with Sparsity are still higher than what AMD can crunch. It's 19.5 TFLOPs vs 11.5 TFLOPs in FP64, 156 TFLOPs vs 23.1 TFLOPs in FP32, and 624 TFLOPs vs 185 TFLOPs in FP16.

AMD Instinct MI100 vs NVIDIA's Ampere A100 HPC Accelerator

In terms of actual workload performance, the AMD Instinct MI100 offers a 2.1x perf/$ ratio in FP64 and FP32 workloads. Once again, these are numbers comparing the non-sparsity performance that the NVIDIA A100 has to offer. AMD also provides some performance statistics in various HPC work- loads such as NAMD, CHOLLA, and GESTS where it is up to 3x/1.4x/2.6x faster than the Volta-based V100 GPU accelerator. Compared to the MI60, the Instinct MI100 offers a 40% improvement in the PIConGPU workload.


According to AMD, its Instinct MI100 GPU accelerator will be available through OEMs and ODMs and integrated with the first systems by the end of 2020. The systems will be packing AMD's EPYC CPUs and Instinct accelerators. Some partners include HPE, Dell, Supermicro, and Gigabyte who already have servers based on the new Instinct MI100 accelerator ready to ship out. There was no word regarding the price of this particular accelerator.

Share this story

Deal of the Day