AMD has more Instinct MI200 series cards on the way for the HPC segment based on its brand new Aldebaran CDNA 2 GPU architecture. The latest card that's being talked about is the Instinct MI210 which features a single graphics compute die.

AMD Instinct MI210 To Feature A Single Aldebaran 'CDNA 2' GPU Compute Die With 6656 Cores & 64 GB HBM2E Memory

With the Instinct MI250X and MI250, AMD brought MCM technology to the data center and HPC segment. Based on the new CDNA 2 architecture, the Aldebaran GPU offers immense power aimed at HPC and data center workloads. But there are more MI200 series cards on the horizon, and the MI210 is one of them.

Playing with one @AMDInstinct MI210, BabelStream with HIP is around 40% more than MI100, seems great. #HPC #GPU — George Markomanolis (@geomark) December 3, 2021

Yes, 104 CUs, 64 GB HBM2e — George Markomanolis (@geomark) December 3, 2021

Oh thanks! It is close to 1.4 TB/s for all the kernels. — George Markomanolis (@geomark) December 3, 2021

The details come from George Markomanolis, an engineer working on the upcoming LUMI supercomputer and lead HPC scientist at CSC, who got remote access to an AMD Instinct MI210. The card boasts some impressive specs out of the box. George has shared that the Instinct MI210 features a single GCD, which means it is a completely new SKU that doesn't carry two GCDs on the package. The single GCD is equipped with 104 CUs out of the 128 CUs featured on the Aldebaran die. Even the higher-end MI250X has just 110 CUs enabled per die, which works out to 7040 stream processors per die and 14,080 across both. The MI210 houses 6656 stream processors.
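The core counts above follow directly from the CU counts. As a rough sketch, assuming 64 stream processors per compute unit (an assumption that is not stated in the article but matches every figure quoted in it), the arithmetic looks like this. The function name is only illustrative.

```python
# Assumption: each CDNA 2 compute unit (CU) holds 64 stream processors.
# This ratio is not stated in the article, but it reproduces the quoted counts.
STREAM_PROCESSORS_PER_CU = 64


def stream_processors(compute_units: int, dies: int = 1) -> int:
    """Return the stream processor count for a given CU count per die."""
    return compute_units * STREAM_PROCESSORS_PER_CU * dies


mi210 = stream_processors(104)                 # single GCD
mi250x_per_die = stream_processors(110)        # one of two GCDs
mi250x_total = stream_processors(110, dies=2)  # both GCDs
mi250_total = stream_processors(104, dies=2)   # both GCDs

print(f"MI210:  {mi210}")
print(f"MI250X: {mi250x_per_die} per die, {mi250x_total} total")
print(f"MI250:  {mi250_total} total")
```

Under that assumption, the MI210 looks like exactly half of an MI250 as far as shader count goes.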

In addition to the core count, the AMD Instinct MI210 also rocks 64 GB of HBM2e memory. That is half the amount of the Instinct MI250X but twice the capacity of the Instinct MI100, which was the flagship just a few months ago, before the MI250 series replaced it. We don't have the exact FLOPs figure for this card, but assuming it is clocked around the same 1700 MHz as the Instinct MI250 accelerators, we are looking at around 22-23 TFLOPs of FP64 and 44-46 TFLOPs of FP32 compute. This should give some heated competition to the NVIDIA A100, which isn't expected to get an update until GTC next year.
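For readers who want to check the estimate, here is a minimal sketch of the usual peak-throughput formula: cores times clock times operations per clock. The per-clock rates (2 for FP64, 4 for FP32) are assumptions chosen to match how the article's figures scale, not confirmed MI210 specifications, and the 1700 MHz clock is the article's own guess.

```python
def peak_tflops(stream_processors: int, clock_mhz: float, flops_per_clock: int) -> float:
    """Theoretical peak throughput in TFLOPs: cores x clock x ops per clock."""
    return stream_processors * clock_mhz * 1e6 * flops_per_clock / 1e12


MI210_CORES = 6656        # from the article
ASSUMED_CLOCK_MHZ = 1700  # assumption: same clock as the MI250 accelerators

# Assumption: 2 FLOPs per core per clock for FP64 (one fused multiply-add)
# and 4 per clock for the FP32 figure, so FP32 is double the FP64 rate.
fp64 = peak_tflops(MI210_CORES, ASSUMED_CLOCK_MHZ, 2)
fp32 = peak_tflops(MI210_CORES, ASSUMED_CLOCK_MHZ, 4)

# Sanity check against the MI250X row of the spec table (47.9 TFLOPs FP64).
mi250x_fp64 = peak_tflops(14080, 1700, 2)

print(f"MI210 FP64 estimate:  {fp64:.1f} TFLOPs")
print(f"MI210 FP32 estimate:  {fp32:.1f} TFLOPs")
print(f"MI250X FP64 check:    {mi250x_fp64:.1f} TFLOPs")
```

The same formula lands on the published MI250X FP64 number, which is why the 22-23 TFLOPs range looks plausible if the clock assumption holds.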

George has also shared that the AMD Instinct MI210 is around 40% faster than the Instinct MI100 in BabelStream with HIP, reaching close to 1.4 TB/s across all the kernels. Given the cut-down specifications, we can expect the TDP to fall around 300-350W. And since this is a single-GCD accelerator, we are also expecting a 4096-bit bus interface at 3.2 Gbps pin speeds for a total of 1.6 TB/s of bandwidth. The MI210 accelerator should launch in both OAM and PCIe form factors and will start shipping to priority HPC customers and partners soon.
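The bandwidth expectation is simple arithmetic: bus width in bits times the per-pin data rate, divided by 8 to get bytes. The sketch below uses the expected (not confirmed) 4096-bit bus and 3.2 Gbps pin speed, and compares the result with the roughly 1.4 TB/s George reported from BabelStream. The function name is illustrative.

```python
def memory_bandwidth_gbs(bus_width_bits: int, pin_speed_gbps: float) -> float:
    """Theoretical memory bandwidth in GB/s: (bus width x per-pin rate) / 8."""
    return bus_width_bits * pin_speed_gbps / 8


# Expected MI210 configuration per the article (not confirmed by AMD).
theoretical_gbs = memory_bandwidth_gbs(4096, 3.2)

# BabelStream result shared by George Markomanolis: close to 1.4 TB/s.
measured_gbs = 1400.0
efficiency = measured_gbs / theoretical_gbs

print(f"Theoretical: {theoretical_gbs / 1000:.2f} TB/s")
print(f"Measured:    {measured_gbs / 1000:.2f} TB/s")
print(f"Efficiency:  {efficiency:.0%}")
```

If the expected bus configuration is right, the measured figure sits at roughly 85% of the theoretical peak.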

AMD Radeon Instinct Accelerators 2020

| Accelerator Name | AMD Instinct MI300 | AMD Instinct MI250X | AMD Instinct MI250 | AMD Instinct MI210 | AMD Instinct MI100 | AMD Radeon Instinct MI60 | AMD Radeon Instinct MI50 | AMD Radeon Instinct MI25 | AMD Radeon Instinct MI8 | AMD Radeon Instinct MI6 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| GPU Architecture | TBA (CDNA 3) | Aldebaran (CDNA 2) | Aldebaran (CDNA 2) | Aldebaran (CDNA 2) | Arcturus (CDNA 1) | Vega 20 | Vega 20 | Vega 10 | Fiji XT | Polaris 10 |
| GPU Process Node | Advanced Process Node | 6nm | 6nm | 6nm | 7nm FinFET | 7nm FinFET | 7nm FinFET | 14nm FinFET | 28nm | 14nm FinFET |
| GPU Dies | 4 (MCM)? | 2 (MCM) | 2 (MCM) | 1 (single GCD) | 1 (Monolithic) | 1 (Monolithic) | 1 (Monolithic) | 1 (Monolithic) | 1 (Monolithic) | 1 (Monolithic) |
| GPU Cores | 28,160? | 14,080 | 13,312 | TBA | 7680 | 4096 | 3840 | 4096 | 4096 | 2304 |
| GPU Clock Speed | TBA | 1700 MHz | 1700 MHz | TBA | ~1500 MHz | 1800 MHz | 1725 MHz | 1500 MHz | 1000 MHz | 1237 MHz |
| FP16 Compute | TBA | 383 TFLOPs | 362 TFLOPs | TBA | 185 TFLOPs | 29.5 TFLOPs | 26.5 TFLOPs | 24.6 TFLOPs | 8.2 TFLOPs | 5.7 TFLOPs |
| FP32 Compute | TBA | 95.7 TFLOPs | 90.5 TFLOPs | TBA | 23.1 TFLOPs | 14.7 TFLOPs | 13.3 TFLOPs | 12.3 TFLOPs | 8.2 TFLOPs | 5.7 TFLOPs |
| FP64 Compute | TBA | 47.9 TFLOPs | 45.3 TFLOPs | TBA | 11.5 TFLOPs | 7.4 TFLOPs | 6.6 TFLOPs | 768 GFLOPs | 512 GFLOPs | 384 GFLOPs |
| VRAM | TBA | 128 GB HBM2e | 128 GB HBM2e | TBA | 32 GB HBM2 | 32 GB HBM2 | 16 GB HBM2 | 16 GB HBM2 | 4 GB HBM1 | 16 GB GDDR5 |
| Memory Clock | TBA | 3.2 Gbps | 3.2 Gbps | TBA | 1200 MHz | 1000 MHz | 1000 MHz | 945 MHz | 500 MHz | 1750 MHz |
| Memory Bus | TBA | 8192-bit | 8192-bit | 4096-bit (expected) | 4096-bit | 4096-bit | 4096-bit | 2048-bit | 4096-bit | 256-bit |
| Memory Bandwidth | TBA | 3.2 TB/s | 3.2 TB/s | TBA | 1.23 TB/s | 1 TB/s | 1 TB/s | 484 GB/s | 512 GB/s | 224 GB/s |
| Form Factor | TBA | OAM | OAM | Dual Slot Card | Dual Slot, Full Length | Dual Slot, Full Length | Dual Slot, Full Length | Dual Slot, Full Length | Dual Slot, Half Length | Single Slot, Full Length |
| Cooling | TBA | Passive Cooling | Passive Cooling | Passive Cooling | Passive Cooling | Passive Cooling | Passive Cooling | Passive Cooling | Passive Cooling | Passive Cooling |
| TDP | TBA | 560W | 500W? | TBA | 300W | 300W | 300W | 300W | 175W | 150W |

