
AMD Instinct MI200 Speculated to Utilize 110 Compute Units Per MCM GPU


Website Coelacanth's Dream located a GitHub commit that may signal the configuration of AMD's upcoming Aldebaran GPU-based Instinct accelerator. The new GPU, codenamed 'GFX90A', will use the CDNA 2 architecture, a derivative of the GFX9 family (Vega).

AMD Instinct MI200 Could Feature Two 110 Compute Units CDNA 2 GPU Dies

There are three codes, GFX906_60, GFX908_120, and GFX90A_110, each tied to a different GPU. GFX906_60 is speculated to refer to the Instinct MI60, GFX908_120 to the Instinct MI100, and GFX90A_110 may belong to the newer-generation AMD accelerator. In each code, the number after the underscore refers to the compute unit count.


Read this way, the codes imply 60 compute units for the MI60, 120 for the MI100, and 110 for the new part. What is interesting is that this would give AMD's next-gen accelerator fewer compute units than the MI100.
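The naming pattern described above can be sketched in a few lines of Python. This is only an illustration: the helper name `parse_target_code` is hypothetical, and the product mapping repeats the article's speculation rather than anything confirmed by the commit.

```python
from typing import Tuple


def parse_target_code(code: str) -> Tuple[str, int]:
    """Split a code such as 'GFX90A_110' into (architecture ID, CU count).

    Hypothetical helper: it only assumes the 'ARCH_COUNT' shape seen in the
    three codes quoted in the article.
    """
    arch, _, cu_count = code.rpartition("_")
    if not arch or not cu_count.isdigit():
        raise ValueError(f"unexpected code format: {code!r}")
    return arch, int(cu_count)


# Speculated products, as reported in the article (not confirmed by AMD).
SPECULATED_PRODUCT = {
    "GFX906": "Instinct MI60",
    "GFX908": "Instinct MI100",
    "GFX90A": "next-gen accelerator (Aldebaran)",
}

for code in ("GFX906_60", "GFX908_120", "GFX90A_110"):
    arch, cus = parse_target_code(code)
    print(f"{code}: {SPECULATED_PRODUCT[arch]}, {cus} compute units")
```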

Source: VideoCardz

The Aldebaran GPU has been reported to feature 128 compute units, which does not match the 110 in the code for the new AMD accelerator. However, GPUs typically ship with some compute units disabled, which, if the report is correct, would bring the part down to 110 active compute units.

Considering the settings of different Shader Engine and CU, Aldebaran / MI200 is an MCM configuration with 2 GPU dies, so if the setting is symmetric for each die instead of Shader Engine, each die will have 4 SEs. It is possible to have (56 CUs), and disable each one of them to make a total of 110 CUs.

— Coelacanth’s Dream
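The disabled-CU arithmetic in the quote can be checked with a short sketch. The helper `active_cus` and its parameters are illustrative assumptions. The 56-CU-per-die and 128-CU figures come from the reports quoted above, not from a confirmed specification.

```python
def active_cus(dies: int, cus_per_die: int, disabled_per_die: int) -> int:
    """Total enabled compute units across all dies (hypothetical helper)."""
    if not 0 <= disabled_per_die <= cus_per_die:
        raise ValueError("disabled_per_die must be between 0 and cus_per_die")
    return dies * (cus_per_die - disabled_per_die)


# Scenario from the quote: 2 dies with 56 CUs each, one CU disabled per die.
print(active_cus(dies=2, cus_per_die=56, disabled_per_die=1))    # 110

# Alternative reading: 128 physical CUs with 18 of them disabled.
print(active_cus(dies=1, cus_per_die=128, disabled_per_die=18))  # 110
```

Both readings land on 110 active compute units, which is why the code alone cannot settle the physical layout.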

Website VideoCardz states,

It is unclear if AMD is planning to double the FP32 core count on CDNA2 architecture, but assuming that they do, with a theoretical 1500 MHz GPU clock the accelerator would offer have a single-precision compute performance of 42.2 TFLOPS, 1.82x more than MI100. If that isn’t the case, then MI200 would have to have at least a 1650 MHz clock to reach the same FP32 throughput of 23 TFLOPs.

In the case of HPC accelerators such as MI200, the FP64 performance is far more important. According to previous leaks, MI200 is to feature full-rate FP64 performance, which means either doubling or quadrupling the performance over MI100, depending on the architecture.
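The FP32 figures in the VideoCardz quote can be reproduced with a rough peak-throughput calculation. This sketch rests on assumptions that are not stated in the quote: 110 compute units in total, 64 stream processors per compute unit, and 2 FLOPs per stream processor per clock (one fused multiply-add). The function name and the `fp32_multiplier` parameter are hypothetical.

```python
SP_PER_CU = 64           # assumption: 64 stream processors per compute unit
FLOPS_PER_SP_CLOCK = 2   # assumption: one fused multiply-add = 2 FLOPs


def peak_fp32_tflops(cus: int, clock_mhz: float, fp32_multiplier: int = 1) -> float:
    """Theoretical peak FP32 throughput in TFLOPS.

    fp32_multiplier=2 models the speculated doubling of FP32 cores.
    FLOPs per clock times MHz gives MFLOPS; dividing by 1e6 gives TFLOPS.
    """
    stream_processors = cus * SP_PER_CU
    return stream_processors * FLOPS_PER_SP_CLOCK * fp32_multiplier * clock_mhz / 1e6


# Doubled FP32 at a theoretical 1500 MHz clock.
print(f"{peak_fp32_tflops(110, 1500, fp32_multiplier=2):.2f} TFLOPS")  # 42.24 TFLOPS

# No doubling: clock needed to roughly match the MI100's ~23 TFLOPS.
print(f"{peak_fp32_tflops(110, 1650):.2f} TFLOPS")                     # 23.23 TFLOPS
```

Under these assumptions the numbers line up with the quoted 42.2 TFLOPS and the 1650 MHz break-even clock.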

AMD's MI200 is set to release before the end of 2021. It is the company's multi-chip graphics processor, built with two active dies and 128 GB of HBM2e memory.


Here's What To Expect From AMD Instinct MI200 'CDNA 2' GPU Accelerator

Inside the AMD Instinct MI200 is an Aldebaran GPU featuring two dies, a primary and a secondary. Each die consists of 8 shader engines, for a total of 16 SEs. Each shader engine packs 16 CUs with full-rate FP64, packed FP32, and a 2nd Generation Matrix Engine for FP16 and BF16 operations. Each die, as such, is composed of 128 compute units or 8,192 stream processors. With 110 compute units enabled per die, this works out to a total of 220 compute units or 14,080 stream processors for the entire chip. The Aldebaran GPU is also powered by a new XGMI interconnect. Each chiplet features a VCN 2.6 engine and the main IO controller.
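The compute unit and stream processor totals in this section can be cross-checked with a small sketch. The `DieLayout` class is hypothetical. The figure of 64 stream processors per compute unit is inferred from the article's 8,192 stream processors per 128 compute units, and the 110 enabled compute units per die come from the speculated configuration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DieLayout:
    """Per-die layout as described in the article (hypothetical model)."""
    shader_engines: int
    cus_per_se: int
    enabled_cus: int
    sp_per_cu: int = 64  # inferred: 8192 SPs / 128 CUs

    @property
    def physical_cus(self) -> int:
        return self.shader_engines * self.cus_per_se

    @property
    def physical_sps(self) -> int:
        return self.physical_cus * self.sp_per_cu

    @property
    def enabled_sps(self) -> int:
        return self.enabled_cus * self.sp_per_cu


die = DieLayout(shader_engines=8, cus_per_se=16, enabled_cus=110)
dies = 2

print(die.physical_cus, die.physical_sps)            # 128 8192
print(dies * die.enabled_cus, dies * die.enabled_sps)  # 220 14080
```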

The block diagram of AMD's CDNA 2 powered Aldebaran GPU which will power the Instinct MI200 HPC accelerator has been visualized. (Image Credits: Locuza)

As for DRAM, AMD has gone with eight 1024-bit interfaces for an 8192-bit wide memory bus. Each interface can support 2 GB HBM2e DRAM modules. This should give up to 16 GB of HBM2e memory capacity per stack, and since there are eight stacks in total, the total capacity would be a whopping 128 GB. That's 48 GB more than the NVIDIA A100, which houses 80 GB of HBM2e memory. The full visualization of the Aldebaran GPU on the Instinct MI200 is available here.
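The memory figures can be worked through in a short sketch. The function names are illustrative. The 3.2 Gbps per-pin data rate is taken from the Memory Clock row of the specification table in this article, and the bandwidth formula (bus width times per-pin rate, divided by 8 bits per byte) is the usual back-of-the-envelope estimate rather than an AMD-published calculation.

```python
def bus_width_bits(stacks: int, bits_per_stack: int) -> int:
    """Total memory bus width across all HBM stacks."""
    return stacks * bits_per_stack


def capacity_gb(stacks: int, gb_per_stack: int) -> int:
    """Total memory capacity in GB."""
    return stacks * gb_per_stack


def bandwidth_gb_s(bus_bits: int, gbps_per_pin: float) -> float:
    """Peak bandwidth in GB/s: one pin per bus bit, 8 bits per byte."""
    return bus_bits * gbps_per_pin / 8


bus = bus_width_bits(stacks=8, bits_per_stack=1024)
total = capacity_gb(stacks=8, gb_per_stack=16)

print(bus)                        # 8192
print(total, total - 80)          # 128 48  (80 GB is the A100 figure cited above)
print(bandwidth_gb_s(bus, 3.2))   # roughly 3276.8 GB/s, in line with the table's 3.2 TB/s
```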

AMD Radeon Instinct Accelerators 2020

| Accelerator Name | AMD Instinct MI300 | AMD Instinct MI250X | AMD Instinct MI250 | AMD Instinct MI210 | AMD Instinct MI100 | AMD Radeon Instinct MI60 | AMD Radeon Instinct MI50 | AMD Radeon Instinct MI25 | AMD Radeon Instinct MI8 | AMD Radeon Instinct MI6 |
|---|---|---|---|---|---|---|---|---|---|---|
| GPU Architecture | TBA (CDNA 3) | Aldebaran (CDNA 2) | Aldebaran (CDNA 2) | Aldebaran (CDNA 2) | Arcturus (CDNA 1) | Vega 20 | Vega 20 | Vega 10 | Fiji XT | Polaris 10 |
| GPU Process Node | Advanced Process Node | 6nm | 6nm | 6nm | 7nm FinFET | 7nm FinFET | 7nm FinFET | 14nm FinFET | 28nm | 14nm FinFET |
| GPU Dies | 4 (MCM)? | 2 (MCM) | 2 (MCM) | 1 (MCM) | 1 (Monolithic) | 1 (Monolithic) | 1 (Monolithic) | 1 (Monolithic) | 1 (Monolithic) | 1 (Monolithic) |
| GPU Cores | 28,160? | 14,080 | 13,312 | 6656 | 7680 | 4096 | 3840 | 4096 | 4096 | 2304 |
| GPU Clock Speed | TBA | 1700 MHz | 1700 MHz | ~1700 MHz? | ~1500 MHz | 1800 MHz | 1725 MHz | 1500 MHz | 1000 MHz | 1237 MHz |
| FP16 Compute | TBA | 383 TOPs | 362 TOPs | ~176 TOPs | 185 TFLOPs | 29.5 TFLOPs | 26.5 TFLOPs | 24.6 TFLOPs | 8.2 TFLOPs | 5.7 TFLOPs |
| FP32 Compute | TBA | 95.7 TFLOPs | 90.5 TFLOPs | ~44 TFLOPs | 23.1 TFLOPs | 14.7 TFLOPs | 13.3 TFLOPs | 12.3 TFLOPs | 8.2 TFLOPs | 5.7 TFLOPs |
| FP64 Compute | TBA | 47.9 TFLOPs | 45.3 TFLOPs | ~22 TFLOPs | 11.5 TFLOPs | 7.4 TFLOPs | 6.6 TFLOPs | 768 GFLOPs | 512 GFLOPs | 384 GFLOPs |
| VRAM | TBA | 128 GB HBM2e | 128 GB HBM2e | 64 GB HBM2e | 32 GB HBM2 | 32 GB HBM2 | 16 GB HBM2 | 16 GB HBM2 | 4 GB HBM1 | 16 GB GDDR5 |
| Memory Clock | TBA | 3.2 Gbps | 3.2 Gbps | 3.2 Gbps? | 1200 MHz | 1000 MHz | 1000 MHz | 945 MHz | 500 MHz | 1750 MHz |
| Memory Bus | TBA | 8192-bit | 8192-bit | 4096-bit | 4096-bit | 4096-bit | 4096-bit | 2048-bit | 4096-bit | 256-bit |
| Memory Bandwidth | TBA | 3.2 TB/s | 3.2 TB/s | 1.6 TB/s | 1.23 TB/s | 1 TB/s | 1 TB/s | 484 GB/s | 512 GB/s | 224 GB/s |
| Form Factor | TBA | OAM | OAM | Dual Slot Card | Dual Slot, Full Length | Dual Slot, Full Length | Dual Slot, Full Length | Dual Slot, Full Length | Dual Slot, Half Length | Single Slot, Full Length |
| Cooling | TBA | Passive Cooling | Passive Cooling | Passive Cooling | Passive Cooling | Passive Cooling | Passive Cooling | Passive Cooling | Passive Cooling | Passive Cooling |
| TDP | TBA | 560W | 500W? | 300W? | 300W | 300W | 300W | 300W | 175W | 150W |

Source: VideoCardz, ROCm Github, Coelacanth’s Dream
