It looks like AMD's next-gen Instinct MI300 GPU accelerator has made a possible first appearance in the latest Linux patch.
AMD Instinct MI300 'GFX940' GPU, Next-Gen Data Center MCM Accelerator, Makes Possible First Appearance In Linux Patch
The latest Linux Patch has included a new target for an unreleased AMD 'GFX940' GP which has a similar ISA as the Aldebaran 'GFX90a' GPU. It is speculated that this chip could be powering AMD's next-generation Instinct MI300 GPU accelerator and supports all the data-centric features such as MFMA (Matrix-Fused-Multiply-Add), full-rate FP64, and packed FP32 operations. Other features also include XNACK which is specific to CPU+GPU memory space integration, as Coelacanth-Dream puts it.
The source states that although the GPU ISA is similar, the GFX940 does have a few differences when compared to Aldebaran 'CDNA 2' GPUs which are listed below:
Previous rumors have indicated that the AMD Instinct MI300 will feature a 4-GCD design based on the brand new CDNA 3 architecture. The upcoming Instinct MI200 was going to feature 128 compute units per die but that has changed to 110 compute units since last week's rumor. A total of 220 Compute Units would net 14,080 cores and if we take the exact number and multiply it by 4 (the number of GCDs on Instinct MI300), we end up with 440 Compute Units or an insane 28,160 cores.
— Kepler (@Kepler_L2) March 1, 2022
MI300 will feature 4 GCDs 🧐
— Kepler (@Kepler_L2) September 7, 2021
A recent AMD ROCm Developer Tools update that was spotted by Komachi did confirm a maximum of 4 MCM GPUs but those are simply 'Aldebaran' SKUs. There are expected to be at least four CDNA 2 powered Instinct accelerators with their respective (unique IDs) listed below. Note that the number doesn't represent the number of dies on each device but rather the device itself:
Now that would be true if AMD makes no changes whatsoever when moving from CDNA 2 to CDNA 3 but that's not the case. CDNA 3 is expected to bring forward a revised new architecture that won't be another Vega derivative like Arcturus or Aldebaran which makes this rumor more believable.
The GPU architecture may also use a layout that might end up looking similar to the new WGP/SE arrangement on the new RDNA 3 chips or an entirely new design tailored towards the HPC segment. But one thing is for sure, those quad-MCM GPUs definitely are something that we can't wait to see in action!
AMD Radeon Instinct Accelerators 2020
|Accelerator Name||AMD Instinct MI300||AMD Instinct MI250X||AMD Instinct MI250||AMD Instinct MI210||AMD Instinct MI100||AMD Radeon Instinct MI60||AMD Radeon Instinct MI50||AMD Radeon Instinct MI25||AMD Radeon Instinct MI8||AMD Radeon Instinct MI6|
|CPU Architecture||Zen 4 (Exascale APU)||N/A||N/A||N/A||N/A||N/A||N/A||N/A||N/A||N/A|
|GPU Architecture||TBA (CDNA 3)||Aldebaran (CDNA 2)||Aldebaran (CDNA 2)||Aldebaran (CDNA 2)||Arcturus (CDNA 1)||Vega 20||Vega 20||Vega 10||Fiji XT||Polaris 10|
|GPU Process Node||5nm+6nm||6nm||6nm||6nm||7nm FinFET||7nm FinFET||7nm FinFET||14nm FinFET||28nm||14nm FinFET|
|GPU Chiplets||4 (MCM / 3D Stacked)|
1 (Per Die)
1 (Per Die)
1 (Per Die)
1 (Per Die)
|1 (Monolithic)||1 (Monolithic)||1 (Monolithic)||1 (Monolithic)||1 (Monolithic)||1 (Monolithic)|
|GPU Clock Speed||TBA||1700 MHz||1700 MHz||1700 MHz||1500 MHz||1800 MHz||1725 MHz||1500 MHz||1000 MHz||1237 MHz|
|FP16 Compute||TBA||383 TOPs||362 TOPs||181 TOPs||185 TFLOPs||29.5 TFLOPs||26.5 TFLOPs||24.6 TFLOPs||8.2 TFLOPs||5.7 TFLOPs|
|FP32 Compute||TBA||95.7 TFLOPs||90.5 TFLOPs||45.3 TFLOPs||23.1 TFLOPs||14.7 TFLOPs||13.3 TFLOPs||12.3 TFLOPs||8.2 TFLOPs||5.7 TFLOPs|
|FP64 Compute||TBA||47.9 TFLOPs||45.3 TFLOPs||22.6 TFLOPs||11.5 TFLOPs||7.4 TFLOPs||6.6 TFLOPs||768 GFLOPs||512 GFLOPs||384 GFLOPs|
|VRAM||192 GB HBM3?||128 GB HBM2e||128 GB HBM2e||64 GB HBM2e||32 GB HBM2||32 GB HBM2||16 GB HBM2||16 GB HBM2||4 GB HBM1||16 GB GDDR5|
|Memory Clock||TBA||3.2 Gbps||3.2 Gbps||3.2 Gbps||1200 MHz||1000 MHz||1000 MHz||945 MHz||500 MHz||1750 MHz|
|Memory Bus||8192-bit||8192-bit||8192-bit||4096-bit||4096-bit bus||4096-bit bus||4096-bit bus||2048-bit bus||4096-bit bus||256-bit bus|
|Memory Bandwidth||TBA||3.2 TB/s||3.2 TB/s||1.6 TB/s||1.23 TB/s||1 TB/s||1 TB/s||484 GB/s||512 GB/s||224 GB/s|
|Form Factor||OAM||OAM||OAM||Dual Slot Card||Dual Slot, Full Length||Dual Slot, Full Length||Dual Slot, Full Length||Dual Slot, Full Length||Dual Slot, Half Length||Single Slot, Full Length|
|Cooling||Passive Cooling||Passive Cooling||Passive Cooling||Passive Cooling||Passive Cooling||Passive Cooling||Passive Cooling||Passive Cooling||Passive Cooling||Passive Cooling|