⋮  

AMD’s Next-Gen Data Center Behemoth, The Instinct MI300 MCM ‘GFX940’ GPU, Makes Possible First Appearance In Linux Patch

Submit

It looks like AMD's next-gen Instinct MI300 GPU accelerator has made a possible first appearance in the latest Linux patch.

AMD Instinct MI300 'GFX940' GPU, Next-Gen Data Center MCM Accelerator, Makes Possible First Appearance In Linux Patch

The latest Linux Patch has included a new target for an unreleased AMD 'GFX940' GP which has a similar ISA as the Aldebaran 'GFX90a' GPU. It is speculated that this chip could be powering AMD's next-generation Instinct MI300 GPU accelerator and supports all the data-centric features such as MFMA (Matrix-Fused-Multiply-Add), full-rate FP64, and packed FP32 operations. Other features also include XNACK which is specific to CPU+GPU memory space integration, as Coelacanth-Dream puts it.

AMD Ryzen 7000 ‘Raphael’ CPU Specs & Price Rumors: Ryzen 9 7950X 24/16 Core Up To 5.4 GHz, Ryzen 9 7900X 12 Core Up To 5.3 GHz, Ryzen 7 7800X 8 Core Up To 5.2 GHz, Ryzen 5 7600X 6 Core Up To 5.1 GHz

The source states that although the GPU ISA is similar, the GFX940 does have a few differences when compared to Aldebaran 'CDNA 2' GPUs which are listed below:

AMD GFX90a and GFX940 GPUs for next-gen Instinct accelerators feature comparison. (Image Credits: Coelacanth-Dream)

Previous rumors have indicated that the AMD Instinct MI300 will feature a 4-GCD design based on the brand new CDNA 3 architecture. The upcoming Instinct MI200 was going to feature 128 compute units per die but that has changed to 110 compute units since last week's rumor. A total of 220 Compute Units would net 14,080 cores and if we take the exact number and multiply it by 4 (the number of GCDs on Instinct MI300), we end up with 440 Compute Units or an insane 28,160 cores.

A recent AMD ROCm Developer Tools update that was spotted by Komachi did confirm a maximum of 4 MCM GPUs but those are simply 'Aldebaran' SKUs. There are expected to be at least four CDNA 2 powered Instinct accelerators with their respective (unique IDs) listed below. Note that the number doesn't represent the number of dies on each device but rather the device itself:

AMD announces new ‘Raise The Game’ bundle amidst crypto market crash & falling GPU prices

  • 0x7408
  • 0x740C
  • 0x740F
  • 0x7410

Now that would be true if AMD makes no changes whatsoever when moving from CDNA 2 to CDNA 3 but that's not the case. CDNA 3 is expected to bring forward a revised new architecture that won't be another Vega derivative like Arcturus or Aldebaran which makes this rumor more believable.

The GPU architecture may also use a layout that might end up looking similar to the new WGP/SE arrangement on the new RDNA 3 chips or an entirely new design tailored towards the HPC segment. But one thing is for sure, those quad-MCM GPUs definitely are something that we can't wait to see in action!

AMD Radeon Instinct Accelerators 2020

Accelerator NameAMD Instinct MI300AMD Instinct MI250XAMD Instinct MI250AMD Instinct MI210AMD Instinct MI100AMD Radeon Instinct MI60AMD Radeon Instinct MI50AMD Radeon Instinct MI25AMD Radeon Instinct MI8AMD Radeon Instinct MI6
CPU ArchitectureZen 4 (Exascale APU)N/AN/AN/AN/AN/AN/AN/AN/AN/A
GPU ArchitectureTBA (CDNA 3)Aldebaran (CDNA 2)Aldebaran (CDNA 2)Aldebaran (CDNA 2)Arcturus (CDNA 1)Vega 20Vega 20Vega 10Fiji XTPolaris 10
GPU Process Node5nm+6nm6nm6nm6nm7nm FinFET7nm FinFET7nm FinFET14nm FinFET28nm14nm FinFET
GPU Chiplets4 (MCM / 3D Stacked)
1 (Per Die)
2 (MCM)
1 (Per Die)
2 (MCM)
1 (Per Die)
2 (MCM)
1 (Per Die)
1 (Monolithic)1 (Monolithic)1 (Monolithic)1 (Monolithic)1 (Monolithic)1 (Monolithic)
GPU Cores28,160?14,08013,3126656768040963840409640962304
GPU Clock SpeedTBA1700 MHz1700 MHz1700 MHz1500 MHz1800 MHz1725 MHz1500 MHz1000 MHz1237 MHz
FP16 ComputeTBA383 TOPs362 TOPs181 TOPs185 TFLOPs29.5 TFLOPs26.5 TFLOPs24.6 TFLOPs8.2 TFLOPs5.7 TFLOPs
FP32 ComputeTBA95.7 TFLOPs90.5 TFLOPs45.3 TFLOPs23.1 TFLOPs14.7 TFLOPs13.3 TFLOPs12.3 TFLOPs8.2 TFLOPs5.7 TFLOPs
FP64 ComputeTBA47.9 TFLOPs45.3 TFLOPs22.6 TFLOPs11.5 TFLOPs7.4 TFLOPs6.6 TFLOPs768 GFLOPs512 GFLOPs384 GFLOPs
VRAM192 GB HBM3?128 GB HBM2e128 GB HBM2e64 GB HBM2e32 GB HBM232 GB HBM216 GB HBM216 GB HBM24 GB HBM116 GB GDDR5
Memory ClockTBA3.2 Gbps3.2 Gbps3.2 Gbps1200 MHz1000 MHz1000 MHz945 MHz500 MHz1750 MHz
Memory Bus8192-bit8192-bit8192-bit4096-bit4096-bit bus4096-bit bus4096-bit bus2048-bit bus4096-bit bus256-bit bus
Memory BandwidthTBA3.2 TB/s3.2 TB/s1.6 TB/s1.23 TB/s1 TB/s1 TB/s484 GB/s512 GB/s224 GB/s
Form FactorOAMOAMOAMDual Slot CardDual Slot, Full LengthDual Slot, Full LengthDual Slot, Full LengthDual Slot, Full LengthDual Slot, Half LengthSingle Slot, Full Length
CoolingPassive CoolingPassive CoolingPassive CoolingPassive CoolingPassive CoolingPassive CoolingPassive CoolingPassive CoolingPassive CoolingPassive Cooling
TDP~600W560W500W300W300W300W300W300W175W150W
Submit