AMD Instinct MI300 Could Feature Quad MCM GPUs Based on CDNA 3 Architecture

Hassan Mujtaba

AMD is about to launch its Instinct MI200 GPU accelerator, which will be the first chip to feature an MCM graphics architecture. However, it looks like the next-generation Instinct MI300, featuring the CDNA 3 architecture, is going to blow it away with a quad-MCM design.

AMD Instinct MI300 Rumored To Feature Four MCM GPUs Based on CDNA 3 Graphics Architecture

The AMD Instinct MI200 accelerator, with its CDNA 2 architecture, is expected to feature two GPU dies on the same package. These will be connected by an Infinity Fabric interconnect, and the package will also feature a separate die that serves as a multi-tier cache between the two GPUs. Each graphics die will be known as a GCD, while the cache die will be known as an MCD.

There will be two CDNA 2 GPUs onboard the Instinct MI200 package, but the next-generation HPC accelerator is rumored to double that. According to Kepler_L2, the Instinct MI300 will feature a 4-GCD design based on the brand-new CDNA 3 architecture. The upcoming Instinct MI200 was going to feature 128 compute units per die, but that figure has been revised to 110 compute units since last week's rumor. Two such dies give a total of 220 Compute Units, which would net 14,080 cores (64 cores per Compute Unit). If we take the same 110 Compute Units per die and multiply by 4 (the number of GCDs on the Instinct MI300), we end up with 440 Compute Units, or an insane 28,160 cores.
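
The back-of-the-envelope estimate above can be reproduced with a short Python sketch. The 64-cores-per-CU ratio is derived from the article's own figures (14,080 cores across 220 Compute Units), and the assumption that CDNA 3 keeps both that ratio and 110 Compute Units per die is purely illustrative. The `estimate` helper is a made-up name for this example.

```python
# Figures from the rumor: 220 CUs -> 14,080 cores on the dual-die MI200.
# ASSUMPTION: the same cores-per-CU ratio and 110 CUs per die carry over
# to CDNA 3, which is unlikely to hold for a revised architecture.
CORES_PER_CU = 14_080 // 220  # works out to 64


def estimate(cus_per_die: int, dies: int) -> tuple[int, int]:
    """Return (total compute units, total cores) for a multi-die package."""
    total_cus = cus_per_die * dies
    return total_cus, total_cus * CORES_PER_CU


if __name__ == "__main__":
    for name, dies in (("Instinct MI200", 2), ("Instinct MI300", 4)):
        cus, cores = estimate(110, dies)
        print(f"{name}: {dies} GCDs, {cus} CUs, {cores:,} cores")
```

Changing `cus_per_die` to 128 shows what the earlier MI200 rumor would have implied under the same assumptions.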

A recent AMD ROCm Developer Tools update, spotted by Komachi, did confirm a maximum of 4 MCM GPUs, but those are simply 'Aldebaran' SKUs. There are expected to be at least four CDNA 2-powered Instinct accelerators, with their respective unique IDs listed below. Note that the count of four refers to the number of devices, not the number of dies on each device:

  • 0x7408
  • 0x740C
  • 0x740F
  • 0x7410

The core-count estimate above would only hold if AMD made no changes whatsoever when moving from CDNA 2 to CDNA 3, but that's not expected to be the case. CDNA 3 is expected to bring a revised architecture that won't be another Vega derivative like Arcturus or Aldebaran. The GPU architecture may use a layout similar to the WGP/SE arrangement on the new RDNA 3 chips, or an entirely new design tailored to the HPC segment. But one thing is for sure: those quad-MCM GPUs are something that we can't wait to see in action!

AMD Radeon Instinct Accelerators

| Accelerator Name | AMD Instinct MI400 | AMD Instinct MI350X | AMD Instinct MI300X | AMD Instinct MI300A | AMD Instinct MI250X | AMD Instinct MI250 | AMD Instinct MI210 | AMD Instinct MI100 | AMD Radeon Instinct MI60 | AMD Radeon Instinct MI50 | AMD Radeon Instinct MI25 | AMD Radeon Instinct MI8 | AMD Radeon Instinct MI6 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| CPU Architecture | Zen 5 (Exascale APU) | N/A | N/A | Zen 4 (Exascale APU) | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A |
| GPU Architecture | CDNA 4 | CDNA 3+? | Aqua Vanjaram (CDNA 3) | Aqua Vanjaram (CDNA 3) | Aldebaran (CDNA 2) | Aldebaran (CDNA 2) | Aldebaran (CDNA 2) | Arcturus (CDNA 1) | Vega 20 | Vega 20 | Vega 10 | Fiji XT | Polaris 10 |
| GPU Process Node | 4nm | 4nm | 5nm+6nm | 5nm+6nm | 6nm | 6nm | 6nm | 7nm FinFET | 7nm FinFET | 7nm FinFET | 14nm FinFET | 28nm | 14nm FinFET |
| GPU Chiplets | TBD | TBD | 8 (MCM) | 8 (MCM) | 2 (MCM), 1 (Per Die) | 2 (MCM), 1 (Per Die) | 2 (MCM), 1 (Per Die) | 1 (Monolithic) | 1 (Monolithic) | 1 (Monolithic) | 1 (Monolithic) | 1 (Monolithic) | 1 (Monolithic) |
| GPU Cores | TBD | TBD | 19,456 | 14,592 | 14,080 | 13,312 | 6,656 | 7,680 | 4,096 | 3,840 | 4,096 | 4,096 | 2,304 |
| GPU Clock Speed | TBD | TBD | 2100 MHz | 2100 MHz | 1700 MHz | 1700 MHz | 1700 MHz | 1500 MHz | 1800 MHz | 1725 MHz | 1500 MHz | 1000 MHz | 1237 MHz |
| INT8 Compute | TBD | TBD | 2614 TOPS | 1961 TOPS | 383 TOPS | 362 TOPS | 181 TOPS | 92.3 TOPS | N/A | N/A | N/A | N/A | N/A |
| FP16 Compute | TBD | TBD | 1.3 PFLOPs | 980.6 TFLOPs | 383 TFLOPs | 362 TFLOPs | 181 TFLOPs | 185 TFLOPs | 29.5 TFLOPs | 26.5 TFLOPs | 24.6 TFLOPs | 8.2 TFLOPs | 5.7 TFLOPs |
| FP32 Compute | TBD | TBD | 163.4 TFLOPs | 122.6 TFLOPs | 95.7 TFLOPs | 90.5 TFLOPs | 45.3 TFLOPs | 23.1 TFLOPs | 14.7 TFLOPs | 13.3 TFLOPs | 12.3 TFLOPs | 8.2 TFLOPs | 5.7 TFLOPs |
| FP64 Compute | TBD | TBD | 81.7 TFLOPs | 61.3 TFLOPs | 47.9 TFLOPs | 45.3 TFLOPs | 22.6 TFLOPs | 11.5 TFLOPs | 7.4 TFLOPs | 6.6 TFLOPs | 768 GFLOPs | 512 GFLOPs | 384 GFLOPs |
| VRAM | TBD | HBM3e | 192 GB HBM3 | 128 GB HBM3 | 128 GB HBM2e | 128 GB HBM2e | 64 GB HBM2e | 32 GB HBM2 | 32 GB HBM2 | 16 GB HBM2 | 16 GB HBM2 | 4 GB HBM1 | 16 GB GDDR5 |
| Infinity Cache | TBD | TBD | 256 MB | 256 MB | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A |
| Memory Clock | TBD | TBD | 5.2 Gbps | 5.2 Gbps | 3.2 Gbps | 3.2 Gbps | 3.2 Gbps | 1200 MHz | 1000 MHz | 1000 MHz | 945 MHz | 500 MHz | 1750 MHz |
| Memory Bus | TBD | TBD | 8192-bit | 8192-bit | 8192-bit | 8192-bit | 4096-bit | 4096-bit | 4096-bit | 4096-bit | 2048-bit | 4096-bit | 256-bit |
| Memory Bandwidth | TBD | TBD | 5.3 TB/s | 5.3 TB/s | 3.2 TB/s | 3.2 TB/s | 1.6 TB/s | 1.23 TB/s | 1 TB/s | 1 TB/s | 484 GB/s | 512 GB/s | 224 GB/s |
| Form Factor | TBD | TBD | OAM | APU SH5 Socket | OAM | OAM | Dual Slot Card | Dual Slot, Full Length | Dual Slot, Full Length | Dual Slot, Full Length | Dual Slot, Full Length | Dual Slot, Half Length | Single Slot, Full Length |
| Cooling | TBD | TBD | Passive Cooling | Passive Cooling | Passive Cooling | Passive Cooling | Passive Cooling | Passive Cooling | Passive Cooling | Passive Cooling | Passive Cooling | Passive Cooling | Passive Cooling |
| TDP (Max) | TBD | TBD | 750W | 760W | 560W | 500W | 300W | 300W | 300W | 300W | 300W | 175W | 150W |
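
As a rough sanity check on the table's memory figures, peak bandwidth can be approximated as bus width (in bits) times the per-pin data rate (in Gbps), divided by 8 to convert bits to bytes. The sketch below applies that standard relation to three rows that list a data rate in Gbps. The helper name `bandwidth_gb_s` is made up for this example, and the inputs are copied from the table rather than taken from any AMD tool or API.

```python
def bandwidth_gb_s(bus_width_bits: int, data_rate_gbps: float) -> float:
    """Approximate peak memory bandwidth in GB/s.

    bus_width_bits * data_rate_gbps gives gigabits per second across the
    whole bus; dividing by 8 converts bits to bytes.
    """
    return bus_width_bits * data_rate_gbps / 8


# (bus width in bits, per-pin data rate in Gbps), copied from the table.
rows = {
    "Instinct MI300X": (8192, 5.2),
    "Instinct MI250X": (8192, 3.2),
    "Instinct MI210": (4096, 3.2),
}

if __name__ == "__main__":
    for name, (bus, rate) in rows.items():
        print(f"{name}: {bandwidth_gb_s(bus, rate):.1f} GB/s")
```

The results come out at roughly 5,325 GB/s, 3,277 GB/s and 1,638 GB/s, close to the 5.3 TB/s, 3.2 TB/s and 1.6 TB/s listed for those parts.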

About the author: A Software Engineer by training and a PC enthusiast by passion, Hassan Mujtaba serves as Wccftech's Senior Editor for the hardware section. With years of experience in the industry, he specializes in deep-dive technical analysis of next-generation CPU and GPU architectures, motherboards, and cooling solutions. His work involves not only breaking news on upcoming technologies but also extensive hands-on reviews and benchmarking.
