AMD Instinct MI300 GPU To Utilize Quad MCM ‘CDNA 3’ GPUs: Feature 3D Stacking With Up To 8 Compute Dies, HBM3, PCIe Gen 5.0 & 600W TDP

Hassan Mujtaba
AMD Instinct MI300 GPU To Utilize Quad MCM 'CDNA 3' GPUs: Feature 3D Stacking With Up To 8 Compute Dies, HBM3, PCIe Gen 5.0 & 600W TDP

AMD Instinct MI300 GPUs which will be powered by the next-generation CDNA 3 architecture have been detailed by Moore's Law is Dead. The new GPUs will be powering the upcoming data centers and are rumored to be the first to incorporate a 3D-Stacking design.

AMD Instinct MI300 GPUs Rumored To Go All-Onboard With 3D-Stacking Design: Up To Four GPU Chiplets With 8 Compute Dies, HBM3 & PCIe Gen 5.0 at 600W

Last year, @Kepler_L2 revealed that the AMD Instinct MI300 was going to feature four Graphics Compute Dies. Later this was confirmed in a patch where the chip appeared as the 'GFX940' part. This was essentially going to double the MI250X which features two GCDs but the difference is that each GCD will feature two Compute dies. So for the Instinct MI300, we are going to get up to 8 GCDs on the top variant. In fact, the Instinct MI300 family will not be a singular GPU but will comprise several different configurations.

Related Story AMD’s Frank Azor Pushes Back on FSR 4.1 Cancellation Rumor for RDNA 3.5 iGPUs, Says No Such Decision Has Been Made
AMD Instinct MI300 'CDNA 3' GPU details have been revealed by Moore's Law is Dead.

The top AMD Instinct MI300 GPU will feature a massive interposer that measures around ~2750 mm2. The interposer has a very interesting configuration that packs four 6nm tiles that contain the I/O controllers, IP Blocks and measure around ~320-360mm2. These tiles are based on a 6nm node and may also include some form of cache though that's not confirmed yet. Now on top of these IO stacks, AMD will be using the brand new 3D-Stacking technology to incorporate two Compute Dies.

These brand new AMD CDNA 3 architecture-based Compute Dies will be fabricated on a 5nm node and feature a die size of around 110mm2 per tile. Currently, there's no word about how many core or accelerator blocks each Compute die will hold but if we keep the same SP/core count as MI250X, we get up to 28,160 cores but once again, this is just mere speculation since a lot can change within CDNA 3. Since the memory controllers are onboard the bottom I/O die, they are connected to two stacks of HBM3 using more than 12 metal layers. Each die is interconnected using a total of 20,000 connections which is double what Apple is using on the M1 Ultra as a part of the UltraFusion chip design.

2022-04-28_9-16-17-low_res-scale-2_00x-custom
2022-04-28_9-14-25-low_res-scale-2_00x-custom
2022-04-28_9-14-39-low_res-scale-2_00x-custom
2022-04-28_9-14-43-low_res-scale-2_00x-custom
2022-04-28_9-14-49-low_res-scale-2_00x-custom
2022-04-28_9-15-39-low_res-scale-2_00x-custom

HBM Memory Specifications Comparison

DRAMHBM1HBM2HBM2eHBM3HBM3EHBMNext (HBM4)
I/O (Bus Interface)10241024102410241024-20481024-2048
Prefetch (I/O)222222
Maximum Bandwidth128 GB/s256 GB/s460.8 GB/s819.2 GB/s1.2 TB/s1.5 - 2.56 TB/s
DRAM ICs Per Stack488128-168-16
Maximum Capacity4 GB8 GB16 GB24 GB24 - 36 GB36-64 GB
tRC48ns45ns45nsTBATBATBA
tCCD2ns (=1tCK)2ns (=1tCK)2ns (=1tCK)TBATBATBA
VPPExternal VPPExternal VPPExternal VPP
External VPP
External VPPTBA
VDD1.2V1.2V1.2VTBATBATBA
Command InputDual CommandDual CommandDual CommandDual CommandDual CommandDual Command

Now while AMD is still relying on 8-stacks, they are the newer HBM3 standard which is the same as the one NVIDIA is using for its Hopper GPUs. Currently, MI250X uses 8 HBM2e stacks which are 8-hi and feature 16 GB of memory per stack (128 GB per module). It may be likely that AMD raises the stacks to 12-Hi which is something that SK Hynix has already teased a while back. This would allow for up to 192 GB memory capacities on the top Instinct MI300 GPU configuration, marking a 50% increase. As for the TDP, each CDNA 3 tile (1x 6nm + 2x 5nm dies) will have a TDP of around 150W. As for the configurations, they are as follows:

  • Top Config: 4x IO Die (6nm) + 4x GCDs (5nm) + 8x Compute Dies (5nm)
  • Mid Config: 2x IO Die (6nm) + 2x GCDs (5nm) + 4x Compute Dies (5nm)
  • Low Config: 1x IO Die (6nm) + 1x GCDs (5nm) + 2x Compute Dies (5nm)

AMD Instinct MI300 GPU Configurations (Image Credits: Moore's Law is Dead):

So based on that, the top configuration will consume around 600W of power, the mid-config will consume around 300W of power while the entry-level config will consume around 150W power. Currently, the top Instinct MI250X configuration consumes 560W of power and comes in the OAM form factor. The Instinct MI300 GPUs will be launching next year around the same time when Intel and NVIDIA will be out with their latest data center products such as Ponte Vecchio and Hopper.

AMD Radeon Instinct Accelerators

Accelerator NameAMD Instinct MI400AMD Instinct MI350XAMD Instinct MI300XAMD Instinct MI300AAMD Instinct MI250XAMD Instinct MI250AMD Instinct MI210AMD Instinct MI100AMD Radeon Instinct MI60AMD Radeon Instinct MI50AMD Radeon Instinct MI25AMD Radeon Instinct MI8AMD Radeon Instinct MI6
CPU ArchitectureZen 5 (Exascale APU)N/AN/AZen 4 (Exascale APU)N/AN/AN/AN/AN/AN/AN/AN/AN/A
GPU ArchitectureCDNA 4CDNA 3+?Aqua Vanjaram (CDNA 3)Aqua Vanjaram (CDNA 3)Aldebaran (CDNA 2)Aldebaran (CDNA 2)Aldebaran (CDNA 2)Arcturus (CDNA 1)Vega 20Vega 20Vega 10Fiji XTPolaris 10
GPU Process Node4nm4nm5nm+6nm5nm+6nm6nm6nm6nm7nm FinFET7nm FinFET7nm FinFET14nm FinFET28nm14nm FinFET
GPU ChipletsTBDTBD8 (MCM)8 (MCM)2 (MCM)
1 (Per Die)
2 (MCM)
1 (Per Die)
2 (MCM)
1 (Per Die)
1 (Monolithic)1 (Monolithic)1 (Monolithic)1 (Monolithic)1 (Monolithic)1 (Monolithic)
GPU CoresTBDTBD19,45614,59214,08013,3126656768040963840409640962304
GPU Clock SpeedTBDTBD2100 MHz2100 MHz1700 MHz1700 MHz1700 MHz1500 MHz1800 MHz1725 MHz1500 MHz1000 MHz1237 MHz
INT8 ComputeTBDTBD2614 TOPS1961 TOPS383 TOPs362 TOPS181 TOPS92.3 TOPSN/AN/AN/AN/AN/A
FP16 ComputeTBDTBD1.3 PFLOPs980.6 TFLOPs383 TFLOPs362 TFLOPs181 TFLOPs185 TFLOPs29.5 TFLOPs26.5 TFLOPs24.6 TFLOPs8.2 TFLOPs5.7 TFLOPs
FP32 ComputeTBDTBD163.4 TFLOPs122.6 TFLOPs95.7 TFLOPs90.5 TFLOPs45.3 TFLOPs23.1 TFLOPs14.7 TFLOPs13.3 TFLOPs12.3 TFLOPs8.2 TFLOPs5.7 TFLOPs
FP64 ComputeTBDTBD81.7 TFLOPs61.3 TFLOPs47.9 TFLOPs45.3 TFLOPs22.6 TFLOPs11.5 TFLOPs7.4 TFLOPs6.6 TFLOPs768 GFLOPs512 GFLOPs384 GFLOPs
VRAMTBDHBM3e192 GB HBM3128 GB HBM3128 GB HBM2e128 GB HBM2e64 GB HBM2e32 GB HBM232 GB HBM216 GB HBM216 GB HBM24 GB HBM116 GB GDDR5
Infinity CacheTBDTBD256 MB256 MBN/AN/AN/AN/AN/AN/AN/AN/AN/A
Memory ClockTBDTBD5.2 Gbps5.2 Gbps3.2 Gbps3.2 Gbps3.2 Gbps1200 MHz1000 MHz1000 MHz945 MHz500 MHz1750 MHz
Memory BusTBDTBD8192-bit8192-bit8192-bit8192-bit4096-bit4096-bit bus4096-bit bus4096-bit bus2048-bit bus4096-bit bus256-bit bus
Memory BandwidthTBDTBD5.3 TB/s5.3 TB/s3.2 TB/s3.2 TB/s1.6 TB/s1.23 TB/s1 TB/s1 TB/s484 GB/s512 GB/s224 GB/s
Form FactorTBDTBDOAMAPU SH5 SocketOAMOAMDual Slot CardDual Slot, Full LengthDual Slot, Full LengthDual Slot, Full LengthDual Slot, Full LengthDual Slot, Half LengthSingle Slot, Full Length
CoolingTBDTBDPassive CoolingPassive CoolingPassive CoolingPassive CoolingPassive CoolingPassive CoolingPassive CoolingPassive CoolingPassive CoolingPassive CoolingPassive Cooling
TDP (Max)TBDTBD750W760W560W500W300W300W300W300W300W175W150W

Hassan Mujtaba Photo

About the author: A Software Engineer by training and a PC enthusiast by passion, Hassan Mujtaba serves as Wccftech's Senior Editor for hardware section. With years of experience in the industry, he specializes in deep-dive technical analysis of next-generation CPU and GPU architectures, motherboards, and cooling solutions. His work involves not only breaking news on upcoming technologies but also extensive hands-on reviews and benchmarking.

Follow Wccftech on Google to get more of our news coverage in your feeds.

Button