AMD Instinct MI300 GPU To Utilize Quad MCM ‘CDNA 3’ GPUs: Features 3D Stacking With Up To 8 Compute Dies, HBM3, PCIe Gen 5.0 & 600W TDP
AMD Instinct MI300 GPUs, which will be powered by the next-generation CDNA 3 architecture, have been detailed by Moore's Law is Dead. The new GPUs will power upcoming data centers and are rumored to be the first to incorporate a 3D-stacking design.
AMD Instinct MI300 GPUs Rumored To Go All-Onboard With 3D-Stacking Design: Up To Four GPU Chiplets With 8 Compute Dies, HBM3 & PCIe Gen 5.0 at 600W
Last year, @Kepler_L2 revealed that the AMD Instinct MI300 would feature four Graphics Compute Dies (GCDs). This was later confirmed by a patch in which the chip appeared as the 'GFX940' part. The design essentially doubles the MI250X, which features two GCDs, with one key difference: each MI300 GCD will carry two Compute Dies. The top Instinct MI300 variant will therefore pack up to 8 Compute Dies. In fact, the Instinct MI300 will not be a single GPU; the family will comprise several different configurations.
The top AMD Instinct MI300 GPU will feature a massive interposer measuring roughly 2,750 mm². The interposer has a very interesting configuration: it packs four 6nm tiles, each measuring around 320-360 mm², that contain the I/O controllers and IP blocks. These tiles may also include some form of cache, though that has not been confirmed yet. On top of each of these I/O dies, AMD will use its new 3D-stacking technology to place two Compute Dies.
These new CDNA 3 architecture-based Compute Dies will be fabricated on a 5nm node and measure around 110 mm² per tile. There is currently no word on how many cores or accelerator blocks each Compute Die will hold, but if each GCD keeps the same SP/core count as on MI250X, the chip would offer up to 28,160 cores. Once again, this is mere speculation, since a lot can change with CDNA 3. Because the memory controllers sit on the bottom I/O die, each I/O die connects to two stacks of HBM3 through more than 12 metal layers. The dies are interconnected using a total of 20,000 connections, double what Apple uses in the UltraFusion design of the M1 Ultra.
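The 28,160 figure is simple scaling arithmetic. Below is a minimal sketch, assuming each MI300 GCD keeps the per-GCD shader count of MI250X (110 active compute units with 64 stream processors each); the constant and function names are illustrative only, not AMD specifications.

```python
# Speculative MI300 shader count, scaled from MI250X.
# Assumption: each MI250X GCD has 110 active compute units (CUs),
# and each CU contains 64 stream processors (SPs).
CUS_PER_MI250X_GCD = 110
SPS_PER_CU = 64


def estimate_cores(num_gcds: int) -> int:
    """Return the total SP count if every GCD matches an MI250X GCD."""
    return num_gcds * CUS_PER_MI250X_GCD * SPS_PER_CU


mi250x_cores = estimate_cores(2)  # MI250X: two GCDs
mi300_cores = estimate_cores(4)   # rumored MI300 top config: four GCDs

print(f"MI250X: {mi250x_cores:,} cores")       # MI250X: 14,080 cores
print(f"MI300 (est.): {mi300_cores:,} cores")  # MI300 (est.): 28,160 cores
```

If CDNA 3 changes the CU count per die or the SPs per CU, only the two constants need to change.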
HBM Memory Specifications Comparison
| DRAM | HBM1 | HBM2 | HBM2e | HBM3 |
|---|---|---|---|---|
| I/O (Bus Interface) | 1024-bit | 1024-bit | 1024-bit | 1024-bit |
| Maximum Bandwidth | 128 GB/s | 256 GB/s | 460.8 GB/s | 819.2 GB/s |
| DRAM ICs Per Stack | 4 | 8 | 8 | 12 |
| Maximum Capacity | 4 GB | 8 GB | 16 GB | 24 GB |
| tCCD | 2ns (=1tCK) | 2ns (=1tCK) | 2ns (=1tCK) | TBA |
| VPP | External VPP | External VPP | External VPP | External VPP |
| Command Input | Dual Command | Dual Command | Dual Command | Dual Command |
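Each figure in the Maximum Bandwidth row follows from the 1024-bit interface: per-stack bandwidth equals bus width times per-pin data rate, divided by eight bits per byte. The per-pin rates used below (1.0, 2.0, 3.6 and 6.4 Gbps) are not listed in the table; they are assumed here as the rates that reproduce its bandwidth figures.

```python
# Per-stack HBM bandwidth = bus width (bits) x per-pin rate / 8 bits per byte.
# Assumption: per-pin data rates in Mbps chosen to match the table above.
BUS_WIDTH_BITS = 1024

PIN_RATE_MBPS = {
    "HBM1": 1000,
    "HBM2": 2000,
    "HBM2e": 3600,
    "HBM3": 6400,
}


def stack_bandwidth_gbs(pin_rate_mbps: int, bus_width_bits: int = BUS_WIDTH_BITS) -> float:
    """Return per-stack bandwidth in GB/s (decimal gigabytes)."""
    megabytes_per_s = bus_width_bits * pin_rate_mbps / 8
    return megabytes_per_s / 1000


for name, rate in PIN_RATE_MBPS.items():
    print(f"{name}: {stack_bandwidth_gbs(rate):.1f} GB/s")
```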
While AMD is still relying on eight stacks, they will use the newer HBM3 standard, the same memory NVIDIA uses for its Hopper GPUs. MI250X currently uses eight 8-Hi HBM2e stacks with 16 GB of memory each (128 GB per module). AMD may move to 12-Hi stacks, something SK Hynix teased a while back, which would allow up to 192 GB of memory on the top Instinct MI300 configuration, a 50% increase. As for power, each CDNA 3 tile (one 6nm die plus two 5nm dies) is expected to have a TDP of around 150W. The rumored configurations are as follows:
- Top Config: 4x IO Die (6nm) + 4x GCDs (5nm) + 8x Compute Dies (5nm)
- Mid Config: 2x IO Die (6nm) + 2x GCDs (5nm) + 4x Compute Dies (5nm)
- Low Config: 1x IO Die (6nm) + 1x GCD (5nm) + 2x Compute Dies (5nm)
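The rumored 192 GB memory capacity is stack count times per-stack capacity. A short sketch, assuming MI300 keeps eight stacks as on MI250X and using the 16 GB (8-Hi HBM2e) and 24 GB (12-Hi HBM3) per-stack capacities from the HBM comparison table:

```python
# Total HBM capacity = number of stacks x capacity per stack.
STACKS = 8  # assumption: MI300 keeps MI250X's eight stacks

mi250x_gb = STACKS * 16  # 8-Hi HBM2e, 16 GB per stack
mi300_gb = STACKS * 24   # rumored 12-Hi HBM3, 24 GB per stack

increase_pct = (mi300_gb - mi250x_gb) / mi250x_gb * 100
print(mi250x_gb, mi300_gb, f"+{increase_pct:.0f}%")  # 128 192 +50%
```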
AMD Instinct MI300 GPU Configurations (Image Credits: Moore's Law is Dead):
Based on that, the top configuration will consume around 600W, the mid configuration around 300W, and the entry-level configuration around 150W. For comparison, the top Instinct MI250X configuration consumes 560W and comes in the OAM form factor. The Instinct MI300 GPUs are expected to launch next year, around the same time Intel and NVIDIA release their latest data center products, Ponte Vecchio and Hopper.
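The power figures above scale linearly with the number of CDNA 3 tiles. A minimal sketch, assuming TDP is simply the rumored ~150W per tile multiplied by the tile count (real parts rarely scale this cleanly); the `Mi300Config` class is hypothetical and exists only for illustration:

```python
from dataclasses import dataclass

# Rumored per-tile TDP: one 6nm I/O die + two 5nm compute dies.
TDP_PER_TILE_W = 150


@dataclass
class Mi300Config:
    """Hypothetical model of a rumored MI300 configuration."""
    name: str
    io_dies: int  # one CDNA 3 tile per 6nm I/O die

    @property
    def compute_dies(self) -> int:
        return 2 * self.io_dies  # two 5nm compute dies stacked on each I/O die

    @property
    def tdp_w(self) -> int:
        return self.io_dies * TDP_PER_TILE_W


CONFIGS = [Mi300Config("Top", 4), Mi300Config("Mid", 2), Mi300Config("Low", 1)]

for cfg in CONFIGS:
    print(f"{cfg.name}: {cfg.io_dies} I/O + {cfg.compute_dies} compute dies, ~{cfg.tdp_w} W")
```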
AMD Radeon Instinct Accelerators 2020
| Accelerator Name | AMD Instinct MI300 | AMD Instinct MI250X | AMD Instinct MI250 | AMD Instinct MI210 | AMD Instinct MI100 | AMD Radeon Instinct MI60 | AMD Radeon Instinct MI50 | AMD Radeon Instinct MI25 | AMD Radeon Instinct MI8 | AMD Radeon Instinct MI6 |
|---|---|---|---|---|---|---|---|---|---|---|
| CPU Architecture | Zen 4 (Exascale APU) | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A |
| GPU Architecture | TBA (CDNA 3) | Aldebaran (CDNA 2) | Aldebaran (CDNA 2) | Aldebaran (CDNA 2) | Arcturus (CDNA 1) | Vega 20 | Vega 20 | Vega 10 | Fiji XT | Polaris 10 |
| GPU Process Node | 5nm + 6nm | 6nm | 6nm | 6nm | 7nm FinFET | 7nm FinFET | 7nm FinFET | 14nm FinFET | 28nm | 14nm FinFET |
| GPU Chiplets | 4 (MCM / 3D Stacked), 1 (Per Die) | 2 (MCM), 1 (Per Die) | 2 (MCM), 1 (Per Die) | 1 (Per Die) | 1 (Monolithic) | 1 (Monolithic) | 1 (Monolithic) | 1 (Monolithic) | 1 (Monolithic) | 1 (Monolithic) |
| GPU Clock Speed | TBA | 1700 MHz | 1700 MHz | 1700 MHz | 1500 MHz | 1800 MHz | 1725 MHz | 1500 MHz | 1000 MHz | 1237 MHz |
| FP16 Compute | TBA | 383 TFLOPs | 362 TFLOPs | 181 TFLOPs | 185 TFLOPs | 29.5 TFLOPs | 26.5 TFLOPs | 24.6 TFLOPs | 8.2 TFLOPs | 5.7 TFLOPs |
| FP32 Compute | TBA | 95.7 TFLOPs | 90.5 TFLOPs | 45.3 TFLOPs | 23.1 TFLOPs | 14.7 TFLOPs | 13.3 TFLOPs | 12.3 TFLOPs | 8.2 TFLOPs | 5.7 TFLOPs |
| FP64 Compute | TBA | 47.9 TFLOPs | 45.3 TFLOPs | 22.6 TFLOPs | 11.5 TFLOPs | 7.4 TFLOPs | 6.6 TFLOPs | 768 GFLOPs | 512 GFLOPs | 384 GFLOPs |
| VRAM | 192 GB HBM3? | 128 GB HBM2e | 128 GB HBM2e | 64 GB HBM2e | 32 GB HBM2 | 32 GB HBM2 | 16 GB HBM2 | 16 GB HBM2 | 4 GB HBM1 | 16 GB GDDR5 |
| Memory Clock | TBA | 3.2 Gbps | 3.2 Gbps | 3.2 Gbps | 1200 MHz | 1000 MHz | 1000 MHz | 945 MHz | 500 MHz | 1750 MHz |
| Memory Bus | 8192-bit | 8192-bit | 8192-bit | 4096-bit | 4096-bit | 4096-bit | 4096-bit | 2048-bit | 4096-bit | 256-bit |
| Memory Bandwidth | TBA | 3.2 TB/s | 3.2 TB/s | 1.6 TB/s | 1.23 TB/s | 1 TB/s | 1 TB/s | 484 GB/s | 512 GB/s | 224 GB/s |
| Form Factor | OAM | OAM | OAM | Dual Slot Card | Dual Slot, Full Length | Dual Slot, Full Length | Dual Slot, Full Length | Dual Slot, Full Length | Dual Slot, Half Length | Single Slot, Full Length |
| Cooling | Passive Cooling | Passive Cooling | Passive Cooling | Passive Cooling | Passive Cooling | Passive Cooling | Passive Cooling | Passive Cooling | Passive Cooling | Passive Cooling |
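The Memory Bandwidth row can be cross-checked against the Memory Bus and Memory Clock rows: peak bandwidth is bus width times per-pin data rate, divided by eight. This sketch assumes HBM transfers twice per listed clock (so 1000 MHz means 2.0 Gbps per pin) and that the listed GDDR5 clock is quadrupled (1750 MHz means 7.0 Gbps); the MI250X's 3.2 Gbps is used as listed. The `cards` dictionary is illustrative only.

```python
# Peak bandwidth (GB/s) = bus width (bits) x per-pin rate (Gbps) / 8.
# Assumption: HBM is double data rate on the listed clock; the listed
# GDDR5 clock is multiplied by four to get its per-pin rate.


def bandwidth_gbs(bus_bits: int, pin_rate_gbps: float) -> float:
    return bus_bits * pin_rate_gbps / 8


cards = {
    # name: (bus width in bits, per-pin rate in Gbps)
    "MI250X": (8192, 3.2),      # listed directly as 3.2 Gbps
    "MI100": (4096, 1.2 * 2),   # 1200 MHz HBM2, double data rate
    "MI60": (4096, 1.0 * 2),    # 1000 MHz HBM2, double data rate
    "MI25": (2048, 0.945 * 2),  # 945 MHz HBM2, double data rate
    "MI8": (4096, 0.5 * 2),     # 500 MHz HBM1, double data rate
    "MI6": (256, 1.75 * 4),     # 1750 MHz GDDR5, quad-pumped
}

for name, (bus, rate) in cards.items():
    print(f"{name}: {bandwidth_gbs(bus, rate):,.1f} GB/s")
```

The results (about 3,277, 1,229, 1,024, 484, 512 and 224 GB/s) line up with the rounded values in the table.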