AMD Instinct MI300 GPU To Utilize Quad MCM ‘CDNA 3’ GPUs: Feature 3D Stacking With Up To 8 Compute Dies, HBM3, PCIe Gen 5.0 & 600W TDP

AMD's Instinct MI300 GPUs, which will be powered by the next-generation CDNA 3 architecture, have been detailed by Moore's Law is Dead. The new GPUs are aimed at upcoming data centers and are rumored to be the first to incorporate a 3D-stacked design.

AMD Instinct MI300 GPUs Rumored To Go All-Onboard With 3D-Stacking Design: Up To Four GPU Chiplets With 8 Compute Dies, HBM3 & PCIe Gen 5.0 at 600W

Last year, @Kepler_L2 revealed that the AMD Instinct MI300 was going to feature four Graphics Compute Dies (GCDs). This was later confirmed in a patch where the chip appeared as the 'GFX940' part. That essentially doubles the MI250X, which features two GCDs, but the difference is that each MI300 GCD will feature two compute dies. So for the Instinct MI300, we are going to get up to 8 compute dies on the top variant. In fact, the Instinct MI300 family will not be a single GPU but will comprise several different configurations.

AMD Instinct MI300 'CDNA 3' GPU details have been revealed by Moore's Law is Dead.

The top AMD Instinct MI300 GPU will feature a massive interposer that measures around 2750 mm2. The interposer has a very interesting configuration that packs four 6nm tiles, each measuring around 320-360 mm2, which contain the I/O controllers and IP blocks. These tiles may also include some form of cache, though that's not confirmed yet. On top of each of these I/O dies, AMD will use its brand new 3D-stacking technology to place two compute dies.

These brand new CDNA 3 compute dies will be fabricated on a 5nm node and measure around 110 mm2 each. Currently, there's no word on how many cores or accelerator blocks each compute die will hold, but if we keep the same per-GCD core count as the MI250X, we get up to 28,160 cores. Once again, this is mere speculation since a lot can change within CDNA 3. Since the memory controllers sit on the bottom I/O die, they are connected to two stacks of HBM3 using more than 12 metal layers. The dies are interconnected using a total of 20,000 connections, which is double what Apple uses on the M1 Ultra as part of its UltraFusion design.
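The 28,160 figure can be reproduced with simple scaling. The sketch below assumes the MI250X's 14,080 cores are split evenly across its two GCDs and that each MI300 GCD keeps that same count; neither point is confirmed, and the function name is purely illustrative.

```python
# Speculative scaling only: assumes an even per-GCD split on MI250X and an
# unchanged per-GCD core count on MI300 (neither is confirmed).
MI250X_CORES = 14_080  # total cores on MI250X, per the comparison table
MI250X_GCDS = 2        # MI250X uses two GCDs


def extrapolate_cores(gcd_count: int) -> int:
    """Scale the MI250X per-GCD core count to a given number of GCDs."""
    cores_per_gcd = MI250X_CORES // MI250X_GCDS
    return cores_per_gcd * gcd_count


if __name__ == "__main__":
    for gcds in (1, 2, 4):
        print(f"{gcds} GCD(s): {extrapolate_cores(gcds):,} cores")
```

With four GCDs this returns 28,160, matching the speculative number quoted above.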

HBM Memory Specifications Comparison

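| DRAM | HBM1 | HBM2 | HBM2e | HBM3 |
|---|---|---|---|---|
| I/O (Bus Interface) | 1024 | 1024 | 1024 | 1024 |
| Prefetch (I/O) | 2 | 2 | 2 | 2 |
| Maximum Bandwidth | 128 GB/s | 256 GB/s | 460.8 GB/s | 819.2 GB/s |
| DRAM ICs Per Stack | 4 | 8 | 8 | 12 |
| Maximum Capacity | 4 GB | 8 GB | 16 GB | 24 GB |
| tRC | 48ns | 45ns | 45ns | TBA |
| tCCD | 2ns (=1tCK) | 2ns (=1tCK) | 2ns (=1tCK) | TBA |
| VPP | External VPP | External VPP | External VPP | External VPP |
| VDD | 1.2V | 1.2V | 1.2V | TBA |
| Command Input | Dual Command | Dual Command | Dual Command | Dual Command |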
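The bandwidth figures in the table follow from the 1024-bit interface and the per-pin data rate. As a rough sketch, the snippet below works backwards from the table's bandwidth numbers to the implied per-pin rate; the dictionary and function names are illustrative, and only the bus widths and bandwidths come from the table.

```python
# (bus width in bits, peak bandwidth per stack in GB/s), copied from the table.
HBM_GENERATIONS = {
    "HBM1": (1024, 128.0),
    "HBM2": (1024, 256.0),
    "HBM2e": (1024, 460.8),
    "HBM3": (1024, 819.2),
}


def pin_rate_gbps(bus_bits: int, bandwidth_gb_s: float) -> float:
    """Per-pin data rate in Gbps implied by a stack's bus width and bandwidth."""
    # GB/s -> Gb/s (multiply by 8), then divide across the data pins.
    return bandwidth_gb_s * 8 / bus_bits


if __name__ == "__main__":
    for name, (bus_bits, bandwidth) in HBM_GENERATIONS.items():
        rate = pin_rate_gbps(bus_bits, bandwidth)
        print(f"{name}: {rate:.1f} Gbps per pin")
```

Since the interface width stays at 1024 bits across all four generations, the per-stack bandwidth gains come entirely from the faster per-pin rate.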

Now while AMD is still relying on 8 stacks, they use the newer HBM3 standard, which is the same one NVIDIA is using for its Hopper GPUs. Currently, the MI250X uses 8 HBM2e stacks which are 8-Hi and feature 16 GB of memory per stack (128 GB per module). AMD may raise the stacks to 12-Hi, which is something SK Hynix teased a while back. This would allow for up to 192 GB of memory on the top Instinct MI300 configuration (8 stacks at 24 GB each), marking a 50% increase. As for the TDP, each CDNA 3 tile (1x 6nm + 2x 5nm dies) will have a TDP of around 150W. As for the configurations, they are as follows:

  • Top Config: 4x IO Dies (6nm) + 4x GCDs (5nm) + 8x Compute Dies (5nm)
  • Mid Config: 2x IO Dies (6nm) + 2x GCDs (5nm) + 4x Compute Dies (5nm)
  • Low Config: 1x IO Die (6nm) + 1x GCD (5nm) + 2x Compute Dies (5nm)
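As a quick check on the memory figures quoted before the configuration list, the sketch below multiplies stack count by per-stack capacity. The MI300 inputs (8 stacks of 12-Hi HBM3 at 24 GB each) are the rumored values, not confirmed specifications, and the function names are illustrative.

```python
def total_capacity_gb(stacks: int, gb_per_stack: int) -> int:
    """Total memory capacity for a package with identical HBM stacks."""
    return stacks * gb_per_stack


def percent_increase(old: float, new: float) -> float:
    """Relative increase from old to new, in percent."""
    return (new - old) / old * 100


if __name__ == "__main__":
    mi250x = total_capacity_gb(stacks=8, gb_per_stack=16)  # 8-Hi HBM2e
    mi300 = total_capacity_gb(stacks=8, gb_per_stack=24)   # rumored 12-Hi HBM3
    print(f"MI250X: {mi250x} GB, MI300 (rumored): {mi300} GB")
    print(f"Increase: {percent_increase(mi250x, mi300):.0f}%")
```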

AMD Instinct MI300 GPU Configurations (Image Credits: Moore's Law is Dead):

So based on that, the top configuration will consume around 600W of power, the mid configuration around 300W, and the entry-level configuration around 150W. Currently, the top Instinct MI250X configuration consumes 560W and comes in the OAM form factor. The Instinct MI300 GPUs will launch next year, around the same time that Intel and NVIDIA will be out with their latest data center products such as Ponte Vecchio and Hopper.
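The power figures scale linearly with the number of tiles, where one tile is one 6nm I/O die carrying two 5nm compute dies at roughly 150W. A minimal sketch of that arithmetic follows; the `Config` class and its field names are made up for illustration, and the 150W per-tile value is the rumored estimate.

```python
from dataclasses import dataclass

TILE_TDP_W = 150             # rumored TDP per tile (1x 6nm I/O die + 2x 5nm compute dies)
COMPUTE_DIES_PER_IO_DIE = 2  # two compute dies stacked on each I/O die


@dataclass(frozen=True)
class Config:
    name: str
    io_dies: int

    @property
    def compute_dies(self) -> int:
        return self.io_dies * COMPUTE_DIES_PER_IO_DIE

    @property
    def tdp_w(self) -> int:
        # One tile per I/O die, so TDP scales with the I/O die count.
        return self.io_dies * TILE_TDP_W


CONFIGS = [Config("Top", 4), Config("Mid", 2), Config("Low", 1)]

if __name__ == "__main__":
    for cfg in CONFIGS:
        print(f"{cfg.name}: {cfg.io_dies} I/O dies, "
              f"{cfg.compute_dies} compute dies, ~{cfg.tdp_w}W")
```

By this estimate the top configuration sits about 40W above the 560W MI250X.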

AMD Radeon Instinct Accelerators 2020

| Accelerator Name | AMD Instinct MI300 | AMD Instinct MI250X | AMD Instinct MI250 | AMD Instinct MI210 | AMD Instinct MI100 | AMD Radeon Instinct MI60 | AMD Radeon Instinct MI50 | AMD Radeon Instinct MI25 | AMD Radeon Instinct MI8 | AMD Radeon Instinct MI6 |
|---|---|---|---|---|---|---|---|---|---|---|
| CPU Architecture | Zen 4 (Exascale APU) | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A |
| GPU Architecture | TBA (CDNA 3) | Aldebaran (CDNA 2) | Aldebaran (CDNA 2) | Aldebaran (CDNA 2) | Arcturus (CDNA 1) | Vega 20 | Vega 20 | Vega 10 | Fiji XT | Polaris 10 |
| GPU Process Node | 5nm+6nm | 6nm | 6nm | 6nm | 7nm FinFET | 7nm FinFET | 7nm FinFET | 14nm FinFET | 28nm | 14nm FinFET |
| GPU Chiplets | 4 (MCM / 3D Stacked); 1 (Per Die) | 2 (MCM); 1 (Per Die) | 2 (MCM); 1 (Per Die) | 2 (MCM); 1 (Per Die) | 1 (Monolithic) | 1 (Monolithic) | 1 (Monolithic) | 1 (Monolithic) | 1 (Monolithic) | 1 (Monolithic) |
| GPU Cores | 28,160? | 14,080 | 13,312 | 6656 | 7680 | 4096 | 3840 | 4096 | 4096 | 2304 |
| GPU Clock Speed | TBA | 1700 MHz | 1700 MHz | 1700 MHz | 1500 MHz | 1800 MHz | 1725 MHz | 1500 MHz | 1000 MHz | 1237 MHz |
| FP16 Compute | TBA | 383 TOPs | 362 TOPs | 181 TOPs | 185 TFLOPs | 29.5 TFLOPs | 26.5 TFLOPs | 24.6 TFLOPs | 8.2 TFLOPs | 5.7 TFLOPs |
| FP32 Compute | TBA | 95.7 TFLOPs | 90.5 TFLOPs | 45.3 TFLOPs | 23.1 TFLOPs | 14.7 TFLOPs | 13.3 TFLOPs | 12.3 TFLOPs | 8.2 TFLOPs | 5.7 TFLOPs |
| FP64 Compute | TBA | 47.9 TFLOPs | 45.3 TFLOPs | 22.6 TFLOPs | 11.5 TFLOPs | 7.4 TFLOPs | 6.6 TFLOPs | 768 GFLOPs | 512 GFLOPs | 384 GFLOPs |
| VRAM | 192 GB HBM3? | 128 GB HBM2e | 128 GB HBM2e | 64 GB HBM2e | 32 GB HBM2 | 32 GB HBM2 | 16 GB HBM2 | 16 GB HBM2 | 4 GB HBM1 | 16 GB GDDR5 |
| Memory Clock | TBA | 3.2 Gbps | 3.2 Gbps | 3.2 Gbps | 1200 MHz | 1000 MHz | 1000 MHz | 945 MHz | 500 MHz | 1750 MHz |
| Memory Bus | 8192-bit | 8192-bit | 8192-bit | 4096-bit | 4096-bit | 4096-bit | 4096-bit | 2048-bit | 4096-bit | 256-bit |
| Memory Bandwidth | TBA | 3.2 TB/s | 3.2 TB/s | 1.6 TB/s | 1.23 TB/s | 1 TB/s | 1 TB/s | 484 GB/s | 512 GB/s | 224 GB/s |
| Form Factor | OAM | OAM | OAM | Dual Slot Card | Dual Slot, Full Length | Dual Slot, Full Length | Dual Slot, Full Length | Dual Slot, Full Length | Dual Slot, Half Length | Single Slot, Full Length |
| Cooling | Passive Cooling | Passive Cooling | Passive Cooling | Passive Cooling | Passive Cooling | Passive Cooling | Passive Cooling | Passive Cooling | Passive Cooling | Passive Cooling |
| TDP | ~600W | 560W | 500W | 300W | 300W | 300W | 300W | 300W | 175W | 150W |
