AMD Unveils Instinct MI200 ‘Aldebaran’ GPU, First 6nm MCM Product With 58 Billion Transistors, Over 14,000 Cores & 128 GB HBM2e Memory
AMD has officially announced its next-generation Instinct MI200 HPC GPU, codenamed Aldebaran, which combines a 6nm process node with the CDNA 2 architecture to deliver insane compute performance.
AMD Unveils Instinct MI200, A Next-Gen Compute Powerhouse Built On The First 6nm MCM GPU Technology With Over 95 TFLOPs Of FP32 Performance
Inside the AMD Instinct MI200 is an Aldebaran GPU featuring two dies, a primary and a secondary, each consisting of 8 shader engines for a total of 16 SEs. Each shader engine packs 16 CUs with full-rate FP64, packed FP32 & a 2nd Generation Matrix Engine for FP16 & BF16 operations.
Each die is therefore composed of 128 compute units, or 8192 stream processors. The chip as a whole is rated for 220 compute units, or 14,080 stream processors, so not every CU across the two dies is enabled. The Aldebaran GPU also uses a new XGMI interconnect, and each chiplet features a VCN 2.6 engine and the main IO controller.
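To make the counting above concrete, here is a minimal Python sketch that multiplies out the layout the article describes (two dies, 8 shader engines per die, 16 CUs per shader engine). The figure of 64 stream processors per CU is an assumption derived from the 8192 / 128 ratio quoted above, and the constant and function names are hypothetical.

```python
# Compute-unit arithmetic for Aldebaran, using the layout described in the article.
# Assumption: 64 stream processors per CU, derived from 8192 SPs / 128 CUs.

DIES = 2
SHADER_ENGINES_PER_DIE = 8
CUS_PER_SHADER_ENGINE = 16
SPS_PER_CU = 64          # assumed from the article's per-die figures
ENABLED_CUS = 220        # CU count quoted for the whole chip


def cus_per_die() -> int:
    """CUs on one die: shader engines x CUs per shader engine."""
    return SHADER_ENGINES_PER_DIE * CUS_PER_SHADER_ENGINE


def cus_both_dies() -> int:
    """CUs across both dies if every CU were enabled."""
    return DIES * cus_per_die()


def stream_processors(cus: int) -> int:
    """Stream processors for a given number of CUs."""
    return cus * SPS_PER_CU


if __name__ == "__main__":
    print("Shader engines:", DIES * SHADER_ENGINES_PER_DIE)       # 16
    print("CUs per die:", cus_per_die())                          # 128
    print("SPs per die:", stream_processors(cus_per_die()))       # 8192
    print("CUs on both dies:", cus_both_dies())                   # 256
    print("Enabled SPs:", stream_processors(ENABLED_CUS))         # 14080
    print("CUs not enabled:", cus_both_dies() - ENABLED_CUS)      # 36
```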
As for DRAM, AMD has gone with an 8-channel interface made up of eight 1024-bit interfaces, for an 8192-bit wide memory bus. Each interface drives one HBM2e stack built from 2 GB (16 Gb) DRAM dies; stacked eight high, that gives up to 16 GB per stack, and since there are eight stacks in total, the capacity comes to a whopping 128 GB. That's 48 GB more than the NVIDIA A100, which houses 80 GB of HBM2e memory. The full visualization of the Aldebaran GPU on the Instinct MI200 is available here.
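A quick sketch of the memory arithmetic, assuming each of the eight 1024-bit interfaces feeds one HBM2e stack built from eight 2 GB dies. The stack height is inferred from 16 GB / 2 GB rather than stated by AMD here, and the variable names are illustrative only.

```python
# HBM2e bus-width and capacity arithmetic for the MI200.
# Assumption: one 1024-bit interface per stack, 8-high stacks of 2 GB dies.

STACKS = 8
BITS_PER_STACK = 1024
DIE_CAPACITY_GB = 2       # 16 Gb HBM2e die
DIES_PER_STACK = 8        # assumed 8-high stack
A100_CAPACITY_GB = 80

bus_width_bits = STACKS * BITS_PER_STACK
capacity_per_stack_gb = DIE_CAPACITY_GB * DIES_PER_STACK
total_capacity_gb = STACKS * capacity_per_stack_gb

print(f"Bus width: {bus_width_bits}-bit")                               # Bus width: 8192-bit
print(f"Per stack: {capacity_per_stack_gb} GB")                         # Per stack: 16 GB
print(f"Total: {total_capacity_gb} GB")                                 # Total: 128 GB
print(f"vs. A100: +{total_capacity_gb - A100_CAPACITY_GB} GB")          # vs. A100: +48 GB
```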
AMD Radeon Instinct Accelerators 2020
| Accelerator Name | AMD Instinct MI300 | AMD Instinct MI250X | AMD Instinct MI250 | AMD Instinct MI100 | AMD Radeon Instinct MI60 | AMD Radeon Instinct MI50 | AMD Radeon Instinct MI25 | AMD Radeon Instinct MI8 | AMD Radeon Instinct MI6 |
|---|---|---|---|---|---|---|---|---|---|
| GPU Architecture | TBA (CDNA 3) | Aldebaran (CDNA 2) | Aldebaran (CDNA 2) | Arcturus (CDNA 1) | Vega 20 | Vega 20 | Vega 10 | Fiji XT | Polaris 10 |
| GPU Process Node | Advanced Process Node | 6nm | 6nm | 7nm FinFET | 7nm FinFET | 7nm FinFET | 14nm FinFET | 28nm | 14nm FinFET |
| GPU Dies | 4 (MCM)? | 2 (MCM) | 2 (MCM) | 1 (Monolithic) | 1 (Monolithic) | 1 (Monolithic) | 1 (Monolithic) | 1 (Monolithic) | 1 (Monolithic) |
| GPU Cores | 28,160? | 14,080 | 14,080? | 7680 | 4096 | 3840 | 4096 | 4096 | 2304 |
| GPU Clock Speed | TBA | 1700 MHz | ~1700 MHz | ~1500 MHz | 1800 MHz | 1725 MHz | 1500 MHz | 1000 MHz | 1237 MHz |
| FP16 Compute | TBA | 383 TFLOPs | TBA | 185 TFLOPs | 29.5 TFLOPs | 26.5 TFLOPs | 24.6 TFLOPs | 8.2 TFLOPs | 5.7 TFLOPs |
| FP32 Compute | TBA | 95.8 TFLOPs | TBA | 23.1 TFLOPs | 14.7 TFLOPs | 13.3 TFLOPs | 12.3 TFLOPs | 8.2 TFLOPs | 5.7 TFLOPs |
| FP64 Compute | TBA | 47.9 TFLOPs | TBA | 11.5 TFLOPs | 7.4 TFLOPs | 6.6 TFLOPs | 768 GFLOPs | 512 GFLOPs | 384 GFLOPs |
| VRAM | TBA | 128 GB HBM2e | 128 GB HBM2e | 32 GB HBM2 | 32 GB HBM2 | 16 GB HBM2 | 16 GB HBM2 | 4 GB HBM1 | 16 GB GDDR5 |
| Memory Clock | TBA | TBA | TBA | 1200 MHz | 1000 MHz | 1000 MHz | 945 MHz | 500 MHz | 1750 MHz |
| Memory Bus | TBA | 8192-bit | 8192-bit | 4096-bit | 4096-bit | 4096-bit | 2048-bit | 4096-bit | 256-bit |
| Memory Bandwidth | TBA | ~2 TB/s? | ~2 TB/s? | 1.23 TB/s | 1 TB/s | 1 TB/s | 484 GB/s | 512 GB/s | 224 GB/s |
| Form Factor | TBA | Dual Slot, Full Length / OAM | Dual Slot, Full Length / OAM | Dual Slot, Full Length | Dual Slot, Full Length | Dual Slot, Full Length | Dual Slot, Full Length | Dual Slot, Half Length | Single Slot, Full Length |
| Cooling | TBA | Passive Cooling | Passive Cooling | Passive Cooling | Passive Cooling | Passive Cooling | Passive Cooling | Passive Cooling | Passive Cooling |
| TDP | TBA | 500W | TBA | 300W | 300W | 300W | 300W | 175W | 150W |
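The compute and bandwidth figures for the launched cards in the table can be roughly reproduced from cores, clocks and bus widths. The Python sketch below assumes each stream processor completes one fused multiply-add (2 FLOPs) per clock, that packed FP32 doubles this, and that the listed memory clock corresponds to 2 transfers per pin per clock for HBM and 4 for GDDR5. These are assumptions for illustration, not AMD's published methodology, and the `Card` type and function names are hypothetical. Results land within rounding of the listed values, since the table's clocks are themselves rounded.

```python
from typing import NamedTuple

# Rough spec-sheet arithmetic; assumptions are described in the lead-in.


class Card(NamedTuple):
    name: str
    cores: int
    clock_mhz: float           # GPU clock from the table
    fp32_listed: float         # FP32 TFLOPs as listed
    bus_bits: int
    mem_clock_mhz: float       # memory clock as listed
    transfers_per_clock: int   # assumed: 2 for HBM/HBM2, 4 for GDDR5
    bw_listed_gbs: float       # listed bandwidth in GB/s


CARDS = [
    Card("MI100", 7680, 1500, 23.1, 4096, 1200, 2, 1230),
    Card("MI60", 4096, 1800, 14.7, 4096, 1000, 2, 1000),
    Card("MI50", 3840, 1725, 13.3, 4096, 1000, 2, 1000),
    Card("MI25", 4096, 1500, 12.3, 2048, 945, 2, 484),
    Card("MI8", 4096, 1000, 8.2, 4096, 500, 2, 512),
    Card("MI6", 2304, 1237, 5.7, 256, 1750, 4, 224),
]


def peak_tflops(cores: int, clock_mhz: float, flops_per_clock: int = 2) -> float:
    """Peak throughput: cores x clock x FLOPs per core per clock (2 = one FMA)."""
    return cores * clock_mhz * 1e6 * flops_per_clock / 1e12


def bandwidth_gbs(bus_bits: int, mem_clock_mhz: float, transfers_per_clock: int) -> float:
    """Peak bandwidth in GB/s: bus width x per-pin data rate / 8 bits per byte."""
    gbps_per_pin = mem_clock_mhz * transfers_per_clock / 1000
    return bus_bits * gbps_per_pin / 8


if __name__ == "__main__":
    for c in CARDS:
        fp32 = peak_tflops(c.cores, c.clock_mhz)
        bw = bandwidth_gbs(c.bus_bits, c.mem_clock_mhz, c.transfers_per_clock)
        print(f"{c.name:6} FP32 {fp32:6.2f} TFLOPs (listed {c.fp32_listed}), "
              f"bandwidth {bw:7.1f} GB/s (listed {c.bw_listed_gbs})")
    # MI250X: unpacked rate vs. packed FP32 (assumed 4 FLOPs per core per clock)
    print(f"MI250X unpacked: {peak_tflops(14080, 1700):.1f} TFLOPs")
    print(f"MI250X packed FP32: {peak_tflops(14080, 1700, flops_per_clock=4):.1f} TFLOPs")
```

For the MI250X, the unpacked rate works out to about 47.9 TFLOPs, matching the table's FP64 figure, while the packed rate comes to about 95.7 TFLOPs, close to the 95.8 TFLOPs FP32 figure listed.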