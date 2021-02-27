It looks like AMD is ramping up work on its next-generation Instinct accelerator, the MI200, which is expected to feature an MCM (multi-chip module) GPU design. The latest information dump reveals not only the GPU's codename but also a range of new specifications.

Starting with the 2020 lineup, the AMD Instinct family is based entirely on the CDNA architecture. The first-generation CDNA flagship, the Instinct MI100, was internally codenamed Arcturus. It followed the Vega-based Radeon Instinct accelerators, and AMD names these GPUs after giant stars. The MI100's successor, the MI200, also seems set to carry a giant star's name: Aldebaran.

According to the latest Linux patches (via Phoronix), the AMD Instinct MI200 could be known as Aldebaran, a giant star in the constellation Taurus with a radius of about 44.13 solar radii, roughly 75% larger than Arcturus. The MI200 name itself hints that the accelerator could be twice as powerful as the MI100, since the numbers in AMD's MI naming scheme are meant to reflect theoretical FLOPS performance. This is just speculation at this point, but given that the accelerator is expected to use an MCM GPU design, a doubling is plausible.

The patches also reveal that the AMD Instinct MI200 'Aldebaran' GPU will support HBM2E memory. The newer memory standard was first used by NVIDIA's Ampere GA100 GPUs and offers a nice boost over the standard HBM2 used on the Arcturus-based MI100 accelerator. Current HBM2E stacks offer up to 16 GB each, so with four stacks (the MI100's 4096-bit bus already corresponds to four 1024-bit HBM2 stacks), Aldebaran could carry up to 64 GB of HBM2E memory at blisteringly fast speeds.
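
The capacity estimate is plain multiplication, and peak bandwidth follows from total bus width times the per-pin data rate. The C sketch below assumes four stacks with a 1024-bit interface each. The MI100's 1200 MHz memory clock from the table, transferring twice per clock, gives 2.4 Gbps per pin; the 3.2 Gbps HBM2E speed grade used for comparison is an assumption for illustration, not something the patches specify. All function names are hypothetical.

```c
#include <assert.h>

/* Assumption: each HBM2/HBM2E stack exposes a 1024-bit interface. */
#define HBM_STACK_BUS_BITS 1024u

/* Total capacity in GB for a given number of stacks. */
unsigned hbm_capacity_gb(unsigned stacks, unsigned gb_per_stack)
{
    return stacks * gb_per_stack;
}

/* HBM transfers data on both clock edges, so Gbps per pin = MHz x 2 / 1000. */
double ddr_gbps_per_pin(double clock_mhz)
{
    assert(clock_mhz > 0.0);
    return clock_mhz * 2.0 / 1000.0;
}

/* Peak bandwidth in GB/s: total bus width (bits) x per-pin rate (Gbps) / 8 bits per byte. */
double hbm_bandwidth_gbs(unsigned stacks, double gbps_per_pin)
{
    assert(stacks > 0 && gbps_per_pin > 0.0);
    return (double)(stacks * HBM_STACK_BUS_BITS) * gbps_per_pin / 8.0;
}
```

With those inputs, the MI100 configuration works out to 1228.8 GB/s, matching the 1.23 TB/s quoted in the table, while four 16 GB HBM2E stacks at an assumed 3.2 Gbps would give 64 GB at about 1.64 TB/s.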

Other features listed include SDMA (System Direct Memory Access) engine support, which allows data transfers over PCIe and over XGMI, AMD's Infinity Fabric link between GPUs. Taken together, the changes point to a very advanced evolution of the Vega-derived GPU design.

Arcturus:

    .asic_family = CHIP_ARCTURUS,
    .asic_name = "arcturus",
    .max_pasid_bits = 16,
    .max_no_of_hqd = 24,
    .doorbell_size = 8,
    .ih_ring_entry_size = 8 * sizeof(uint32_t),
    .event_interrupt_class = &event_interrupt_class_v9,
    .num_of_watch_points = 4,
    .mqd_size_aligned = MQD_SIZE_ALIGNED,
    .supports_cwsr = true,
    .needs_iommu_device = false,
    .needs_pci_atomics = false,
    .num_sdma_engines = 2,
    .num_xgmi_sdma_engines = 6,
    .num_sdma_queues_per_engine = 8,

Aldebaran:

    .asic_family = CHIP_ALDEBARAN,
    .asic_name = "aldebaran",
    .max_pasid_bits = 16,
    .max_no_of_hqd = 24,
    .doorbell_size = 8,
    .ih_ring_entry_size = 8 * sizeof(uint32_t),
    .event_interrupt_class = &event_interrupt_class_v9,
    .num_of_watch_points = 4,
    .mqd_size_aligned = MQD_SIZE_ALIGNED,
    .supports_cwsr = true,
    .needs_iommu_device = false,
    .needs_pci_atomics = false,
    .num_sdma_engines = 2,
    .num_xgmi_sdma_engines = 3,
    .num_sdma_queues_per_engine = 8,
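
Apart from the family and name, the two listings differ in only one field: the number of XGMI SDMA engines drops from 6 on Arcturus to 3 on Aldebaran. As a rough sketch of what that means for queue counts, assuming every engine (PCIe or XGMI) exposes num_sdma_queues_per_engine queues, the C snippet below uses a hypothetical trimmed-down struct that carries only the SDMA fields from the listings; it is not the kernel's actual device-info structure.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical struct holding only the SDMA-related fields from the listings;
 * the real kernel structure has many more members. */
struct sdma_config {
    const char *asic_name;
    unsigned num_sdma_engines;           /* regular SDMA engines (PCIe transfers) */
    unsigned num_xgmi_sdma_engines;      /* SDMA engines for XGMI transfers */
    unsigned num_sdma_queues_per_engine;
};

/* Values copied from the two listings. */
const struct sdma_config arcturus_sdma = {
    .asic_name = "arcturus",
    .num_sdma_engines = 2,
    .num_xgmi_sdma_engines = 6,
    .num_sdma_queues_per_engine = 8,
};

const struct sdma_config aldebaran_sdma = {
    .asic_name = "aldebaran",
    .num_sdma_engines = 2,
    .num_xgmi_sdma_engines = 3,
    .num_sdma_queues_per_engine = 8,
};

/* Assumes every engine, PCIe or XGMI, exposes the same number of queues. */
unsigned total_sdma_queues(const struct sdma_config *cfg)
{
    assert(cfg != NULL);
    return (cfg->num_sdma_engines + cfg->num_xgmi_sdma_engines)
           * cfg->num_sdma_queues_per_engine;
}
```

Under that assumption, Arcturus exposes (2 + 6) x 8 = 64 SDMA queues and Aldebaran (2 + 3) x 8 = 40.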

The patches also hint once more at an MCM design for the AMD Instinct MI200 'Aldebaran' GPU. They describe a new mode called Performance Determinism, in which the PMFW (power management firmware) maintains a sustained performance level; the mode can be enabled on a per-die basis. Each GPU die can therefore run in this mode, but a maximum graphics frequency must be specified so that the dies do not exceed their power caps.
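
To make the per-die behavior concrete, here is a minimal C model, assuming each die simply holds an enable flag and a user-supplied maximum graphics clock that caps any higher clock request. The struct and function names are hypothetical and do not reflect the actual amdgpu driver or PMFW interface; the real firmware manages power far more elaborately.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-die state; NOT the amdgpu/PMFW interface. */
struct die_state {
    bool determinism_enabled; /* Performance Determinism active on this die */
    unsigned max_gfx_mhz;     /* required cap that keeps the die within its power limit */
};

/* Enable the mode on one die. A maximum graphics clock must be supplied,
 * so a cap of 0 is rejected. Returns true on success. */
bool enable_perf_determinism(struct die_state *die, unsigned max_gfx_mhz)
{
    assert(die != NULL);
    if (max_gfx_mhz == 0)
        return false;
    die->determinism_enabled = true;
    die->max_gfx_mhz = max_gfx_mhz;
    return true;
}

/* Clock this model targets: requests above the cap are clamped while the
 * mode is on; otherwise the request passes through unchanged. */
unsigned target_gfx_mhz(const struct die_state *die, unsigned requested_mhz)
{
    assert(die != NULL);
    if (die->determinism_enabled && requested_mhz > die->max_gfx_mhz)
        return die->max_gfx_mhz;
    return requested_mhz;
}
```

Because the state lives in each die's own struct, one die can run in this mode with, say, a 1500 MHz cap while its neighbor keeps following its requested clocks.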

Note that AMD's CDNA 2 GPUs will be fabricated on a new process node and are confirmed to feature the 3rd Generation AMD Infinity Architecture, which extends to exascale by allowing up to 8-way coherent GPU connectivity.

AMD Radeon Instinct Accelerators 2020

| Accelerator Name | AMD Radeon Instinct MI6 | AMD Radeon Instinct MI8 | AMD Radeon Instinct MI25 | AMD Radeon Instinct MI50 | AMD Radeon Instinct MI60 | AMD Instinct MI100 | AMD Instinct MI200 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| GPU Architecture | Polaris 10 | Fiji XT | Vega 10 | Vega 20 | Vega 20 | Arcturus | TBA |
| GPU Process Node | 14nm FinFET | 28nm | 14nm FinFET | 7nm FinFET | 7nm FinFET | 7nm FinFET | Advanced Process Node |
| GPU Cores | 2304 | 4096 | 4096 | 3840 | 4096 | 7680 | 7680 x 2 (MCM)? |
| GPU Clock Speed | 1237 MHz | 1000 MHz | 1500 MHz | 1725 MHz | 1800 MHz | ~1500 MHz | TBA |
| FP16 Compute | 5.7 TFLOPs | 8.2 TFLOPs | 24.6 TFLOPs | 26.5 TFLOPs | 29.5 TFLOPs | 185 TFLOPs | TBA |
| FP32 Compute | 5.7 TFLOPs | 8.2 TFLOPs | 12.3 TFLOPs | 13.3 TFLOPs | 14.7 TFLOPs | 23.1 TFLOPs | TBA |
| FP64 Compute | 384 GFLOPs | 512 GFLOPs | 768 GFLOPs | 6.6 TFLOPs | 7.4 TFLOPs | 11.5 TFLOPs | TBA |
| VRAM | 16 GB GDDR5 | 4 GB HBM1 | 16 GB HBM2 | 16 GB HBM2 | 32 GB HBM2 | 32 GB HBM2 | TBA |
| Memory Clock | 1750 MHz | 500 MHz | 945 MHz | 1000 MHz | 1000 MHz | 1200 MHz | TBA |
| Memory Bus | 256-bit | 4096-bit | 2048-bit | 4096-bit | 4096-bit | 4096-bit | TBA |
| Memory Bandwidth | 224 GB/s | 512 GB/s | 484 GB/s | 1 TB/s | 1 TB/s | 1.23 TB/s | TBA |
| Form Factor | Single Slot, Full Length | Dual Slot, Half Length | Dual Slot, Full Length | Dual Slot, Full Length | Dual Slot, Full Length | Dual Slot, Full Length | OAM |
| Cooling | Passive | Passive | Passive | Passive | Passive | Passive | Passive |
| TDP | 150W | 175W | 300W | 300W | 300W | 300W | TBA |
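
The FP32 figures in the table follow a standard peak-throughput formula: shader cores x 2 FLOPs per clock (one fused multiply-add per core per cycle) x clock speed. The C sketch below checks each card against that formula using the table's core counts and clocks. Applying it to the speculative "7680 x 2 (MCM)" MI200 entry at an assumed ~1500 MHz is purely hypothetical, since no MI200 clock has been disclosed; the struct and function names are illustrative.

```c
#include <assert.h>
#include <stddef.h>

/* Peak FP32 TFLOPs = cores x 2 FLOPs per clock (one FMA) x clock (MHz) / 1e6. */
double peak_fp32_tflops(unsigned cores, double clock_mhz)
{
    assert(cores > 0 && clock_mhz > 0.0);
    return cores * 2.0 * clock_mhz / 1e6;
}

/* Core counts, clocks and listed FP32 figures copied from the table. */
struct accel {
    const char *name;
    unsigned cores;
    double clock_mhz;
    double listed_fp32_tflops;
};

const struct accel accels[] = {
    { "MI6",   2304, 1237.0,  5.7 },
    { "MI8",   4096, 1000.0,  8.2 },
    { "MI25",  4096, 1500.0, 12.3 },
    { "MI50",  3840, 1725.0, 13.3 },
    { "MI60",  4096, 1800.0, 14.7 },
    { "MI100", 7680, 1500.0, 23.1 }, /* the table lists ~1500 MHz */
};

/* Count the cards whose computed peak lies within tol TFLOPs of the listed value. */
unsigned count_matching(double tol)
{
    unsigned n = 0;
    for (size_t i = 0; i < sizeof accels / sizeof accels[0]; i++) {
        double diff = peak_fp32_tflops(accels[i].cores, accels[i].clock_mhz)
                      - accels[i].listed_fp32_tflops;
        if (diff < 0.0)
            diff = -diff;
        if (diff <= tol)
            n++;
    }
    return n;
}
```

Every listed card lands within 0.1 TFLOPs of its table value (the MI100 computes to 23.04 TFLOPs at exactly 1500 MHz versus the quoted 23.1), and the hypothetical dual-die configuration would come out at about 46.1 TFLOPs, double the MI100.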

