AMD Instinct MI200 Speculated to Utilize 110 Compute Units Per MCM GPU

Jason R. Wilson

• Sep 1, 2021 at 01:12pm EDT

The website Coelacanth's Dream has located a GitHub commit that may hint at the configuration of AMD's upcoming Aldebaran GPU-based Instinct accelerator. The new GPU, codenamed 'GFX90A', will use the CDNA 2 architecture, a derivative of the GFX9 family (Vega).

AMD Instinct MI200 Could Feature Two 110 Compute Units CDNA 2 GPU Dies

There are three codes, GFX906_60, GFX908_120, and GFX90A_110, each specific to a different GPU. GFX906_60 is speculated to refer to the Instinct MI60, GFX908_120 to the Instinct MI100, and GFX90A_110 to the newer-generation AMD accelerator. In each code, the number after the underscore refers to the compute unit count.

For instance, the MI60 entry points to 60 compute units, the MI100 entry to 120 units, and the last entry to 110 compute units. Interestingly, that would give AMD's next-gen accelerator fewer compute units per die than the MI100.
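
The naming pattern described above can be decoded mechanically. The following is a minimal Python sketch, assuming the identifiers always take the form GFX<target>_<CU count>. The function name parse_gfx_code and the product mapping are illustrative only and are not taken from the ROCm commit.

```python
import re

# Product names as speculated in this article; the mapping is illustrative only.
SPECULATED_PRODUCTS = {
    "GFX906": "Instinct MI60",
    "GFX908": "Instinct MI100",
    "GFX90A": "next-gen Instinct accelerator (speculated)",
}

def parse_gfx_code(code: str) -> tuple[str, int]:
    """Split a code such as 'GFX90A_110' into (target, compute_units)."""
    match = re.fullmatch(r"(GFX[0-9A-F]+)_(\d+)", code)
    if match is None:
        raise ValueError(f"unrecognized code: {code!r}")
    return match.group(1), int(match.group(2))

for code in ("GFX906_60", "GFX908_120", "GFX90A_110"):
    target, cus = parse_gfx_code(code)
    print(f"{code}: {SPECULATED_PRODUCTS[target]}, {cus} compute units")
```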

The Aldebaran GPU is said to feature 128 compute units, which does not match the count in the code for the new AMD accelerator. However, GPUs typically ship with some compute units deactivated, which, if correct here, would bring it down to 110 active compute units.

Considering the settings of different Shader Engine and CU, Aldebaran / MI200 is an MCM configuration with 2 GPU dies, so if the setting is symmetric for each die instead of Shader Engine, each die will have 4 SEs. It is possible to have (56 CUs), and disable each one of them to make a total of 110 CUs.

— Coelacanth’s Dream

Website VideoCardz states,

It is unclear if AMD is planning to double the FP32 core count on CDNA2 architecture, but assuming that they do, with a theoretical 1500 MHz GPU clock the accelerator would offer a single-precision compute performance of 42.2 TFLOPS, 1.82x more than MI100. If that isn’t the case, then MI200 would have to have at least a 1650 MHz clock to reach the same FP32 throughput of 23 TFLOPs.

In the case of HPC accelerators such as MI200, the FP64 performance is far more important. According to previous leaks, MI200 is to feature full-rate FP64 performance, which means either doubling or quadrupling the performance over MI100, depending on the architecture.
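
The FP32 figures quoted above can be reproduced with the usual peak-throughput formula. This is a rough sketch, assuming 64 stream processors per compute unit and 2 FLOPs per stream processor per clock (one fused multiply-add), with the "doubled" case modeled as 4 FLOPs per clock. The function name peak_fp32_tflops is hypothetical, and the numbers appear to line up with a 110 CU configuration.

```python
def peak_fp32_tflops(compute_units: int, clock_mhz: float,
                     lanes_per_cu: int = 64, flops_per_lane: int = 2) -> float:
    """Peak FP32 TFLOPS = CUs x lanes per CU x FLOPs per lane per clock x clock."""
    return compute_units * lanes_per_cu * flops_per_lane * clock_mhz * 1e6 / 1e12

# Assumed: doubled FP32 rate (4 FLOPs per lane per clock), 110 CUs at 1500 MHz
doubled = peak_fp32_tflops(110, 1500, flops_per_lane=4)

# Assumed: unchanged FP32 rate (2 FLOPs per lane per clock), 110 CUs at 1650 MHz
unchanged = peak_fp32_tflops(110, 1650)

print(f"{doubled:.1f} TFLOPS, {unchanged:.1f} TFLOPS")  # 42.2 TFLOPS, 23.2 TFLOPS
```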

AMD's MI200 is set to release before the end of 2021. It is a multi-chip graphics processor built from two active dies and paired with 128 GB of HBM2e memory.

Here's What To Expect From AMD Instinct MI200 'CDNA 2' GPU Accelerator

Inside the AMD Instinct MI200 is an Aldebaran GPU featuring two dies, a primary and a secondary. Each die consists of 8 shader engines, for a total of 16 SEs. Each shader engine packs 16 CUs with full-rate FP64, packed FP32, and a 2nd Generation Matrix Engine for FP16 and BF16 operations. Each die, as such, is composed of 128 compute units or 8,192 stream processors. With 110 compute units enabled per die, that comes to a total of 220 compute units or 14,080 stream processors for the entire chip. The Aldebaran GPU is also powered by a new XGMI interconnect. Each chiplet features a VCN 2.6 engine and the main IO controller.
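
The compute unit arithmetic above can be checked in a few lines. This is a small sketch using the rumored figures from this article; the 110 enabled CUs per die is the speculated value from the GFX90A_110 code, and the constant names are illustrative.

```python
DIES = 2
SHADER_ENGINES_PER_DIE = 8
CUS_PER_SHADER_ENGINE = 16
STREAM_PROCESSORS_PER_CU = 64  # 8192 SPs / 128 CUs, per the figures above
ENABLED_CUS_PER_DIE = 110      # speculated from the GFX90A_110 code

physical_cus_per_die = SHADER_ENGINES_PER_DIE * CUS_PER_SHADER_ENGINE
physical_sps_per_die = physical_cus_per_die * STREAM_PROCESSORS_PER_CU
enabled_cus = DIES * ENABLED_CUS_PER_DIE
enabled_sps = enabled_cus * STREAM_PROCESSORS_PER_CU
disabled_cus_per_die = physical_cus_per_die - ENABLED_CUS_PER_DIE

print(physical_cus_per_die, physical_sps_per_die)  # 128 8192
print(enabled_cus, enabled_sps)                    # 220 14080
print(disabled_cus_per_die)                        # 18
```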

The block diagram of AMD's CDNA 2 powered Aldebaran GPU which will power the Instinct MI200 HPC accelerator has been visualized. (Image Credits: Locuza)

As for DRAM, AMD has gone with eight HBM2e stacks, each on its own 1024-bit interface, for an 8192-bit wide bus in total. Each interface supports 2 GB HBM2e DRAM modules, which should allow up to 16 GB of HBM2e memory capacity per stack. Since there are eight stacks in total, the total capacity would be a whopping 128 GB. That's 48 GB more than the NVIDIA A100, which houses 80 GB of HBM2e memory.
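
The memory figures follow from simple multiplication. This is a brief sketch under the rumored configuration; the 3.2 Gbps per-pin rate is taken from the Memory Clock row of the specification table in this article and is assumed to apply to all eight stacks.

```python
STACKS = 8
BITS_PER_STACK = 1024
GB_PER_STACK = 16
PIN_RATE_GBPS = 3.2  # per-pin data rate, assumed from the spec table

bus_width_bits = STACKS * BITS_PER_STACK
capacity_gb = STACKS * GB_PER_STACK
# bits per second across the whole bus, divided by 8 to convert to bytes
bandwidth_gb_s = bus_width_bits * PIN_RATE_GBPS / 8

print(bus_width_bits)            # 8192
print(capacity_gb)               # 128
print(f"{bandwidth_gb_s:.1f}")   # 3276.8
```

The resulting 3276.8 GB/s is in line with the roughly 3.2 TB/s listed for the Aldebaran-based parts.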

AMD Radeon Instinct Accelerators

Accelerator Name	AMD Instinct MI400	AMD Instinct MI350X	AMD Instinct MI300X	AMD Instinct MI300A	AMD Instinct MI250X	AMD Instinct MI250	AMD Instinct MI210	AMD Instinct MI100	AMD Radeon Instinct MI60	AMD Radeon Instinct MI50	AMD Radeon Instinct MI25	AMD Radeon Instinct MI8	AMD Radeon Instinct MI6
CPU Architecture	Zen 5 (Exascale APU)	N/A	N/A	Zen 4 (Exascale APU)	N/A	N/A	N/A	N/A	N/A	N/A	N/A	N/A	N/A
GPU Architecture	CDNA 4	CDNA 3+?	Aqua Vanjaram (CDNA 3)	Aqua Vanjaram (CDNA 3)	Aldebaran (CDNA 2)	Aldebaran (CDNA 2)	Aldebaran (CDNA 2)	Arcturus (CDNA 1)	Vega 20	Vega 20	Vega 10	Fiji XT	Polaris 10
GPU Process Node	4nm	4nm	5nm+6nm	5nm+6nm	6nm	6nm	6nm	7nm FinFET	7nm FinFET	7nm FinFET	14nm FinFET	28nm	14nm FinFET
GPU Chiplets	TBD	TBD	8 (MCM)	8 (MCM)	2 (MCM) 1 (Per Die)	2 (MCM) 1 (Per Die)	2 (MCM) 1 (Per Die)	1 (Monolithic)	1 (Monolithic)	1 (Monolithic)	1 (Monolithic)	1 (Monolithic)	1 (Monolithic)
GPU Cores	TBD	TBD	19,456	14,592	14,080	13,312	6,656	7,680	4,096	3,840	4,096	4,096	2,304
GPU Clock Speed	TBD	TBD	2100 MHz	2100 MHz	1700 MHz	1700 MHz	1700 MHz	1500 MHz	1800 MHz	1725 MHz	1500 MHz	1000 MHz	1237 MHz
INT8 Compute	TBD	TBD	2614 TOPS	1961 TOPS	383 TOPS	362 TOPS	181 TOPS	92.3 TOPS	N/A	N/A	N/A	N/A	N/A
FP16 Compute	TBD	TBD	1.3 PFLOPs	980.6 TFLOPs	383 TFLOPs	362 TFLOPs	181 TFLOPs	185 TFLOPs	29.5 TFLOPs	26.5 TFLOPs	24.6 TFLOPs	8.2 TFLOPs	5.7 TFLOPs
FP32 Compute	TBD	TBD	163.4 TFLOPs	122.6 TFLOPs	95.7 TFLOPs	90.5 TFLOPs	45.3 TFLOPs	23.1 TFLOPs	14.7 TFLOPs	13.3 TFLOPs	12.3 TFLOPs	8.2 TFLOPs	5.7 TFLOPs
FP64 Compute	TBD	TBD	81.7 TFLOPs	61.3 TFLOPs	47.9 TFLOPs	45.3 TFLOPs	22.6 TFLOPs	11.5 TFLOPs	7.4 TFLOPs	6.6 TFLOPs	768 GFLOPs	512 GFLOPs	384 GFLOPs
VRAM	TBD	HBM3e	192 GB HBM3	128 GB HBM3	128 GB HBM2e	128 GB HBM2e	64 GB HBM2e	32 GB HBM2	32 GB HBM2	16 GB HBM2	16 GB HBM2	4 GB HBM1	16 GB GDDR5
Infinity Cache	TBD	TBD	256 MB	256 MB	N/A	N/A	N/A	N/A	N/A	N/A	N/A	N/A	N/A
Memory Clock	TBD	TBD	5.2 Gbps	5.2 Gbps	3.2 Gbps	3.2 Gbps	3.2 Gbps	1200 MHz	1000 MHz	1000 MHz	945 MHz	500 MHz	1750 MHz
Memory Bus	TBD	TBD	8192-bit	8192-bit	8192-bit	8192-bit	4096-bit	4096-bit	4096-bit	4096-bit	2048-bit	4096-bit	256-bit
Memory Bandwidth	TBD	TBD	5.3 TB/s	5.3 TB/s	3.2 TB/s	3.2 TB/s	1.6 TB/s	1.23 TB/s	1 TB/s	1 TB/s	484 GB/s	512 GB/s	224 GB/s
Form Factor	TBD	TBD	OAM	APU SH5 Socket	OAM	OAM	Dual Slot Card	Dual Slot, Full Length	Dual Slot, Full Length	Dual Slot, Full Length	Dual Slot, Full Length	Dual Slot, Half Length	Single Slot, Full Length
Cooling	TBD	TBD	Passive Cooling	Passive Cooling	Passive Cooling	Passive Cooling	Passive Cooling	Passive Cooling	Passive Cooling	Passive Cooling	Passive Cooling	Passive Cooling	Passive Cooling
TDP (Max)	TBD	TBD	750W	760W	560W	500W	300W	300W	300W	300W	300W	175W	150W

Source: VideoCardz, ROCm Github, Coelacanth’s Dream

About the author: Jason R. Wilson is a member of the Hardware news team at Wccftech. Equipped with a background in graphic design and writing, Jason works daily to improve his craft and continues to create new and innovative ideas every day.
