AMD Instinct MI200 CDNA 2 MCM GPU Is A Beast: 1.7 GHz Clocks, 47.9 TFLOPs FP64 & Over 4X Increase In FP64/BF16 Performance Over MI100


Oct 23, 2021 at 10:09am EDT


AMD's flagship Instinct MI200 is on the verge of launch, and it will be the first GPU for the HPC segment to feature an MCM design, based on the CDNA 2 architecture. It looks like the GPU will offer some insane performance numbers compared to the existing Instinct MI100, with a more than 4x increase in FP64 and BF16 compute.

AMD Instinct MI200 With CDNA 2 MCM GPU Design Heading To HPC Soon, Features Monstrous Performance Numbers & A 4x Compute Increase Over Instinct MI100

Update: ExecutableFix has posted more information, and it looks like the Instinct MI200 lineup will include two variants: a standard MI250 and an MI250X. According to the details, the MI250X will get 110 CUs per die (220 CUs in total), 128 GB of HBM2e memory and a 500W TDP, and will be based on a 7nm process.

Enough teasing. MI200 has two variants: MI250 and MI250X

MI250X
110 CUs, 1.7GHz boost
128GB HBM2e
500W TDP, 7nm

— ExecutableFix (@ExecuFix) October 23, 2021

We have learned the specifications of the Instinct MI200 accelerator over time, but its overall performance figures remained a mystery until now. Twitter insider and leaker ExecutableFix has shared the first performance metrics for AMD's CDNA 2-based MCM GPU accelerator, and it's a beast.

1.7GHz boost clock, like you said: very high 😜

— ExecutableFix (@ExecuFix) October 23, 2021

According to tweets by ExecutableFix, the AMD Instinct MI200 will run at clock speeds of up to 1.7 GHz, a 13% increase over the Instinct MI100. The CDNA 2-powered MCM GPU also packs almost twice as many stream processors: 14,080 cores across 220 Compute Units. While the GPU was expected to feature 240 Compute Units with 15,360 cores, that configuration has reportedly been replaced by a cut-down variant due to yields. With that said, a full SKU may launch in the future, offering even higher performance.
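The uplift figures above come from simple ratios. Here is a minimal sketch of that arithmetic, using the MI100's 1500 MHz clock and 7,680 stream processors from the spec table below, and assuming 64 stream processors per compute unit (an assumption consistent with 14,080 / 220):

```python
# Leaked MI200 figures vs. the MI100 values from the spec table.
MI100_CLOCK_MHZ = 1500
MI100_STREAM_PROCESSORS = 7680

MI200_CLOCK_MHZ = 1700
MI200_COMPUTE_UNITS = 220
SP_PER_CU = 64  # assumption: 64 stream processors per CDNA compute unit


def percent_increase(new: float, old: float) -> float:
    """Relative increase of `new` over `old`, in percent."""
    return (new / old - 1.0) * 100.0


mi200_sps = MI200_COMPUTE_UNITS * SP_PER_CU            # 14,080
clock_gain = percent_increase(MI200_CLOCK_MHZ, MI100_CLOCK_MHZ)
sp_ratio = mi200_sps / MI100_STREAM_PROCESSORS
expected_full_sps = 240 * SP_PER_CU                    # the rumored 240-CU config

print(f"Stream processors: {mi200_sps:,}")
print(f"Clock uplift vs MI100: {clock_gain:.1f}%")
print(f"SP ratio vs MI100: {sp_ratio:.2f}x")
print(f"Rumored 240-CU config: {expected_full_sps:,} SPs")
```

The clock gain works out to about 13.3% and the core count to roughly 1.83x, which is where "almost twice" comes from.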

383 FP16/BF16

— ExecutableFix (@ExecuFix) October 23, 2021

In terms of performance, the AMD Instinct MI200 HPC accelerator is going to offer almost 50 TFLOPs (47.9 TFLOPs) of FP64 and FP32 compute horsepower, with packed FP32 doubling the latter to 95.7 TFLOPs. Versus the Instinct MI100, this is a 4.16x increase in FP64. In fact, the FP64 numbers of the MI200 exceed the FP32 performance of its predecessor (23.1 TFLOPs). Moving over to the FP16 and BF16 numbers, we are looking at an insane 383 TFLOPs of performance. For perspective, the MI100 only offers 92.3 TFLOPs of peak BFloat16 performance and 184.6 TFLOPs of peak FP16 performance, so the MI200 delivers roughly 4.15x the BF16 and 2.07x the FP16 throughput.
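The 47.9 TFLOPs figure can be reproduced with the usual theoretical-peak formula: stream processors × clock × 2 FLOPs per fused multiply-add. The sketch below assumes that "full-rate FP64" means every stream processor retires one FP64 FMA per clock; that is our reading, not an AMD-published formula. It then compares the result against the MI100 figures quoted above.

```python
# Theoretical peak = stream processors x clock x 2 FLOPs per FMA.
# Assumption: full-rate FP64, i.e. one FP64 FMA per SP per clock.
FLOPS_PER_FMA = 2


def peak_tflops(stream_processors: int, clock_mhz: float) -> float:
    """Theoretical peak throughput in TFLOPs."""
    return stream_processors * clock_mhz * 1e6 * FLOPS_PER_FMA / 1e12


mi200_fp64 = peak_tflops(14_080, 1_700)  # ~47.87 TFLOPs

# Peak figures quoted in the article (TFLOPs).
mi100 = {"FP64": 11.5, "BF16": 92.3, "FP16": 184.6}
mi200 = {"FP64": mi200_fp64, "BF16": 383.0, "FP16": 383.0}

for fmt in ("FP64", "BF16", "FP16"):
    ratio = mi200[fmt] / mi100[fmt]
    print(f"{fmt}: {mi200[fmt]:.1f} vs {mi100[fmt]} TFLOPs -> {ratio:.2f}x")
```

Only FP64 and BF16 clear the 4x mark; because the MI100 already ran FP16 at twice its BF16 rate, the FP16 gain is closer to 2x.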

According to HPCwire, the AMD Instinct MI200 will power three top-tier supercomputers: the United States' exascale Frontier system, the European Union's pre-exascale LUMI system, and Australia's petascale Setonix system. The competition includes NVIDIA's A100 80 GB, which offers 19.5 TFLOPs of FP64, 156 TFLOPs of TF32 and 312 TFLOPs of FP16 Tensor Core compute power. But we are likely to hear about NVIDIA's own Hopper MCM GPU next year, so there's going to be heated competition between the two GPU juggernauts in 2022.
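To put the A100 comparison in numbers, here is a quick ratio sketch using only the peak figures quoted in this article. These are theoretical peaks on paper, not measured application performance, and the A100 values are its Tensor Core rates.

```python
# Peak figures quoted in the article (TFLOPs).
# A100 values are Tensor Core rates (its 156 TFLOPs figure is TF32, not FP32).
mi200 = {"FP64": 47.9, "FP16": 383.0}
a100_80gb = {"FP64": 19.5, "FP16": 312.0}

ratios = {fmt: mi200[fmt] / a100_80gb[fmt] for fmt in mi200}

for fmt, ratio in ratios.items():
    print(f"{fmt}: MI200 {mi200[fmt]} vs A100 {a100_80gb[fmt]} TFLOPs -> {ratio:.2f}x")
```

On paper the MI200's lead is large in FP64 (about 2.46x) but much narrower in FP16 (about 1.23x).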

Here's What To Expect From AMD Instinct MI200 'CDNA 2' GPU Accelerator

Inside the AMD Instinct MI200 is an Aldebaran GPU featuring two dies, a primary and a secondary. Each die consists of 8 shader engines, for a total of 16 SEs. Each Shader Engine packs 16 CUs with full-rate FP64, packed FP32 and a 2nd Generation Matrix Engine for FP16 and BF16 operations. Each die, as such, is composed of 128 compute units or 8192 stream processors. With 110 CUs enabled per die in the shipping configuration, the entire chip comes to 220 compute units or 14,080 stream processors. The Aldebaran GPU is also powered by a new XGMI interconnect. Each chiplet features a VCN 2.6 engine and the main IO controller.
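The compute layout described above can be tallied as follows. This sketch takes the shader-engine and CU counts from the leaked diagram and the 110-CUs-per-die figure from ExecutableFix's update; the 64 stream processors per CU is implied by 8,192 SPs / 128 CUs rather than stated outright.

```python
# Aldebaran compute layout as described in the leak and Locuza's diagram.
DIES = 2
SHADER_ENGINES_PER_DIE = 8
CUS_PER_SHADER_ENGINE = 16
SP_PER_CU = 64              # implied by 8,192 SPs / 128 CUs per die
ENABLED_CUS_PER_DIE = 110   # shipping configuration per ExecutableFix

full_cus_per_die = SHADER_ENGINES_PER_DIE * CUS_PER_SHADER_ENGINE  # 128
full_sps_per_die = full_cus_per_die * SP_PER_CU                     # 8,192
full_chip_cus = full_cus_per_die * DIES                             # 256

enabled_cus = ENABLED_CUS_PER_DIE * DIES                            # 220
enabled_sps = enabled_cus * SP_PER_CU                               # 14,080
disabled_share = 1 - enabled_cus / full_chip_cus                    # ~14%

print(f"Per die (full): {full_cus_per_die} CUs / {full_sps_per_die:,} SPs")
print(f"Whole chip (full): {full_chip_cus} CUs")
print(f"Shipping config: {enabled_cus} CUs / {enabled_sps:,} SPs")
print(f"Disabled: {disabled_share:.1%} of CUs")
```

Taken at face value, the diagram implies 256 CUs for a fully enabled chip rather than the 240 CUs mentioned earlier, so the two leaked figures do not fully agree.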

Locuza has visualized the block diagram of AMD's CDNA 2-powered Aldebaran GPU, which will power the Instinct MI200 HPC accelerator. (Image Credits: Locuza)

As for DRAM, AMD has gone with an 8-channel interface made up of 1024-bit interfaces, for an 8192-bit wide bus. Each interface can support HBM2e stacks built from 2 GB DRAM dies. This should give us up to 16 GB of HBM2e memory capacity per stack, and since there are eight stacks in total, the total capacity would be a whopping 128 GB. That's 48 GB more than the A100, which houses 80 GB of HBM2e memory.
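The memory figures follow from the stack count. In this sketch, the 8-high stack height is an assumption inferred from 16 GB per stack and 2 GB per die. The 3.2 Gbps per-pin data rate is taken from the spec table below, not from the leak itself.

```python
HBM_STACKS = 8
BITS_PER_STACK = 1024
DRAM_DIE_GB = 2
DIES_PER_STACK = 8          # assumption: 8-high stacks, giving 16 GB per stack
PIN_RATE_GBPS = 3.2         # per-pin data rate taken from the spec table
A100_CAPACITY_GB = 80

bus_width_bits = HBM_STACKS * BITS_PER_STACK                # 8192-bit
capacity_gb = HBM_STACKS * DIES_PER_STACK * DRAM_DIE_GB     # 128 GB
bandwidth_gbs = bus_width_bits * PIN_RATE_GBPS / 8          # bits -> bytes

print(f"Bus width: {bus_width_bits}-bit")
print(f"Capacity: {capacity_gb} GB ({capacity_gb - A100_CAPACITY_GB} GB more than A100 80 GB)")
print(f"Peak bandwidth: {bandwidth_gbs:.1f} GB/s (~{bandwidth_gbs / 1000:.2f} TB/s)")
```

The bandwidth works out to about 3.28 TB/s, which the spec table lists as 3.2 TB/s.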

AMD Radeon Instinct Accelerators

Accelerator Name	AMD Instinct MI400	AMD Instinct MI350X	AMD Instinct MI300X	AMD Instinct MI300A	AMD Instinct MI250X	AMD Instinct MI250	AMD Instinct MI210	AMD Instinct MI100	AMD Radeon Instinct MI60	AMD Radeon Instinct MI50	AMD Radeon Instinct MI25	AMD Radeon Instinct MI8	AMD Radeon Instinct MI6
CPU Architecture	Zen 5 (Exascale APU)	N/A	N/A	Zen 4 (Exascale APU)	N/A	N/A	N/A	N/A	N/A	N/A	N/A	N/A	N/A
GPU Architecture	CDNA 4	CDNA 3+?	Aqua Vanjaram (CDNA 3)	Aqua Vanjaram (CDNA 3)	Aldebaran (CDNA 2)	Aldebaran (CDNA 2)	Aldebaran (CDNA 2)	Arcturus (CDNA 1)	Vega 20	Vega 20	Vega 10	Fiji XT	Polaris 10
GPU Process Node	4nm	4nm	5nm+6nm	5nm+6nm	6nm	6nm	6nm	7nm FinFET	7nm FinFET	7nm FinFET	14nm FinFET	28nm	14nm FinFET
GPU Chiplets	TBD	TBD	8 (MCM)	8 (MCM)	2 (MCM) 1 (Per Die)	2 (MCM) 1 (Per Die)	2 (MCM) 1 (Per Die)	1 (Monolithic)	1 (Monolithic)	1 (Monolithic)	1 (Monolithic)	1 (Monolithic)	1 (Monolithic)
GPU Cores	TBD	TBD	19,456	14,592	14,080	13,312	6656	7680	4096	3840	4096	4096	2304
GPU Clock Speed	TBD	TBD	2100 MHz	2100 MHz	1700 MHz	1700 MHz	1700 MHz	1500 MHz	1800 MHz	1725 MHz	1500 MHz	1000 MHz	1237 MHz
INT8 Compute	TBD	TBD	2614 TOPS	1961 TOPS	383 TOPS	362 TOPS	181 TOPS	92.3 TOPS	N/A	N/A	N/A	N/A	N/A
FP16 Compute	TBD	TBD	1.3 PFLOPs	980.6 TFLOPs	383 TFLOPs	362 TFLOPs	181 TFLOPs	185 TFLOPs	29.5 TFLOPs	26.5 TFLOPs	24.6 TFLOPs	8.2 TFLOPs	5.7 TFLOPs
FP32 Compute	TBD	TBD	163.4 TFLOPs	122.6 TFLOPs	95.7 TFLOPs	90.5 TFLOPs	45.3 TFLOPs	23.1 TFLOPs	14.7 TFLOPs	13.3 TFLOPs	12.3 TFLOPs	8.2 TFLOPs	5.7 TFLOPs
FP64 Compute	TBD	TBD	81.7 TFLOPs	61.3 TFLOPs	47.9 TFLOPs	45.3 TFLOPs	22.6 TFLOPs	11.5 TFLOPs	7.4 TFLOPs	6.6 TFLOPs	768 GFLOPs	512 GFLOPs	384 GFLOPs
VRAM	TBD	HBM3e	192 GB HBM3	128 GB HBM3	128 GB HBM2e	128 GB HBM2e	64 GB HBM2e	32 GB HBM2	32 GB HBM2	16 GB HBM2	16 GB HBM2	4 GB HBM1	16 GB GDDR5
Infinity Cache	TBD	TBD	256 MB	256 MB	N/A	N/A	N/A	N/A	N/A	N/A	N/A	N/A	N/A
Memory Clock	TBD	TBD	5.2 Gbps	5.2 Gbps	3.2 Gbps	3.2 Gbps	3.2 Gbps	1200 MHz	1000 MHz	1000 MHz	945 MHz	500 MHz	1750 MHz
Memory Bus	TBD	TBD	8192-bit	8192-bit	8192-bit	8192-bit	4096-bit	4096-bit	4096-bit	4096-bit	2048-bit	4096-bit	256-bit
Memory Bandwidth	TBD	TBD	5.3 TB/s	5.3 TB/s	3.2 TB/s	3.2 TB/s	1.6 TB/s	1.23 TB/s	1 TB/s	1 TB/s	484 GB/s	512 GB/s	224 GB/s
Form Factor	TBD	TBD	OAM	APU SH5 Socket	OAM	OAM	Dual Slot Card	Dual Slot, Full Length	Dual Slot, Full Length	Dual Slot, Full Length	Dual Slot, Full Length	Dual Slot, Half Length	Single Slot, Full Length
Cooling	TBD	TBD	Passive Cooling	Passive Cooling	Passive Cooling	Passive Cooling	Passive Cooling	Passive Cooling	Passive Cooling	Passive Cooling	Passive Cooling	Passive Cooling	Passive Cooling
TDP (Max)	TBD	TBD	750W	760W	560W	500W	300W	300W	300W	300W	300W	175W	150W

About the author: A Software Engineer by training and a PC enthusiast by passion, Hassan Mujtaba serves as Wccftech's Senior Editor for hardware section. With years of experience in the industry, he specializes in deep-dive technical analysis of next-generation CPU and GPU architectures, motherboards, and cooling solutions. His work involves not only breaking news on upcoming technologies but also extensive hands-on reviews and benchmarking.
