AMD Instinct MI200 CDNA 2 ‘Aldebaran’ GPU Die Visualized – Up To 256 Compute Units, 8192-bit Memory Bus, 128 GB HBM2e Capacity

Hassan Mujtaba • Jul 1, 2021 at 10:48am EDT

AMD Instinct MI200 CDNA 2 'Aldebaran' MCM HPC GPU Accelerator Launching Later This Year

AMD's CDNA 2 powered Aldebaran GPU is headed for launch in the Instinct MI200 accelerator later this year. The GPU is going to feature an MCM design and will carry a massive amount of cores and memory. Based on everything we know about the chip so far, Twitter user Locuza has put together a die block diagram of the full Aldebaran GPU, and it looks like a beast.

AMD Aldebaran CDNA 2 GPU For Instinct MI200 HPC Accelerator Visualized - Up To 256 Compute Units, 128 GB HBM2e Memory

The block diagram is based on the latest details shared by Kepler_L2 for the CDNA 2 powered GPU. It was already confirmed that the Aldebaran GPU powering the Instinct MI200 will feature two dies, a primary and a secondary. The block diagram shows the two dies, each consisting of 8 shader engines, for a total of 16 SEs. Each Shader Engine packs 16 CUs with full-rate FP64, packed FP32 and a 2nd Generation Matrix Engine for FP16 and BF16 operations.

The block diagram of AMD's CDNA 2 powered Aldebaran GPU which will power the Instinct MI200 HPC accelerator has been visualized. (Image Credits: Locuza)

Each die is therefore composed of 128 compute units or 8192 stream processors. This adds up to a total of 256 compute units or 16,384 stream processors for the entire chip. The Aldebaran GPU is also powered by a new XGMI interconnect. Each chiplet features a VCN 2.6 engine and the main IO controller.
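The core counts follow directly from the configuration described above. The short sketch below reproduces that arithmetic; the only assumption is 64 stream processors per CU, which is derived from the 8192 stream processors quoted for 128 CUs rather than stated outright. The constant and function names are illustrative.

```python
# Full Aldebaran configuration as described in the article:
# 2 dies x 8 shader engines x 16 compute units.
DIES = 2
SHADER_ENGINES_PER_DIE = 8
CUS_PER_SHADER_ENGINE = 16
# Assumption: 64 stream processors per CU (8192 SPs / 128 CUs per die).
SPS_PER_CU = 64


def full_config():
    """Return (CUs per die, total CUs, SPs per die, total SPs)."""
    cus_per_die = SHADER_ENGINES_PER_DIE * CUS_PER_SHADER_ENGINE
    total_cus = DIES * cus_per_die
    return cus_per_die, total_cus, cus_per_die * SPS_PER_CU, total_cus * SPS_PER_CU


if __name__ == "__main__":
    cus_per_die, total_cus, sps_per_die, total_sps = full_config()
    print(f"{cus_per_die} CUs / {sps_per_die} SPs per die")
    print(f"{total_cus} CUs / {total_sps:,} SPs in total")
```

Running it prints 128 CUs / 8192 SPs per die and 256 CUs / 16,384 SPs in total, matching the figures above.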

The full config of Aldebaran is 2 dies x 8 Shader Engines x 16 Compute Units, but MI200 might be only 224 CUs enabled 🧐 https://t.co/RO7RptgYSk

— Kepler (@Kepler_L2) June 10, 2021

One interesting change relates to the vector units supporting full rate FP64 and packed FP32 ops.
64b DPP (Data-Parallel Primitives) were added.
The Matrix Units also support FP64 now, though I'm not sure if you get higher throughput vs. vector ALUs.
Both might do 128 ops/clk.

— Locuza (@Locuza_) July 1, 2021

Moving over to DRAM, AMD has gone with eight 1024-bit memory interfaces for an 8192-bit wide bus. Each interface supports HBM2e built from 2 GB DRAM dies, which should allow up to 16 GB of HBM2e capacity per stack, and since there are eight stacks in total, the overall capacity comes to a whopping 128 GB. That's 48 GB more than the A100, which houses 80 GB of HBM2e memory. This would be a real juggernaut of an HPC GPU, but we also expect some very high power figures when it launches.
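As a back-of-the-envelope check, the bus width and capacity follow from eight stacks with a 1024-bit interface and 16 GB each, as described above. The sketch also estimates peak bandwidth, assuming the 3.2 Gbps per-pin data rate listed for the MI250 and MI250X in the table further down; that rate is an assumption for the full Aldebaran part, and the names are illustrative.

```python
# HBM2e arithmetic for Aldebaran: 8 stacks, 1024-bit interface and 16 GB per stack.
STACKS = 8
BITS_PER_STACK = 1024
GB_PER_STACK = 16
# Assumption: 3.2 Gbps per pin, taken from the MI250/MI250X rows of the spec table.
GBPS_PER_PIN = 3.2


def memory_summary(stacks=STACKS):
    """Return (bus width in bits, capacity in GB, peak bandwidth in GB/s)."""
    bus_width_bits = stacks * BITS_PER_STACK
    capacity_gb = stacks * GB_PER_STACK
    # Peak bandwidth: number of data pins x per-pin rate in Gbit/s, divided by 8 bits per byte.
    bandwidth_gb_s = bus_width_bits * GBPS_PER_PIN / 8
    return bus_width_bits, capacity_gb, bandwidth_gb_s


if __name__ == "__main__":
    width, capacity, bandwidth = memory_summary()
    print(f"{width}-bit bus, {capacity} GB, {bandwidth:.1f} GB/s peak")
    print(f"{capacity - 80} GB more than an 80 GB A100")
```

This gives an 8192-bit bus, 128 GB and 3276.8 GB/s of peak bandwidth, which the spec table rounds to 3.2 TB/s.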

As for the product itself, Kepler_L2 states that the actual AMD Instinct MI200 accelerator will utilize a cut-down configuration comprising 224 CUs or 14,336 cores. That's 12.5% fewer cores than the full Aldebaran GPU die has to offer.
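The size of the rumored cut-down can be checked with a few lines of arithmetic. This sketch again assumes 64 stream processors per CU (8192 SPs / 128 CUs) and treats the 224-CU count as Kepler_L2's claim rather than an official AMD figure; the function name is illustrative.

```python
SPS_PER_CU = 64  # Assumption: derived from 8192 SPs / 128 CUs per die.
FULL_CUS = 256
MI200_CUS = 224  # Per Kepler_L2; not confirmed by AMD.


def cut_down(full_cus=FULL_CUS, enabled_cus=MI200_CUS):
    """Return (enabled SPs, percent fewer SPs than the full die, percent more SPs on the full die)."""
    full_sps = full_cus * SPS_PER_CU
    enabled_sps = enabled_cus * SPS_PER_CU
    fewer_pct = 100 * (full_sps - enabled_sps) / full_sps
    more_pct = 100 * (full_sps - enabled_sps) / enabled_sps
    return enabled_sps, fewer_pct, more_pct


if __name__ == "__main__":
    sps, fewer, more = cut_down()
    print(f"{sps:,} SPs enabled, {fewer:.1f}% fewer than the full die")
    print(f"The full die has {more:.1f}% more SPs than the cut-down part")
```

The cut-down part has 14,336 SPs, 12.5% fewer than the full die. Measured the other way round, the full die has about 14.3% more stream processors than the cut-down configuration.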

Here's Everything We Know About AMD's CDNA 2 Architecture Powered Instinct Accelerators

The AMD CDNA 2 architecture will be powering the next-generation AMD Instinct HPC accelerators. We know that one of those accelerators will be the MI200, which will feature the Aldebaran GPU. It's going to be a very powerful chip and possibly the first GPU to feature an MCM design. The Instinct MI200 is going to compete against Intel's 7nm Ponte Vecchio and NVIDIA's refreshed Ampere parts. Intel and NVIDIA are also following the MCM route on their next-generation HPC accelerators, but it looks like Ponte Vecchio won't be available until 2022, and NVIDIA's own roadmap confirms the same timeframe for its next-gen HPC accelerator.

A previous Linux patch revealed that the AMD Instinct MI200 'Aldebaran' GPU will feature HBM2E memory support. NVIDIA was the first to hop on board the HBM2E standard, which offers a nice boost over the standard HBM2 configuration used on the Arcturus-based MI100 GPU accelerator.

The latest Linux kernel patch revealed that the GPU carries 16 KB of L1 cache per CU, which adds up to 2 MB of total L1 cache across the 128 Compute Units of each GPU die. The GPU also carries 8 MB of shared L2 cache, but only 14 CUs per Shader Engine compared to 16 CUs per SE in the previous Instinct lineup. Regardless, each CU on Aldebaran GPUs is said to deliver significantly higher compute output.
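The cache total is simple multiplication, and the 14-CUs-per-SE figure lines up with the 224-CU configuration mentioned earlier if it applies to all 16 shader engines (2 dies x 8 SEs). That uniform distribution is an assumption made for this sketch, not something the patch states; the names are illustrative.

```python
L1_KB_PER_CU = 16
CUS_PER_DIE = 128
SHADER_ENGINES_TOTAL = 16  # 2 dies x 8 shader engines
ENABLED_CUS_PER_SE = 14  # Assumption: the same count on every shader engine.


def l1_total_mb(cus=CUS_PER_DIE, kb_per_cu=L1_KB_PER_CU):
    """Total L1 capacity in MB (1 MB = 1024 KB)."""
    return cus * kb_per_cu / 1024


def enabled_cus(per_se=ENABLED_CUS_PER_SE, engines=SHADER_ENGINES_TOTAL):
    """Enabled CUs across the whole package."""
    return per_se * engines


if __name__ == "__main__":
    print(f"{l1_total_mb():.0f} MB of L1 per die")
    print(f"{enabled_cus()} CUs enabled across both dies")
```

With these inputs the sketch reports 2 MB of L1 per die and 224 enabled CUs, consistent with the MI200 configuration attributed to Kepler_L2.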

Other features listed include SDMA (System Direct Memory Access) support, which will allow data transfers over PCIe and XGMI/Infinity Cache subsystems. As far as Infinity Cache is concerned, it looks like it won't be coming to HPC GPUs. Do note that AMD's CDNA 2 GPU will be fabricated on a brand new process node and is confirmed to feature a 3rd Generation AMD Infinity architecture that extends to Exascale by allowing up to 8-way coherent GPU connectivity.

AMD Radeon Instinct Accelerators

Accelerator Name	AMD Instinct MI400	AMD Instinct MI350X	AMD Instinct MI300X	AMD Instinct MI300A	AMD Instinct MI250X	AMD Instinct MI250	AMD Instinct MI210	AMD Instinct MI100	AMD Radeon Instinct MI60	AMD Radeon Instinct MI50	AMD Radeon Instinct MI25	AMD Radeon Instinct MI8	AMD Radeon Instinct MI6
CPU Architecture	Zen 5 (Exascale APU)	N/A	N/A	Zen 4 (Exascale APU)	N/A	N/A	N/A	N/A	N/A	N/A	N/A	N/A	N/A
GPU Architecture	CDNA 4	CDNA 3+?	Aqua Vanjaram (CDNA 3)	Aqua Vanjaram (CDNA 3)	Aldebaran (CDNA 2)	Aldebaran (CDNA 2)	Aldebaran (CDNA 2)	Arcturus (CDNA 1)	Vega 20	Vega 20	Vega 10	Fiji XT	Polaris 10
GPU Process Node	4nm	4nm	5nm+6nm	5nm+6nm	6nm	6nm	6nm	7nm FinFET	7nm FinFET	7nm FinFET	14nm FinFET	28nm	14nm FinFET
GPU Chiplets	TBD	TBD	8 (MCM)	8 (MCM)	2 (MCM) 1 (Per Die)	2 (MCM) 1 (Per Die)	2 (MCM) 1 (Per Die)	1 (Monolithic)	1 (Monolithic)	1 (Monolithic)	1 (Monolithic)	1 (Monolithic)	1 (Monolithic)
GPU Cores	TBD	TBD	19,456	14,592	14,080	13,312	6656	7680	4096	3840	4096	4096	2304
GPU Clock Speed	TBD	TBD	2100 MHz	2100 MHz	1700 MHz	1700 MHz	1700 MHz	1500 MHz	1800 MHz	1725 MHz	1500 MHz	1000 MHz	1237 MHz
INT8 Compute	TBD	TBD	2614 TOPS	1961 TOPS	383 TOPS	362 TOPS	181 TOPS	92.3 TOPS	N/A	N/A	N/A	N/A	N/A
FP16 Compute	TBD	TBD	1.3 PFLOPs	980.6 TFLOPs	383 TFLOPs	362 TFLOPs	181 TFLOPs	185 TFLOPs	29.5 TFLOPs	26.5 TFLOPs	24.6 TFLOPs	8.2 TFLOPs	5.7 TFLOPs
FP32 Compute	TBD	TBD	163.4 TFLOPs	122.6 TFLOPs	95.7 TFLOPs	90.5 TFLOPs	45.3 TFLOPs	23.1 TFLOPs	14.7 TFLOPs	13.3 TFLOPs	12.3 TFLOPs	8.2 TFLOPs	5.7 TFLOPs
FP64 Compute	TBD	TBD	81.7 TFLOPs	61.3 TFLOPs	47.9 TFLOPs	45.3 TFLOPs	22.6 TFLOPs	11.5 TFLOPs	7.4 TFLOPs	6.6 TFLOPs	768 GFLOPs	512 GFLOPs	384 GFLOPs
VRAM	TBD	HBM3e	192 GB HBM3	128 GB HBM3	128 GB HBM2e	128 GB HBM2e	64 GB HBM2e	32 GB HBM2	32 GB HBM2	16 GB HBM2	16 GB HBM2	4 GB HBM1	16 GB GDDR5
Infinity Cache	TBD	TBD	256 MB	256 MB	N/A	N/A	N/A	N/A	N/A	N/A	N/A	N/A	N/A
Memory Clock	TBD	TBD	5.2 Gbps	5.2 Gbps	3.2 Gbps	3.2 Gbps	3.2 Gbps	1200 MHz	1000 MHz	1000 MHz	945 MHz	500 MHz	1750 MHz
Memory Bus	TBD	TBD	8192-bit	8192-bit	8192-bit	8192-bit	4096-bit	4096-bit	4096-bit	4096-bit	2048-bit	4096-bit	256-bit
Memory Bandwidth	TBD	TBD	5.3 TB/s	5.3 TB/s	3.2 TB/s	3.2 TB/s	1.6 TB/s	1.23 TB/s	1 TB/s	1 TB/s	484 GB/s	512 GB/s	224 GB/s
Form Factor	TBD	TBD	OAM	APU SH5 Socket	OAM	OAM	Dual Slot Card	Dual Slot, Full Length	Dual Slot, Full Length	Dual Slot, Full Length	Dual Slot, Full Length	Dual Slot, Half Length	Single Slot, Full Length
Cooling	TBD	TBD	Passive Cooling	Passive Cooling	Passive Cooling	Passive Cooling	Passive Cooling	Passive Cooling	Passive Cooling	Passive Cooling	Passive Cooling	Passive Cooling	Passive Cooling
TDP (Max)	TBD	TBD	750W	760W	560W	500W	300W	300W	300W	300W	300W	175W	150W
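The CDNA 2 throughput figures in the table can be reproduced with the usual peak-FLOPS rule of thumb: cores x clock x 2 FLOPs per fused multiply-add. The sketch assumes one FP64 FMA per stream processor per clock (the full-rate FP64 described earlier) and twice that for packed FP32; it is a cross-check of the listed numbers, not AMD's official methodology, and the names are illustrative.

```python
def peak_tflops(cores, clock_mhz, flops_per_clock=2):
    """Peak TFLOPs = cores x clock x FLOPs per core per clock (2 = one FMA)."""
    return cores * clock_mhz * 1e6 * flops_per_clock / 1e12


# name: (GPU cores, clock in MHz, listed FP64 TFLOPs, listed FP32 TFLOPs), from the table above
CDNA2_PARTS = {
    "MI250X": (14_080, 1700, 47.9, 95.7),
    "MI250": (13_312, 1700, 45.3, 90.5),
    "MI210": (6_656, 1700, 22.6, 45.3),
}

if __name__ == "__main__":
    for name, (cores, mhz, fp64_listed, fp32_listed) in CDNA2_PARTS.items():
        fp64 = peak_tflops(cores, mhz)  # full-rate FP64: one FMA per core per clock
        fp32 = peak_tflops(cores, mhz, flops_per_clock=4)  # packed FP32: two FMAs per clock
        print(f"{name}: FP64 {fp64:.1f} (listed {fp64_listed}), "
              f"FP32 {fp32:.1f} (listed {fp32_listed})")
```

For all three CDNA 2 parts the computed values round to the listed FP64 and FP32 figures, which suggests the table quotes theoretical peak rates at the stated clock.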

About the author: A Software Engineer by training and a PC enthusiast by passion, Hassan Mujtaba serves as Wccftech's Senior Editor for hardware section. With years of experience in the industry, he specializes in deep-dive technical analysis of next-generation CPU and GPU architectures, motherboards, and cooling solutions. His work involves not only breaking news on upcoming technologies but also extensive hands-on reviews and benchmarking.