AMD Radeon Instinct MI100 ‘CDNA GPU’ Alleged Performance Numbers Show It’s Faster Than NVIDIA’s A100 in FP32 Compute, Impressive Perf/Value

Jul 29, 2020 at 10:48pm EDT

Alleged performance numbers and details of AMD's next-generation CDNA GPU based Radeon Instinct MI100 accelerator have been leaked by AdoredTV. In an exclusive post, AdoredTV covers performance benchmarks of the upcoming HPC GPU against NVIDIA's Volta and Ampere GPUs.

AMD Radeon Instinct MI100 'CDNA' GPU Performance Benchmarks Leak Out, Allegedly Faster Than NVIDIA's Ampere A100 In FP32 Compute With Better Perf/Value

AdoredTV claims that the slides it received are from the official AMD Radeon Instinct MI100 presentation. The versions posted at the source appear to be modified recreations of the originals, but the details are kept intact. In our previous post, we confirmed that the Radeon Instinct MI100 GPU was on its way to the market in 2H 2020. The slides from AdoredTV shed more light on the launch plans and server configurations that we can expect from AMD and its partners in 2020 and beyond.

AMD Radeon Instinct MI100 1U Server Specs

First up, AMD is planning to unveil an HPC-specific 1U server with a 2P design, featuring dual AMD EPYC CPUs that could be based on either the Rome or Milan generation. Each EPYC CPU will be connected to two Radeon Instinct MI100 accelerators through the 2nd Generation Infinity Fabric interconnect. The four GPUs will be able to deliver a sustained 136 TFLOPs of FP32 (SGEMM) output, which works out to around 34 TFLOPs of FP32 compute per GPU. Each Radeon Instinct MI100 GPU will have a TDP of 300W.

Additional specifications include a total GPU PCIe bandwidth of 256 GB/s, which is made possible by the Gen 4 protocol. The combined memory bandwidth of the four GPUs is 4.9 TB/s, or 1.225 TB/s per GPU, which suggests that AMD is using HBM2e DRAM dies. The combined memory pool is 128 GB, or 32 GB per GPU. This suggests that AMD is still using four HBM2 DRAM stacks per GPU, with each stack housing 8-hi DRAM dies. It looks like XGMI won't be offered on standard configurations and will be limited to specialized 1U racks.
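The per-GPU figures above come from dividing the leaked four-GPU totals evenly. A minimal Python sketch of that arithmetic follows, assuming the totals split linearly across the GPUs. The variable names are illustrative and not taken from the slides, and the PCIe reading in the comment is our interpretation.

```python
# Leaked aggregate figures for the 1U server (four MI100 GPUs), per the slides.
NUM_GPUS = 4
TOTAL_FP32_TFLOPS = 136.0   # sustained SGEMM output
TOTAL_MEM_BW_TBPS = 4.9     # combined HBM bandwidth
TOTAL_MEM_GB = 128          # combined memory pool
TOTAL_PCIE_GBPS = 256       # combined GPU PCIe bandwidth
TDP_WATTS_PER_GPU = 300

def per_gpu(total, n=NUM_GPUS):
    """Split an aggregate figure evenly across n GPUs (assumes linear scaling)."""
    return total / n

fp32_per_gpu = per_gpu(TOTAL_FP32_TFLOPS)
bw_per_gpu = per_gpu(TOTAL_MEM_BW_TBPS)
mem_per_gpu = per_gpu(TOTAL_MEM_GB)
# 64 GB/s per GPU would be consistent with a PCIe 4.0 x16 link counted in
# both directions; that reading is an assumption, not stated in the slides.
pcie_per_gpu = per_gpu(TOTAL_PCIE_GBPS)
gflops_per_watt = fp32_per_gpu * 1000 / TDP_WATTS_PER_GPU

print(f"FP32 per GPU:      {fp32_per_gpu:.1f} TFLOPs")
print(f"Bandwidth per GPU: {bw_per_gpu:.3f} TB/s")
print(f"Memory per GPU:    {mem_per_gpu:.0f} GB")
print(f"PCIe per GPU:      {pcie_per_gpu:.0f} GB/s")
print(f"FP32 efficiency:   {gflops_per_watt:.1f} GFLOPs/W")
```

The efficiency line is derived from the leaked TFLOPs and TDP only, so it should be read as a rough figure rather than a measured one.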

As far as availability is concerned, the 1U server with AMD EPYC (Rome / Milan) HPC CPUs is said to launch by December 2020 while an Intel Xeon variant is also expected to launch in February 2021.

AMD Radeon Instinct MI100 3U Server Specs

The second server, a 3U design, is expected to launch in March 2021 and will offer even beefier specifications, with 8 Radeon Instinct MI100 GPUs connected to two EPYC CPUs. Each group of four Instinct MI100s will be connected through XGMI (100 GB/s bi-directional) with a quad bandwidth of 1.2 TB/s. The eight Instinct accelerators would deliver a total of 272 TFLOPs of FP32 compute, 512 GB/s of PCIe bandwidth, 9.8 TB/s of HBM bandwidth, and 256 GB of DRAM capacity. The rack will have a rated power draw of 3kW.
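The 3U totals line up with eight GPUs at the per-GPU rates implied by the 1U server. The short Python sketch below scales those rates up. The per-GPU constants are derived from the leaked 1U figures rather than from official specifications, and `server_totals` is a hypothetical helper written for this illustration.

```python
# Per-GPU figures implied by the leaked 1U numbers (assumed, not official specs).
FP32_TFLOPS_PER_GPU = 34.0
MEM_BW_TBPS_PER_GPU = 1.225
MEM_GB_PER_GPU = 32
PCIE_GBPS_PER_GPU = 64
TDP_WATTS_PER_GPU = 300

def server_totals(num_gpus):
    """Aggregate the per-GPU figures for a server with num_gpus accelerators."""
    return {
        "fp32_tflops": num_gpus * FP32_TFLOPS_PER_GPU,
        "mem_bw_tbps": num_gpus * MEM_BW_TBPS_PER_GPU,
        "mem_gb": num_gpus * MEM_GB_PER_GPU,
        "pcie_gbps": num_gpus * PCIE_GBPS_PER_GPU,
        "gpu_power_w": num_gpus * TDP_WATTS_PER_GPU,
    }

totals_3u = server_totals(8)
for name, value in totals_3u.items():
    print(f"{name}: {value:g}")

# Power left for CPUs, memory, fans and so on within the 3 kW rating.
RATED_POWER_W = 3000
non_gpu_budget_w = RATED_POWER_W - totals_3u["gpu_power_w"]
print(f"non-GPU power budget: {non_gpu_budget_w} W")
```

At eight GPUs the accelerators alone account for 2.4 kW of the rated 3 kW, assuming every card runs at its full 300W TDP.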

AMD Radeon Instinct Accelerators

Accelerator Name	AMD Instinct MI400	AMD Instinct MI350X	AMD Instinct MI300X	AMD Instinct MI300A	AMD Instinct MI250X	AMD Instinct MI250	AMD Instinct MI210	AMD Instinct MI100	AMD Radeon Instinct MI60	AMD Radeon Instinct MI50	AMD Radeon Instinct MI25	AMD Radeon Instinct MI8	AMD Radeon Instinct MI6
CPU Architecture	Zen 5 (Exascale APU)	N/A	N/A	Zen 4 (Exascale APU)	N/A	N/A	N/A	N/A	N/A	N/A	N/A	N/A	N/A
GPU Architecture	CDNA 4	CDNA 3+?	Aqua Vanjaram (CDNA 3)	Aqua Vanjaram (CDNA 3)	Aldebaran (CDNA 2)	Aldebaran (CDNA 2)	Aldebaran (CDNA 2)	Arcturus (CDNA 1)	Vega 20	Vega 20	Vega 10	Fiji XT	Polaris 10
GPU Process Node	4nm	4nm	5nm+6nm	5nm+6nm	6nm	6nm	6nm	7nm FinFET	7nm FinFET	7nm FinFET	14nm FinFET	28nm	14nm FinFET
GPU Chiplets	TBD	TBD	8 (MCM)	8 (MCM)	2 (MCM) 1 (Per Die)	2 (MCM) 1 (Per Die)	2 (MCM) 1 (Per Die)	1 (Monolithic)	1 (Monolithic)	1 (Monolithic)	1 (Monolithic)	1 (Monolithic)	1 (Monolithic)
GPU Cores	TBD	TBD	19,456	14,592	14,080	13,312	6656	7680	4096	3840	4096	4096	2304
GPU Clock Speed	TBD	TBD	2100 MHz	2100 MHz	1700 MHz	1700 MHz	1700 MHz	1500 MHz	1800 MHz	1725 MHz	1500 MHz	1000 MHz	1237 MHz
INT8 Compute	TBD	TBD	2614 TOPS	1961 TOPS	383 TOPs	362 TOPS	181 TOPS	92.3 TOPS	N/A	N/A	N/A	N/A	N/A
FP16 Compute	TBD	TBD	1.3 PFLOPs	980.6 TFLOPs	383 TFLOPs	362 TFLOPs	181 TFLOPs	185 TFLOPs	29.5 TFLOPs	26.5 TFLOPs	24.6 TFLOPs	8.2 TFLOPs	5.7 TFLOPs
FP32 Compute	TBD	TBD	163.4 TFLOPs	122.6 TFLOPs	95.7 TFLOPs	90.5 TFLOPs	45.3 TFLOPs	23.1 TFLOPs	14.7 TFLOPs	13.3 TFLOPs	12.3 TFLOPs	8.2 TFLOPs	5.7 TFLOPs
FP64 Compute	TBD	TBD	81.7 TFLOPs	61.3 TFLOPs	47.9 TFLOPs	45.3 TFLOPs	22.6 TFLOPs	11.5 TFLOPs	7.4 TFLOPs	6.6 TFLOPs	768 GFLOPs	512 GFLOPs	384 GFLOPs
VRAM	TBD	HBM3e	192 GB HBM3	128 GB HBM3	128 GB HBM2e	128 GB HBM2e	64 GB HBM2e	32 GB HBM2	32 GB HBM2	16 GB HBM2	16 GB HBM2	4 GB HBM1	16 GB GDDR5
Infinity Cache	TBD	TBD	256 MB	256 MB	N/A	N/A	N/A	N/A	N/A	N/A	N/A	N/A	N/A
Memory Clock	TBD	TBD	5.2 Gbps	5.2 Gbps	3.2 Gbps	3.2 Gbps	3.2 Gbps	1200 MHz	1000 MHz	1000 MHz	945 MHz	500 MHz	1750 MHz
Memory Bus	TBD	TBD	8192-bit	8192-bit	8192-bit	8192-bit	4096-bit	4096-bit bus	4096-bit bus	4096-bit bus	2048-bit bus	4096-bit bus	256-bit bus
Memory Bandwidth	TBD	TBD	5.3 TB/s	5.3 TB/s	3.2 TB/s	3.2 TB/s	1.6 TB/s	1.23 TB/s	1 TB/s	1 TB/s	484 GB/s	512 GB/s	224 GB/s
Form Factor	TBD	TBD	OAM	APU SH5 Socket	OAM	OAM	Dual Slot Card	Dual Slot, Full Length	Dual Slot, Full Length	Dual Slot, Full Length	Dual Slot, Full Length	Dual Slot, Half Length	Single Slot, Full Length
Cooling	TBD	TBD	Passive Cooling	Passive Cooling	Passive Cooling	Passive Cooling	Passive Cooling	Passive Cooling	Passive Cooling	Passive Cooling	Passive Cooling	Passive Cooling	Passive Cooling
TDP (Max)	TBD	TBD	750W	760W	560W	500W	300W	300W	300W	300W	300W	175W	150W
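
Several rows in the table can be cross-checked against each other. The Python sketch below recomputes peak FP32 throughput from core count and clock, and memory bandwidth from bus width and memory clock. It assumes one fused multiply-add (2 FLOPs) per core per clock and double-data-rate HBM signaling. The helper names are hypothetical, and the table's listed clocks are rounded, so small differences from the listed values are expected.

```python
def peak_fp32_tflops(cores, clock_mhz, flops_per_core_per_clock=2):
    """Peak FP32 rate, assuming one FMA (2 FLOPs) per core per clock."""
    return cores * flops_per_core_per_clock * clock_mhz * 1e6 / 1e12

def hbm_bandwidth_gbs(bus_width_bits, mem_clock_mhz, transfers_per_clock=2):
    """Memory bandwidth in GB/s, assuming double-data-rate signaling."""
    bits_per_second = bus_width_bits * mem_clock_mhz * 1e6 * transfers_per_clock
    return bits_per_second / 8 / 1e9

# MI100 row: 7680 cores, 1500 MHz, 4096-bit bus, 1200 MHz memory clock.
mi100_fp32 = peak_fp32_tflops(7680, 1500)
mi100_bw = hbm_bandwidth_gbs(4096, 1200)

# MI60 row: 4096 cores, 1800 MHz, 4096-bit bus, 1000 MHz memory clock.
mi60_fp32 = peak_fp32_tflops(4096, 1800)
mi60_bw = hbm_bandwidth_gbs(4096, 1000)

print(f"MI100: {mi100_fp32:.2f} TFLOPs FP32, {mi100_bw:.1f} GB/s")
print(f"MI60:  {mi60_fp32:.2f} TFLOPs FP32, {mi60_bw:.1f} GB/s")
```

Both rows land close to the listed values (23.1 TFLOPs and 1.23 TB/s for the MI100, 14.7 TFLOPs and 1 TB/s for the MI60).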

AMD's Radeon Instinct MI100 'CDNA GPU' Performance Numbers, An FP32 Powerhouse In The Making?

In terms of performance, the AMD Radeon Instinct MI100 was compared to the NVIDIA Volta V100 and the NVIDIA Ampere A100 GPU accelerators. Interestingly, the slides mention a 300W Ampere A100 accelerator, although no such configuration exists. The actual A100 comes in two flavors: a 400W config in the SXM form factor and a 250W config in the PCIe form factor. This means that the slides are based on a hypothesized A100 configuration rather than an actual variant.

As per the benchmarks, the Radeon Instinct MI100 delivers around 13% better FP32 performance than the Ampere A100 and over 2x the performance of the Volta V100 GPUs. The perf-to-value ratio is also compared, with the MI100 offering around 2.4x better value than the V100S and 50% better value than the Ampere A100. The slides also show near-linear performance scaling in ResNet, even in configurations with up to 32 GPUs, which is quite impressive.
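
The performance and value claims together imply a relative price. If "value" in the slides means performance per dollar (an assumption on our part, since the slides as reported do not define it), the implied price ratio is the performance ratio divided by the value ratio. A minimal Python sketch, with `implied_price_ratio` as a hypothetical helper:

```python
def implied_price_ratio(perf_ratio, value_ratio):
    """Price of the MI100 relative to a competitor.

    Assumes value = performance / price, so
    value_ratio = perf_ratio / price_ratio.
    """
    return perf_ratio / value_ratio

# MI100 vs A100: ~13% faster in FP32 and 50% better value, per the slides.
mi100_vs_a100_price = implied_price_ratio(perf_ratio=1.13, value_ratio=1.5)
print(f"Implied MI100 price relative to A100: {mi100_vs_a100_price:.2f}x")
```

Under that assumption the slides would imply an MI100 priced at roughly three quarters of the A100, but no pricing is stated in the leak, so this is only what the two ratios suggest when taken together.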

AMD Radeon Instinct MI100 vs NVIDIA's Ampere A100 HPC Accelerator (Image Credits: AdoredTV):

With that said, the slides also mention that AMD will offer much better performance and value in three specific segments: Oil & Gas, Academia, and HPC & Machine Learning. In the rest of the HPC workloads, such as FP64 compute, AI, and Data Analytics, NVIDIA will offer far superior performance with its A100 accelerator. NVIDIA also holds the benefit of its Multi-Instance GPU architecture over AMD. The performance metrics show the A100 with 2.5x better FP64 performance, 2x better FP16 performance, and twice the tensor performance thanks to the latest-gen Tensor cores on the Ampere A100 GPU.

One thing that needs to be highlighted is that AMD hasn't mentioned NVIDIA's sparsity numbers anywhere in the benchmarks. NVIDIA rates the Ampere A100 at up to 156 TFLOPs of TF32 Tensor Core throughput, rising to 312 TFLOPs with sparsity, though it seems like AMD just wanted to do a specific benchmark comparison versus the Ampere A100. From the looks of it, the Radeon Instinct MI100 does seem to be a decent HPC offering if the performance and value numbers hold up at launch.

About the author: A Software Engineer by training and a PC enthusiast by passion, Hassan Mujtaba serves as Wccftech's Senior Editor for hardware section. With years of experience in the industry, he specializes in deep-dive technical analysis of next-generation CPU and GPU architectures, motherboards, and cooling solutions. His work involves not only breaking news on upcoming technologies but also extensive hands-on reviews and benchmarking.
