NVIDIA Readies Ampere A100 PCIe GPU With 80 GB HBM2e Memory & Up To 2 TB/s Bandwidth


NVIDIA is possibly making its fastest GPU, the Ampere A100, even faster with twice the memory capacity and record-breaking memory bandwidth. The upgrade is corroborated by NVIDIA's own product listing, which was discovered by Videocardz.

NVIDIA's Fastest GPU, The Ampere A100, Gets Even Faster With Twice The Memory & Higher HBM2e Bandwidth

The existing NVIDIA A100 HPC accelerator was introduced last year in June, and it looks like the green team is planning to give it a major spec upgrade. The accelerator is based on NVIDIA's largest Ampere GPU, the GA100, which measures 826mm2 and houses an insane 54 billion transistors. NVIDIA typically gives its HPC accelerators a mid-cycle spec boost, which means we should hear about the next-generation accelerators at GTC 2022.


In terms of specifications, the A100 PCIe GPU accelerator doesn't change much in its core configuration. The GA100 GPU retains the specifications of the 250W variant, with 6912 CUDA cores arranged in 108 SM units and 432 Tensor Cores, but now carries 80 GB of HBM2e memory that delivers a higher bandwidth of 2.0 TB/s, up from 1.55 TB/s on the 40 GB variant.
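The bandwidth jump follows directly from the memory's per-pin data rate, since the interface width stays the same. A minimal sketch of the arithmetic, assuming the commonly cited 5120-bit active interface on the A100 and approximate per-pin rates (the exact rates are an assumption, not from the listing):

```python
# Peak memory bandwidth = (bus width in bits / 8 bits per byte) * per-pin data rate.
def peak_bandwidth_gbs(bus_width_bits: int, data_rate_gbps: float) -> float:
    """Return theoretical peak memory bandwidth in GB/s."""
    return bus_width_bits / 8 * data_rate_gbps

# 40 GB A100: 5120-bit interface, HBM2 at ~2.43 Gbps per pin -> ~1555 GB/s (~1.55 TB/s)
# 80 GB A100: 5120-bit interface, HBM2e at ~3.2 Gbps per pin  -> ~2048 GB/s (~2.0 TB/s)
print(peak_bandwidth_gbs(5120, 2.43))
print(peak_bandwidth_gbs(5120, 3.2))
```

The same interface width with faster HBM2e stacks accounts for the entire 1.55 TB/s to 2.0 TB/s jump.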

A featured image of the NVIDIA GA100 die.

The A100 SXM variant already comes with 80 GB of memory, but it doesn't feature the faster HBM2e dies found on this upcoming A100 PCIe variant. This is also the most memory ever featured on a PCIe-based graphics card, but don't expect consumer graphics cards to feature such high capacities any time soon. What's interesting is that the power rating remains unchanged, which means we are looking at higher-density chips binned for high-performance use cases.

Specifications of the A100 PCIe 80 GB graphics card as listed over at NVIDIA's webpage. (Image Credits: Videocardz)

The FP64 performance is still rated at 9.7/19.5 TFLOPs (standard/Tensor Core), FP32 performance at 19.5 TFLOPs (156/312 TFLOPs for TF32 Tensor operations with sparsity), FP16 performance at 312/624 TFLOPs (with sparsity), and INT8 at 624/1248 TOPs (with sparsity). NVIDIA is planning to release its latest HPC accelerator next week, and we can expect pricing of over $20,000 US considering the 40 GB A100 variant sells for around $15,000 US.
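The standard (non-sparsity) figures can be sanity-checked with the usual peak-throughput formula: cores times boost clock times two FLOPs per cycle (one fused multiply-add). A quick sketch using the A100's published core counts and 1410 MHz boost clock:

```python
# Peak throughput = cores * boost clock (GHz) * FLOPs per core per cycle,
# where an FMA counts as 2 FLOPs. Result is in TFLOPs.
def peak_tflops(cores: int, boost_clock_ghz: float, flops_per_cycle: int = 2) -> float:
    """Return theoretical peak throughput in TFLOPs."""
    return cores * boost_clock_ghz * flops_per_cycle / 1000

# A100: 3456 FP64 CUDA cores, 6912 FP32 CUDA cores, 1.41 GHz boost
print(round(peak_tflops(3456, 1.41), 1))  # ~9.7 TFLOPs FP64
print(round(peak_tflops(6912, 1.41), 1))  # ~19.5 TFLOPs FP32
```

The doubled sparsity figures come from the Tensor Cores skipping zeroed weights in 2:4 structured-sparse matrices, not from extra raw compute.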

NVIDIA Ampere GA100 GPU Based Tesla A100 Specs:

| NVIDIA Tesla Graphics Card | Tesla K40 (PCI-Express) | Tesla M40 (PCI-Express) | Tesla P100 (PCI-Express) | Tesla P100 (SXM2) | Tesla V100 (SXM2) | Tesla V100S (PCIe) | NVIDIA A100 (SXM4) | NVIDIA A100 (PCIe4) |
|---|---|---|---|---|---|---|---|---|
| GPU | GK110 (Kepler) | GM200 (Maxwell) | GP100 (Pascal) | GP100 (Pascal) | GV100 (Volta) | GV100 (Volta) | GA100 (Ampere) | GA100 (Ampere) |
| Process Node | 28nm | 28nm | 16nm | 16nm | 12nm | 12nm | 7nm | 7nm |
| Transistors | 7.1 Billion | 8 Billion | 15.3 Billion | 15.3 Billion | 21.1 Billion | 21.1 Billion | 54.2 Billion | 54.2 Billion |
| GPU Die Size | 551 mm2 | 601 mm2 | 610 mm2 | 610 mm2 | 815 mm2 | 815 mm2 | 826 mm2 | 826 mm2 |
| SMs | 15 | 24 | 56 | 56 | 80 | 80 | 108 | 108 |
| TPCs | 15 | 24 | 28 | 28 | 40 | 40 | 54 | 54 |
| FP32 CUDA Cores Per SM | 192 | 128 | 64 | 64 | 64 | 64 | 64 | 64 |
| FP64 CUDA Cores / SM | 64 | 4 | 32 | 32 | 32 | 32 | 32 | 32 |
| FP32 CUDA Cores | 2880 | 3072 | 3584 | 3584 | 5120 | 5120 | 6912 | 6912 |
| FP64 CUDA Cores | 960 | 96 | 1792 | 1792 | 2560 | 2560 | 3456 | 3456 |
| Tensor Cores | N/A | N/A | N/A | N/A | 640 | 640 | 432 | 432 |
| Texture Units | 240 | 192 | 224 | 224 | 320 | 320 | 432 | 432 |
| Boost Clock | 875 MHz | 1114 MHz | 1329 MHz | 1480 MHz | 1530 MHz | 1601 MHz | 1410 MHz | 1410 MHz |
| TOPs (DNN/AI) | N/A | N/A | N/A | N/A | 125 TOPs | 130 TOPs | 1248 TOPs (2496 TOPs with Sparsity) | 1248 TOPs (2496 TOPs with Sparsity) |
| FP16 Compute | N/A | N/A | 18.7 TFLOPs | 21.2 TFLOPs | 30.4 TFLOPs | 32.8 TFLOPs | 312 TFLOPs (624 TFLOPs with Sparsity) | 312 TFLOPs (624 TFLOPs with Sparsity) |
| FP32 Compute | 5.04 TFLOPs | 6.8 TFLOPs | 10.0 TFLOPs | 10.6 TFLOPs | 15.7 TFLOPs | 16.4 TFLOPs | 156 TFLOPs (19.5 TFLOPs standard) | 156 TFLOPs (19.5 TFLOPs standard) |
| FP64 Compute | 1.68 TFLOPs | 0.2 TFLOPs | 4.7 TFLOPs | 5.30 TFLOPs | 7.80 TFLOPs | 8.2 TFLOPs | 19.5 TFLOPs (9.7 TFLOPs standard) | 19.5 TFLOPs (9.7 TFLOPs standard) |
| Memory Interface | 384-bit GDDR5 | 384-bit GDDR5 | 4096-bit HBM2 | 4096-bit HBM2 | 4096-bit HBM2 | 4096-bit HBM2 | 6144-bit HBM2e | 6144-bit HBM2e |
| Memory Size | 12 GB GDDR5 @ 288 GB/s | 24 GB GDDR5 @ 288 GB/s | 16 GB HBM2 @ 732 GB/s / 12 GB HBM2 @ 549 GB/s | 16 GB HBM2 @ 732 GB/s | 16 GB HBM2 @ 900 GB/s | 16 GB HBM2 @ 1134 GB/s | Up To 40 GB HBM2 @ 1.6 TB/s / Up To 80 GB HBM2 @ 1.6 TB/s | Up To 40 GB HBM2 @ 1.6 TB/s / Up To 80 GB HBM2e @ 2.0 TB/s |
| L2 Cache Size | 1536 KB | 3072 KB | 4096 KB | 4096 KB | 6144 KB | 6144 KB | 40960 KB | 40960 KB |
| TDP | 235W | 250W | 250W | 300W | 300W | 250W | 400W | 250W |