NVIDIA appears to be making its fastest GPU, the Ampere A100, even faster, with twice the memory capacity and record-breaking memory bandwidth. The upgrade is confirmed by NVIDIA's own product listing, which was discovered by Videocardz.
NVIDIA's Fastest GPU, The Ampere A100, Getting Even Faster With Twice The Memory & Higher HBM2e Bandwidth
The existing NVIDIA A100 HPC accelerator was introduced back in June of last year, and it looks like the green team is planning to give it a major spec upgrade. The accelerator is based on NVIDIA's largest Ampere GPU, the GA100, which measures 826mm2 and houses an insane 54 billion transistors. NVIDIA typically gives its HPC accelerators a mid-cycle spec boost, which means we can expect to hear about next-generation accelerators at GTC 2022.
In terms of specifications, the A100 PCIe GPU accelerator doesn't change much in its core configuration. The GA100 GPU retains the specifications of the existing 250W variant: 6912 CUDA cores arranged in 108 SM units and 432 Tensor Cores. What changes is the memory: 80 GB of HBM2e delivering 2.0 TB/s of bandwidth, up from 1.55 TB/s on the 40 GB variant.
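As a quick sanity check, those headline figures follow directly from the SM layout and the memory configuration. The short Python sketch below reproduces them; the 3.2 Gbps per-pin rate is our assumption, back-calculated from the quoted 2.0 TB/s over the GA100's five active 1024-bit HBM2e stacks.

```python
# Sanity-check the A100 80 GB PCIe headline numbers from its layout.
SMS = 108                # streaming multiprocessors on the cut-down GA100
FP32_CORES_PER_SM = 64   # FP32 CUDA cores per SM on GA100

cuda_cores = SMS * FP32_CORES_PER_SM
print(cuda_cores)        # 6912, matching the listed core count

# Bandwidth = bus width (bytes) * per-pin data rate.
BUS_WIDTH_BITS = 5120    # five active HBM2e stacks x 1024-bit each
DATA_RATE_GBPS = 3.2     # assumed per-pin rate implied by ~2.0 TB/s

bandwidth_tbs = BUS_WIDTH_BITS / 8 * DATA_RATE_GBPS / 1000
print(f"{bandwidth_tbs:.2f} TB/s")  # ~2.05 TB/s
```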
The A100 SXM variant already comes with 80 GB of memory, but it doesn't feature the faster HBM2e dies found on this upcoming A100 PCIe variant. This is also the highest memory capacity ever featured on a PCIe-based graphics card, though don't expect consumer graphics cards to offer such capacities any time soon. What's interesting is that the power rating remains unchanged, which means we are looking at higher-density chips binned for high-performance use cases.
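For those who want to verify the capacity and power figures on real hardware, NVIDIA's NVML library reports both. Here's a minimal sketch using the official `pynvml` bindings; the only assumption is that GPU index 0 is the card in question.

```python
# Query total memory and board power limit of GPU 0 via NVML.
# Requires the official bindings: pip install nvidia-ml-py
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)

mem = pynvml.nvmlDeviceGetMemoryInfo(handle)                  # bytes
limit_mw = pynvml.nvmlDeviceGetPowerManagementLimit(handle)   # milliwatts

print(f"Memory: {mem.total / 1024**3:.0f} GiB")   # ~80 GiB on this card
print(f"Power limit: {limit_mw / 1000:.0f} W")    # 250 W if truly unchanged
pynvml.nvmlShutdown()
```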
The FP64 performance is still rated at 9.7/19.5 TFLOPs (standard/Tensor), FP32 performance at 19.5 TFLOPs with TF32 Tensor performance of 156/312 TFLOPs (with sparsity), FP16 performance at 312/624 TFLOPs (with sparsity), and INT8 at 624/1248 TOPs (with sparsity). NVIDIA is planning to release its latest HPC accelerator next week, and we can expect pricing to exceed $20,000 US, considering the 40 GB A100 variant sells for around $15,000 US.
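The base (non-Tensor) numbers fall out of the standard peak-FLOPs formula: 2 operations per core per clock (a fused multiply-add) times core count times boost clock. A quick check against the A100's 1410 MHz boost clock:

```python
# Peak FLOPs = 2 (FMA = multiply + add per clock) * cores * clock.
BOOST_CLOCK_GHZ = 1.41   # A100 boost clock

fp32_tflops = 2 * 6912 * BOOST_CLOCK_GHZ / 1000   # ~19.5 TFLOPs
fp64_tflops = 2 * 3456 * BOOST_CLOCK_GHZ / 1000   # ~9.7 TFLOPs
print(f"FP32: {fp32_tflops:.1f} TFLOPs, FP64: {fp64_tflops:.1f} TFLOPs")
```

The doubled figure in each Tensor pair is simply the dense rate with Ampere's 2:4 structured sparsity applied.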
NVIDIA HPC / AI GPUs
| NVIDIA Tesla Graphics Card | NVIDIA B200 | NVIDIA H200 (SXM5) | NVIDIA H100 (SXM5) | NVIDIA H100 (PCIe) | NVIDIA A100 (SXM4) | NVIDIA A100 (PCIe4) | Tesla V100S (PCIe) | Tesla V100 (SXM2) | Tesla P100 (SXM2) | Tesla P100 (PCI-Express) | Tesla M40 (PCI-Express) | Tesla K40 (PCI-Express) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| GPU | B200 | H200 (Hopper) | H100 (Hopper) | H100 (Hopper) | A100 (Ampere) | A100 (Ampere) | GV100 (Volta) | GV100 (Volta) | GP100 (Pascal) | GP100 (Pascal) | GM200 (Maxwell) | GK110 (Kepler) |
| Process Node | 4nm | 4nm | 4nm | 4nm | 7nm | 7nm | 12nm | 12nm | 16nm | 16nm | 28nm | 28nm |
| Transistors | 208 Billion | 80 Billion | 80 Billion | 80 Billion | 54.2 Billion | 54.2 Billion | 21.1 Billion | 21.1 Billion | 15.3 Billion | 15.3 Billion | 8 Billion | 7.1 Billion |
| GPU Die Size | TBD | 814mm2 | 814mm2 | 814mm2 | 826mm2 | 826mm2 | 815mm2 | 815mm2 | 610 mm2 | 610 mm2 | 601 mm2 | 551 mm2 |
| SMs | 160 | 132 | 132 | 114 | 108 | 108 | 80 | 80 | 56 | 56 | 24 | 15 |
| TPCs | 80 | 66 | 66 | 57 | 54 | 54 | 40 | 40 | 28 | 28 | 24 | 15 |
| L2 Cache Size | TBD | 51200 KB | 51200 KB | 51200 KB | 40960 KB | 40960 KB | 6144 KB | 6144 KB | 4096 KB | 4096 KB | 3072 KB | 1536 KB |
| FP32 CUDA Cores Per SM | TBD | 128 | 128 | 128 | 64 | 64 | 64 | 64 | 64 | 64 | 128 | 192 |
| FP64 CUDA Cores / SM | TBD | 64 | 64 | 64 | 32 | 32 | 32 | 32 | 32 | 32 | 4 | 64 |
| FP32 CUDA Cores | TBD | 16896 | 16896 | 14592 | 6912 | 6912 | 5120 | 5120 | 3584 | 3584 | 3072 | 2880 |
| FP64 CUDA Cores | TBD | 8448 | 8448 | 7296 | 3456 | 3456 | 2560 | 2560 | 1792 | 1792 | 96 | 960 |
| Tensor Cores | TBD | 528 | 528 | 456 | 432 | 432 | 640 | 640 | N/A | N/A | N/A | N/A |
| Texture Units | TBD | 528 | 528 | 456 | 432 | 432 | 320 | 320 | 224 | 224 | 192 | 240 |
| Boost Clock | TBD | ~1850 MHz | ~1850 MHz | ~1650 MHz | 1410 MHz | 1410 MHz | 1601 MHz | 1530 MHz | 1480 MHz | 1329MHz | 1114 MHz | 875 MHz |
| TOPs (DNN/AI) | 20,000 TOPs | 3958 TOPs | 3958 TOPs | 3200 TOPs | 2496 TOPs | 2496 TOPs | 130 TOPs | 125 TOPs | N/A | N/A | N/A | N/A |
| FP16 Compute | 10,000 TFLOPs | 1979 TFLOPs | 1979 TFLOPs | 1600 TFLOPs | 624 TFLOPs | 624 TFLOPs | 32.8 TFLOPs | 30.4 TFLOPs | 21.2 TFLOPs | 18.7 TFLOPs | N/A | N/A |
| FP32 Compute | 90 TFLOPs | 67 TFLOPs | 67 TFLOPs | 51 TFLOPs | 156 TFLOPs (19.5 TFLOPs standard) | 156 TFLOPs (19.5 TFLOPs standard) | 16.4 TFLOPs | 15.7 TFLOPs | 10.6 TFLOPs | 10.0 TFLOPs | 6.8 TFLOPs | 5.04 TFLOPs |
| FP64 Compute | 45 TFLOPs | 34 TFLOPs | 34 TFLOPs | 26 TFLOPs | 19.5 TFLOPs (9.7 TFLOPs standard) | 19.5 TFLOPs (9.7 TFLOPs standard) | 8.2 TFLOPs | 7.80 TFLOPs | 5.30 TFLOPs | 4.7 TFLOPs | 0.2 TFLOPs | 1.68 TFLOPs |
| Memory Interface | 8192-bit HBM3e | 5120-bit HBM3e | 5120-bit HBM3 | 5120-bit HBM2e | 6144-bit HBM2e | 6144-bit HBM2e | 4096-bit HBM2 | 4096-bit HBM2 | 4096-bit HBM2 | 4096-bit HBM2 | 384-bit GDDR5 | 384-bit GDDR5 |
| Memory Size | Up To 192 GB HBM3e @ 8.0 Gbps | Up To 141 GB HBM3e @ 6.5 Gbps | Up To 80 GB HBM3 @ 5.2 Gbps | Up To 94 GB HBM2e @ 5.1 Gbps | Up To 40 GB HBM2 @ 1.6 TB/s Up To 80 GB HBM2 @ 1.6 TB/s | Up To 40 GB HBM2 @ 1.6 TB/s Up To 80 GB HBM2 @ 2.0 TB/s | 16 GB HBM2 @ 1134 GB/s | 16 GB HBM2 @ 900 GB/s | 16 GB HBM2 @ 732 GB/s | 16 GB HBM2 @ 732 GB/s 12 GB HBM2 @ 549 GB/s | 24 GB GDDR5 @ 288 GB/s | 12 GB GDDR5 @ 288 GB/s |
| TDP | 700W | 700W | 700W | 350W | 400W | 250W | 250W | 300W | 300W | 250W | 250W | 235W |
