NVIDIA Ampere GA100 GPU Rumored Specifications Detailed – 8192 CUDA Cores, Up To 48 GB HBM2e Memory, Up To 2.2 GHz Clocks & 300W TDP


As NVIDIA's GTC 2020 closes in, new specifications of the Ampere GA100 GPU have leaked, once again showing that the green team's next-generation GPU architecture is going to be an absolute beast of a compute powerhouse.

NVIDIA Ampere GPU Specifications Rumored To Include 8192 CUDA Cores, Up To 48 GB HBM2e Memory & Core Clocks Beyond 2 GHz On The Flagship GA100 Chip

The latest specifications come from the Stage1 Chinese forums, where a user who is known to have posted leaks before has listed key details for the flagship Ampere GPU, the GA100. NVIDIA's Ampere GPU family has been known about for a while now, but NVIDIA has yet to reveal it to the public. Several Ampere GPUs, including the GA100 itself, have appeared in various leaks, but there hasn't been any conclusive evidence of whether Ampere is the name of the GPU family that NVIDIA will introduce next for the HPC / Data Center segment.

According to the forum member, the flagship Ampere GPU would be the GA100, and as expected, the full configuration would feature 128 streaming multiprocessors (SMs), or 8192 CUDA cores. It is not known which process node NVIDIA is using, but 7nm has been highlighted in previous reports.

Utilizing the new process and GPU architecture, the chip is rumored to feature a maximum boost clock of up to 2.2 GHz on the GPU core. If true, that is a huge bump in clock speed: at least 35% higher than any GV100 configuration. The Quadro GV100 graphics card runs the GV100 GPU at its fastest clock of 1627 MHz and delivers 16.6 TFLOPs of FP32 compute performance.
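
Both figures in the paragraph above can be reproduced from the quoted clocks. The sketch below is a minimal Python illustration, assuming the usual theoretical-peak formula of CUDA cores × 2 FLOPs per clock (one fused multiply-add per core per cycle) × clock speed; the helper names are our own and do not come from any NVIDIA tool.

```python
def percent_increase(new: float, old: float) -> float:
    """Return how much larger `new` is than `old`, in percent."""
    return (new / old - 1.0) * 100.0


def peak_fp32_tflops(cuda_cores: int, clock_mhz: float) -> float:
    """Theoretical FP32 peak, assuming one FMA (2 FLOPs) per core per clock."""
    # cores * 2 FLOPs * (MHz * 1e6) / 1e12 simplifies to cores * 2 * MHz / 1e6
    return cuda_cores * 2 * clock_mhz / 1e6


GA100_BOOST_MHZ = 2200         # rumored GA100 boost clock
QUADRO_GV100_BOOST_MHZ = 1627  # fastest shipping GV100 clock (Quadro GV100)
QUADRO_GV100_CORES = 5120

clock_uplift = percent_increase(GA100_BOOST_MHZ, QUADRO_GV100_BOOST_MHZ)
gv100_tflops = peak_fp32_tflops(QUADRO_GV100_CORES, QUADRO_GV100_BOOST_MHZ)

print(f"Clock uplift: {clock_uplift:.1f}%")
print(f"Quadro GV100 FP32 peak: {gv100_tflops:.2f} TFLOPs")
```

The result lands at roughly 35.2% and 16.66 TFLOPs, which the article rounds to 35% and 16.6 TFLOPs.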

Based on the number of cores and the boost clock of the GA100 GPU, we are looking at a massive 36 TFLOPs of FP32 compute performance. That's more than a 2x increase in FP32 compute over the Quadro GV100, and if these numbers are legit, we would also be looking at 18 TFLOPs of FP64 compute horsepower (assuming the same half-rate FP64 as Volta), far ahead of any FP64 numbers that current GPUs can crunch out.
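
The 36 TFLOPs and 18 TFLOPs figures follow directly from the full 128 SM configuration. A minimal sketch, assuming 64 FP32 CUDA cores per SM (as on GP100 and GV100), one FMA per core per clock, and a 1:2 FP64-to-FP32 rate like GV100's; none of these are confirmed for Ampere.

```python
CORES_PER_SM = 64         # FP32 CUDA cores per SM on GP100/GV100 (assumed for GA100)
FP64_TO_FP32_RATIO = 0.5  # GV100-style half-rate FP64 (assumed for GA100)


def peak_fp32_tflops(cuda_cores: int, clock_mhz: float) -> float:
    """Theoretical FP32 peak, assuming one FMA (2 FLOPs) per core per clock."""
    return cuda_cores * 2 * clock_mhz / 1e6


ga100_sms = 128
ga100_cores = ga100_sms * CORES_PER_SM
ga100_fp32 = peak_fp32_tflops(ga100_cores, 2200)  # rumored 2.2 GHz boost
ga100_fp64 = ga100_fp32 * FP64_TO_FP32_RATIO
gain_vs_quadro_gv100 = ga100_fp32 / 16.6           # Quadro GV100's quoted FP32 peak

print(f"{ga100_cores} CUDA cores")
print(f"FP32: {ga100_fp32:.2f} TFLOPs, FP64: {ga100_fp64:.2f} TFLOPs")
print(f"FP32 gain over Quadro GV100: {gain_vs_quadro_gv100:.2f}x")
```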

The GPU is stated to feature a 300W TDP and HBM2e memory in two flavors, a 24 GB and a 48 GB model. These memory configurations could apply to the top variant only, as we have also seen other variants with 32 GB of HBM2e memory. NVIDIA is also rumored to double its tensor cores on the new Ampere GPUs. The current 5120 CUDA core Volta GV100 GPU features 640 tensor cores, one for every eight CUDA cores, so at the same ratio an Ampere GPU with 8192 CUDA cores would feature 1024 tensor cores. But since the rumor states that NVIDIA is likely to increase the tensor core count by 2x, we would be looking at 2048 tensor cores for an 8192 CUDA core chip. The specs for the rest of the variants, which leaked last week, are listed below:
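
The tensor core estimate is a ratio argument: keep GV100's CUDA-core-to-tensor-core ratio, then apply the rumored 2x multiplier. A sketch of that arithmetic; the variable names are illustrative, and the 2x factor is the rumor's claim, not a confirmed spec.

```python
V100_CUDA_CORES = 5120
V100_TENSOR_CORES = 640
GA100_CUDA_CORES = 8192        # rumored full GA100
RUMORED_TENSOR_MULTIPLIER = 2  # "2x tensor cores" per the rumor (unconfirmed)

cuda_per_tensor = V100_CUDA_CORES // V100_TENSOR_CORES       # CUDA cores per tensor core on Volta
tensor_at_volta_ratio = GA100_CUDA_CORES // cuda_per_tensor  # GA100 at Volta's ratio
tensor_rumored = tensor_at_volta_ratio * RUMORED_TENSOR_MULTIPLIER

print(cuda_per_tensor, tensor_at_volta_ratio, tensor_rumored)  # 8 1024 2048
```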

NVIDIA's Next-Gen GPU #1 Specifications & Performance

This first GPU features a total of 124 SMs, which equals 7936 CUDA cores, since NVIDIA's professional GPU architecture uses 64 CUDA cores per streaming multiprocessor. This is also a 55% increase in CUDA cores over the Tesla V100's 5120 cores. The GPU has a maximum clock speed of 1.1 GHz, and at this unfinalized clock it should deliver around 17.5 - 18 TFLOPs of FP32 horsepower.
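
The 7936-core count and the 17.5 - 18 TFLOPs estimate can be checked the same way. A minimal sketch, assuming 64 CUDA cores per SM and one FMA per core per clock; 1100 MHz is the clock quoted in the paragraph above and 1110 MHz is the preliminary figure in the comparison table, and neither is final.

```python
CORES_PER_SM = 64  # assumed, as on GP100/GV100


def peak_fp32_tflops(cuda_cores: int, clock_mhz: float) -> float:
    """Theoretical FP32 peak, assuming one FMA (2 FLOPs) per core per clock."""
    return cuda_cores * 2 * clock_mhz / 1e6


sms = 124
cores = sms * CORES_PER_SM
for clock_mhz in (1100, 1110):  # preliminary clocks, not final
    print(f"{cores} cores @ {clock_mhz} MHz -> {peak_fp32_tflops(cores, clock_mhz):.2f} TFLOPs")
```

Both clocks land at roughly 17.5 - 17.6 TFLOPs, the low end of the quoted range.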

It carries 32 GB of HBM2e memory clocked at 1200 MHz across a 4096-bit bus interface. I mention HBM2e because it is the latest standard, and NVIDIA has been known to use the most advanced memory standard available on its HPC parts at the time of their launch.
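
Peak memory bandwidth follows from bus width and memory clock. HBM2/HBM2e moves data on both clock edges, so a 1200 MHz clock corresponds to 2.4 Gb/s per pin. The minimal sketch below makes that double-data-rate assumption explicit and uses the Tesla V100 (4096-bit, 878 MHz, about 900 GB/s) as a sanity check.

```python
def hbm_bandwidth_gbs(bus_width_bits: int, mem_clock_mhz: float) -> float:
    """Peak bandwidth in GB/s, assuming a double-data-rate (HBM2/HBM2e) interface."""
    transfers_per_s = mem_clock_mhz * 1e6 * 2  # two transfers per clock (DDR)
    bytes_per_transfer = bus_width_bits / 8
    return transfers_per_s * bytes_per_transfer / 1e9


print(f"Next-gen #1 (4096-bit @ 1200 MHz): {hbm_bandwidth_gbs(4096, 1200):.1f} GB/s")
print(f"Tesla V100  (4096-bit @ 878 MHz):  {hbm_bandwidth_gbs(4096, 878):.1f} GB/s")
```

The first line works out to about 1.23 TB/s, consistent with the roughly 1.2 TB/s expected for this part.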

In addition to the core and memory specifications, the GPU packs a 32 MB L2 cache, a 5.33x increase over the Volta GV100 GPU, which packs just 6 MB of L2 cache in comparison. Given the massive amount of cache, we can expect some huge performance uplifts and a major architectural change on NVIDIA's next-generation GPU, which has been years in development.

As far as performance is concerned, the GPU scores 222377 points in the OpenCL benchmark (CUDA) on Geekbench 5. The platform is running CUDA 8.0, so it is highly likely that the GPU was not yet fully optimized for the benchmark at the time of testing. With that said, the specifications of this card look insane, so let's move on to the other two variants.

NVIDIA's Next-Gen GPU #2 Specifications & Performance

The second GPU features a total of 118 SMs, or 7552 CUDA cores. This is a 47.5% increase in CUDA cores over the Tesla V100, which packs 5120 CUDA cores in 80 SMs. The new GPU also carries a total of 24 MB of L2 cache. It is clocked at a maximum speed of 1.10 GHz and features 24 GB of HBM2e memory running along a 3072-bit bus at 1200 MHz. At these speeds, this chip should deliver a theoretical compute horsepower of around 16.7 TFLOPs, but once again, the clock speeds definitely don't look final and could be higher.
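
Applying the same two formulas to this part, under the same assumptions (64 CUDA cores per SM, one FMA per core per clock, double-data-rate HBM2e), gives the FP32 range for its preliminary 1.10 - 1.11 GHz clocks and shows what its narrower 3072-bit bus would deliver. A minimal sketch; the helper names are illustrative.

```python
def peak_fp32_tflops(cuda_cores: int, clock_mhz: float) -> float:
    """Theoretical FP32 peak, assuming one FMA (2 FLOPs) per core per clock."""
    return cuda_cores * 2 * clock_mhz / 1e6


def hbm_bandwidth_gbs(bus_width_bits: int, mem_clock_mhz: float) -> float:
    """Peak bandwidth in GB/s, assuming a double-data-rate (HBM2/HBM2e) interface."""
    return mem_clock_mhz * 1e6 * 2 * (bus_width_bits / 8) / 1e9


cores = 118 * 64                       # 118 SMs x 64 cores per SM (assumed)
fp32_low = peak_fp32_tflops(cores, 1100)
fp32_high = peak_fp32_tflops(cores, 1110)
bandwidth = hbm_bandwidth_gbs(3072, 1200)

print(f"{cores} cores: {fp32_low:.2f}-{fp32_high:.2f} TFLOPs FP32, {bandwidth:.1f} GB/s")
```

With three quarters of the bus width, this part would top out at roughly 0.92 TB/s rather than the roughly 1.2 TB/s of a 4096-bit configuration at the same memory clock.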

This particular GPU was tested in both the OpenCL and CUDA compute benchmarks. In the OpenCL benchmark, the chip scored 184096 points, while in the CUDA benchmark it scored 169368 points. Both the 124 and 118 SM parts were running CUDA 8.0, which once again suggests that these GPUs aren't yet fully optimized for the Geekbench 5 benchmark. The scores of the two parts differ far more than their roughly 5% difference in core count would suggest.
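
To put that gap in numbers, the sketch below compares the relative difference in SM count with the relative difference in score. It takes the reported scores at face value and assumes the 124 SM part's 222377 points are comparable with the 118 SM part's results, which is uncertain given the mixed OpenCL/CUDA labeling of the listings.

```python
def percent_increase(new: float, old: float) -> float:
    """Return how much larger `new` is than `old`, in percent."""
    return (new / old - 1.0) * 100.0


sm_gap = percent_increase(124, 118)
score_gap_vs_opencl = percent_increase(222377, 184096)  # vs. the 118 SM OpenCL score
score_gap_vs_cuda = percent_increase(222377, 169368)    # vs. the 118 SM CUDA score

print(f"SM count: +{sm_gap:.1f}%")
print(f"Score vs. 118 SM OpenCL run: +{score_gap_vs_opencl:.1f}%")
print(f"Score vs. 118 SM CUDA run: +{score_gap_vs_cuda:.1f}%")
```

A roughly 5% difference in SMs sits against a 21% to 31% difference in score, depending on which 118 SM run is used as the baseline.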

NVIDIA's Next-Gen GPU #3 Specifications & Performance

Lastly, we have the 108 SM (6912 CUDA core) variant, which has a reported clock speed of 1.01 GHz, the slowest of all three GPUs. It offers a 35% increase in CUDA core count over the Tesla V100 and apparently packs 46.8 GB of HBM2e memory. This could be an error in how Geekbench reads the total memory, and it could actually be 48 GB, which makes more sense. This GPU scores 141654 points in the Geekbench 5 (CUDA) benchmark, which, once again, is not representative of final performance given its preliminary clock speed.
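
The three core-count uplifts quoted for these parts (55%, 47.5% and 35% over the Tesla V100) all come from the same SM arithmetic. A short sketch, assuming 64 CUDA cores per SM on every variant:

```python
CORES_PER_SM = 64  # assumed for all three leaked variants
V100_CORES = 5120  # Tesla V100: 80 SMs x 64 cores

variants = {"124 SM": 124, "118 SM": 118, "108 SM": 108}
uplifts = {
    name: (sms * CORES_PER_SM / V100_CORES - 1.0) * 100.0
    for name, sms in variants.items()
}

for name, sms in variants.items():
    print(f"{name}: {sms * CORES_PER_SM} CUDA cores, +{uplifts[name]:.1f}% vs. Tesla V100")
```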

NVIDIA Tesla Graphics Cards Comparison

| Tesla Graphics Card Name | NVIDIA Tesla M2090 | NVIDIA Tesla K40 | NVIDIA Tesla K80 | NVIDIA Tesla P100 | NVIDIA Tesla V100 | NVIDIA Tesla Next-Gen #3 | NVIDIA Tesla Next-Gen #2 | NVIDIA Tesla Next-Gen #1 |
|---|---|---|---|---|---|---|---|---|
| GPU Architecture | Fermi | Kepler | Kepler | Pascal | Volta | Ampere? | Ampere? | Ampere? |
| GPU Process | 40nm | 28nm | 28nm | 16nm | 12nm | 7nm? | 7nm? | 7nm? |
| GPU Name | GF110 | GK110 | GK210 x 2 | GP100 | GV100 | GA100? | GA100? | GA100? |
| Die Size | 520 mm² | 561 mm² | 561 mm² | 610 mm² | 815 mm² | TBD | TBD | TBD |
| Transistor Count | 3.00 Billion | 7.08 Billion | 7.08 Billion | 15 Billion | 21.1 Billion | TBD | TBD | TBD |
| CUDA Cores | 512 CCs (16 SMs) | 2880 CCs (15 SMs) | 2496 CCs (13 SMs) x 2 | 3584 CCs | 5120 CCs | 6912 CCs (108 SMs) | 7552 CCs (118 SMs) | 7936 CCs (124 SMs) |
| Core Clock | Up To 650 MHz | Up To 875 MHz | Up To 875 MHz | Up To 1480 MHz | Up To 1455 MHz | 1.08 GHz (Preliminary) | 1.11 GHz (Preliminary) | 1.11 GHz (Preliminary) |
| FP32 Compute | 1.33 TFLOPs | 4.29 TFLOPs | 8.74 TFLOPs | 10.6 TFLOPs | 15.0 TFLOPs | ~15 TFLOPs (Preliminary) | ~17 TFLOPs (Preliminary) | ~18 TFLOPs (Preliminary) |
| FP64 Compute | 0.66 TFLOPs | 1.43 TFLOPs | 2.91 TFLOPs | 5.30 TFLOPs | 7.50 TFLOPs | TBD | TBD | TBD |
| VRAM Size | 6 GB | 12 GB | 12 GB x 2 | 16 GB | 16 GB | 48 GB | 24 GB | 32 GB |
| VRAM Bus | 384-bit | 384-bit | 384-bit x 2 | 4096-bit | 4096-bit | 4096-bit? | 3072-bit? | 4096-bit? |
| VRAM Speed | 3.7 GHz | 6 GHz | 5 GHz | 737 MHz | 878 MHz | 1200 MHz | 1200 MHz | 1200 MHz |
| Memory Bandwidth | 177.6 GB/s | 288 GB/s | 240 GB/s x 2 | 720 GB/s | 900 GB/s | 1.2 TB/s? | ~922 GB/s? | 1.2 TB/s? |
| Maximum TDP | 250W | 235W | 300W | 300W | 300W | TBD | TBD | TBD |

Yesterday, AMD announced that it will be splitting its GPUs into separate Gaming and Compute segments, similar to what NVIDIA has done since its Pascal architecture. The new CDNA GPU family is expected to launch this year on the 7nm process node and will go up against NVIDIA's HPC lineup. Meanwhile, the Vice President of Information Technology and Chief Information Officer at Indiana University, which will be deploying its Big Red supercomputer this summer, revealed that NVIDIA's next-generation GPUs offer a massive 75% performance uplift over existing Volta-based GPUs. We have also heard similar reports in the past of the GPUs offering up to a 50% performance increase with twice the efficiency, which would be an incredible feat to pull off.

Given that NVIDIA would be on process parity with AMD with its next-generation GPU, and with a brand new architecture too, we could see some truly disruptive performance. These are definitely some big specifications and numbers for NVIDIA's next-generation GPUs, and while we would advise our readers to take them with a grain of salt, we can definitely expect a full-blown official announcement of the next-gen GPUs by NVIDIA at its GTC 2020 online keynote on the 22nd of March.