As NVIDIA's GTC 2020 closes in, new specifications of the Ampere GA100 GPU have leaked, once again suggesting that the next-gen GPU architecture from the green team is going to be an absolute compute powerhouse.
NVIDIA Ampere GPU Specifications Rumored To Include 8192 CUDA Cores, Up To 48 GB HBM2e Memory & Core Clocks Beyond 2 GHz On The Flagship GA100 Chip
The latest specifications come from the Stage1 Chinese forums, where a user who is known to have posted leaks before has listed key details for the flagship Ampere GPU, the GA100. The Ampere name has been circulating for a while now, but NVIDIA has yet to reveal the family to the public. Several Ampere GPUs, including the GA100 itself, have appeared in various leaks, but there hasn't been any conclusive evidence that Ampere is the name of the GPU family NVIDIA is going to introduce next for the HPC / Data Center segment.
According to the forum member, the flagship Ampere GPU would be the GA100 and, as expected, the full configuration would feature 128 streaming multiprocessors (SMs) or 8192 CUDA cores. It is not known which process node NVIDIA is using, but 7nm has been highlighted in previous reports.
Utilizing the new process and GPU architecture, the chip is rumored to feature a maximum boost clock of up to 2.2 GHz on the GPU core. If true, this is a huge bump in clock speed, roughly 35% faster than the GV100 GPU featured on the Quadro GV100 graphics card. The Quadro GV100 has the fastest clock of any GV100-based product at 1627 MHz and delivers 16.6 TFLOPs of FP32 Compute performance.
Based on the number of cores and the boost clock of the GA100 GPU, we are looking at a massive 36 TFLOPs of FP32 Compute performance. That's more than a 2x increase in FP32 Compute over the GV100 and, if these numbers are legit, we would be looking at 18 TFLOPs of FP64 compute horsepower, which is far ahead of any FP64 numbers that modern GPUs can crunch out.
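The FP32 figure above follows from the usual peak-throughput arithmetic: CUDA cores, times two floating-point operations per clock (one fused multiply-add), times the clock speed. Here is a minimal sketch in Python, assuming that convention and a 1:2 FP64-to-FP32 ratio as on the Volta GV100. The function name `peak_tflops` is purely illustrative, and the GA100 inputs are the rumored figures, not confirmed specifications.

```python
def peak_tflops(cuda_cores: int, clock_ghz: float, flops_per_clock: int = 2) -> float:
    """Theoretical peak throughput in TFLOPs.

    Assumes each CUDA core retires one fused multiply-add (2 FLOPs) per clock.
    """
    return cuda_cores * flops_per_clock * clock_ghz / 1000.0


# Rumored GA100: 128 SMs x 64 cores per SM, 2.2 GHz boost (unconfirmed).
ga100_cores = 128 * 64
ga100_fp32 = peak_tflops(ga100_cores, 2.2)
# Assumption: FP64 runs at half the FP32 rate, as on the GV100.
ga100_fp64 = ga100_fp32 / 2

# Quadro GV100: 5120 cores at 1627 MHz.
gv100_fp32 = peak_tflops(5120, 1.627)

print(f"GA100 FP32: {ga100_fp32:.2f} TFLOPs")
print(f"GA100 FP64: {ga100_fp64:.2f} TFLOPs")
print(f"GV100 FP32: {gv100_fp32:.2f} TFLOPs")
print(f"FP32 uplift: {ga100_fp32 / gv100_fp32:.2f}x")
print(f"Clock uplift: {(2.2 / 1.627 - 1) * 100:.0f}%")
```

Because the result scales linearly with both core count and clock, the rumored 2x-plus FP32 gain comes from the combination of 60% more cores and a roughly 35% higher clock.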
The GPU is stated to have a 300W TDP and HBM2e memory, and to come in two flavors: a 24 GB and a 48 GB model. These memory configurations could be for the top variant only, as we have also seen other variants with 32 GB of HBM2e memory. NVIDIA is also rumored to double its tensor cores on the new Ampere GPUs. The current 5120 CUDA core Volta GV100 GPU features 640 Tensor cores, so at the same ratio, an Ampere GPU with 8192 CUDA cores would feature 1024 cores for tensor operations. But since the rumor states that NVIDIA is likely to increase the tensor core count by 2x, we would be looking at 2048 tensor cores for an 8192 CUDA core chip. The specs for the rest of the variants which leaked last week are listed below:
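The tensor core estimate is a simple ratio carried over from Volta and then doubled. The sketch below reproduces it, assuming the GV100 ratio of CUDA cores to tensor cores holds on Ampere. The helper name `tensor_cores` and its `scale` parameter are hypothetical.

```python
# Volta GV100 reference: 5120 CUDA cores and 640 tensor cores.
GV100_CUDA_CORES = 5120
GV100_TENSOR_CORES = 640


def tensor_cores(cuda_cores: int, scale: int = 1) -> int:
    """Scale the GV100 CUDA-to-tensor core ratio to another chip."""
    cuda_per_tensor = GV100_CUDA_CORES // GV100_TENSOR_CORES  # 8 CUDA cores per tensor core
    return cuda_cores // cuda_per_tensor * scale


same_ratio = tensor_cores(8192)        # Volta ratio carried over unchanged
doubled = tensor_cores(8192, scale=2)  # rumored 2x tensor core count
print(same_ratio, doubled)
```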
NVIDIA's Next-Gen GPU #1 Specifications & Performance
This first GPU features a total SM count of 124 which equals 7936 CUDA cores since NVIDIA's professional GPU architecture comes with a 64 CUDA Core design per streaming multiprocessor. This is also a 55% increase in CUDA cores over the Tesla V100's 5120 Cores. The GPU has a maximum clock speed of 1.1 GHz and at this unfinalized clock, it should deliver around 17.5 - 18 TFLOPs of FP32 horsepower.
It carries 32 GB of HBM2e memory clocked at 1200 MHz and running across a 4096-bit bus interface. The reason I mention HBM2e is that it is the latest standard, and NVIDIA has been known to utilize the most advanced memory standards available on its HPC parts at the time of their launch.
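For a sense of what a 4096-bit bus at 1200 MHz means in bandwidth terms, the sketch below assumes double data rate signalling (two transfers per clock per pin), which is an assumption that happens to reproduce the Tesla V100's roughly 900 GB/s from its 878 MHz, 4096-bit configuration. The function name `hbm_bandwidth_gbs` is illustrative only.

```python
def hbm_bandwidth_gbs(bus_width_bits: int, clock_mhz: float,
                      transfers_per_clock: int = 2) -> float:
    """Peak memory bandwidth in GB/s.

    Assumes double data rate signalling (two transfers per clock per pin).
    """
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * clock_mhz * transfers_per_clock / 1000.0


v100 = hbm_bandwidth_gbs(4096, 878)       # sanity check against the Tesla V100
next_gen = hbm_bandwidth_gbs(4096, 1200)  # leaked 4096-bit, 1200 MHz configuration

print(f"Tesla V100: {v100:.0f} GB/s")
print(f"Next-gen:   {next_gen / 1000:.2f} TB/s")
```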
In addition to the core and memory specifications, the GPU packs a 32 MB L2 cache which is a 5.33x increase over the Volta GV100 GPU which packs an L2 cache of just 6 MB in comparison. Given the massive amount of cache, we can expect some huge performance uplifts and a huge architectural change on NVIDIA's next-generation GPU which has been years in development.
As far as performance is concerned, the GPU scores 222377 points in the OpenCL benchmark (CUDA) on Geekbench 5. The platform is running CUDA 8.0 and it is highly likely that the GPU was not fully optimized for it at the time of testing. With that said, the specifications of this card look extremely impressive, so let's get on with the other two variants.
NVIDIA's Next-Gen GPU #2 Specifications & Performance
The second GPU features a total of 118 SMs or 7552 CUDA cores, a 47.5% increase over the Tesla V100 with its 5120 CUDA cores packed in 80 SMs, and carries a total of 24 MB of L2 cache. This GPU is also clocked at a maximum speed of 1.10 GHz and features 24 GB of HBM2e memory running along a 3072-bit bus at a 1200 MHz clock speed. At these speeds, this chip should deliver a total theoretical compute horsepower of around 16.7 TFLOPs but once again, the clock speeds definitely don't look final and they could end up higher.
This particular GPU was tested in both OpenCL and CUDA Compute benchmarks. In the OpenCL benchmark, the chip scored 184096 points while in the CUDA benchmark, it scored 169368 points. Both the 124 and 118 SM parts were running on CUDA 8.0, which once again suggests that these GPUs aren't yet fully optimized for the Geekbench 5 benchmark. There's a huge difference in score between the two parts despite just a 5% difference in core count.
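To put the score gap next to the core-count gap, here is a small sketch using only the figures quoted above. The text is ambiguous about which API produced the 124 SM part's 222377-point result, so the comparison is shown against both the OpenCL and CUDA scores of the 118 SM part. The helper `percent_gap` is a hypothetical name.

```python
def percent_gap(higher: float, lower: float) -> float:
    """Percentage by which `higher` exceeds `lower`."""
    return (higher / lower - 1) * 100


CORES_PER_SM = 64

core_gap = percent_gap(124 * CORES_PER_SM, 118 * CORES_PER_SM)
gap_vs_opencl = percent_gap(222377, 184096)  # 124 SM score vs 118 SM OpenCL score
gap_vs_cuda = percent_gap(222377, 169368)    # 124 SM score vs 118 SM CUDA score

print(f"Core count gap:          {core_gap:.1f}%")
print(f"Score gap (vs OpenCL):   {gap_vs_opencl:.1f}%")
print(f"Score gap (vs CUDA):     {gap_vs_cuda:.1f}%")
```

Either way, the score gap is several times larger than the core-count gap, which fits the suggestion that these early results reflect unoptimized software rather than final hardware performance.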
NVIDIA's Next-Gen GPU #3 Specifications & Performance
Lastly, we have the 108 SM or 6912 CUDA core variant, which has a reported clock speed of 1.01 GHz, the slowest of all three GPUs. The GPU offers a 35% increase in CUDA core count over the Tesla V100 and apparently packs 46.8 GB of HBM2e memory. This could be an error in how the Geekbench benchmark reads the total memory, and the actual capacity could be 48 GB, which makes more sense. This GPU scores 141654 points in the Geekbench 5 (CUDA) benchmark which, once again, is not a final score given the low clock speeds.
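The core counts and percentage gains quoted for the three variants can be reproduced from the SM counts alone. The sketch below assumes 64 CUDA cores per SM and two FP32 operations per core per clock, and uses the preliminary clocks quoted in the text. The `variants` dictionary and `summarize` helper are illustrative names.

```python
V100_CORES = 5120
CORES_PER_SM = 64  # per-SM core count quoted for NVIDIA's HPC parts

# (SM count, clock in GHz) as quoted in the text; the clocks are preliminary.
variants = {"#1": (124, 1.10), "#2": (118, 1.10), "#3": (108, 1.01)}


def summarize(sms: int, clock_ghz: float):
    """Return (CUDA cores, % increase over Tesla V100, peak FP32 TFLOPs)."""
    cores = sms * CORES_PER_SM
    uplift_pct = (cores / V100_CORES - 1) * 100
    fp32_tflops = cores * 2 * clock_ghz / 1000  # assumes 2 FLOPs per core per clock
    return cores, uplift_pct, fp32_tflops


for name, (sms, clock) in variants.items():
    cores, uplift, tflops = summarize(sms, clock)
    print(f"GPU {name}: {cores} cores, +{uplift:.1f}% vs V100, ~{tflops:.2f} TFLOPs FP32")
```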
NVIDIA Tesla Graphics Cards Comparison
| Tesla Graphics Card Name | NVIDIA Tesla M2090 | NVIDIA Tesla K40 | NVIDIA Tesla K80 | NVIDIA Tesla P100 | NVIDIA Tesla V100 | NVIDIA Tesla Next-Gen #3 | NVIDIA Tesla Next-Gen #2 | NVIDIA Tesla Next-Gen #1 |
|---|---|---|---|---|---|---|---|---|
| GPU Name | GF110 | GK110 | GK210 x 2 | GP100 | GV100 | GA100? | GA100? | GA100? |
| Transistor Count | 3.00 Billion | 7.08 Billion | 7.08 Billion | 15 Billion | 21.1 Billion | TBD | TBD | TBD |
| CUDA Cores | 512 CCs (16 CUs) | 2880 CCs (15 CUs) | 2496 CCs (13 CUs) x 2 | 3840 CCs | 5120 CCs | 6912 CCs | 7552 CCs | 7936 CCs |
| Core Clock | Up To 650 MHz | Up To 875 MHz | Up To 875 MHz | Up To 1480 MHz | Up To 1455 MHz | 1.08 GHz (Preliminary) | 1.11 GHz (Preliminary) | 1.11 GHz (Preliminary) |
| FP32 Compute | 1.33 TFLOPs | 4.29 TFLOPs | 8.74 TFLOPs | 10.6 TFLOPs | 15.0 TFLOPs | ~15 TFLOPs (Preliminary) | ~17 TFLOPs (Preliminary) | ~18 TFLOPs (Preliminary) |
| FP64 Compute | 0.66 TFLOPs | 1.43 TFLOPs | 2.91 TFLOPs | 5.30 TFLOPs | 7.50 TFLOPs | TBD | TBD | TBD |
| VRAM Size | 6 GB | 12 GB | 12 GB x 2 | 16 GB | 16 GB | 48 GB | 24 GB | 32 GB |
| VRAM Bus | 384-bit | 384-bit | 384-bit x 2 | 4096-bit | 4096-bit | 4096-bit? | 3072-bit? | 4096-bit? |
| VRAM Speed | 3.7 GHz | 6 GHz | 5 GHz | 737 MHz | 878 MHz | 1200 MHz | 1200 MHz | 1200 MHz |
| Memory Bandwidth | 177.6 GB/s | 288 GB/s | 240 GB/s | 720 GB/s | 900 GB/s | 1.2 TB/s? | 1.2 TB/s? | 1.2 TB/s? |
Yesterday, AMD announced that it will be splitting its GPUs into separate Gaming and Compute segments, similar to what NVIDIA has been doing since its Pascal architecture. The new CDNA GPU family is expected to launch this year and will be based on the 7nm process node, going up against NVIDIA's HPC lineup. Meanwhile, the Vice President of Information Technology and Chief Information Officer at Indiana University, which will be deploying its Big Red supercomputer this summer, revealed that NVIDIA's next-generation GPUs offer a massive 75% performance uplift over existing Volta-based GPUs. We have also heard similar reports in the past of the GPUs offering up to a 50% performance increase with twice the efficiency, which would be an incredible feat to pull off.
Given that NVIDIA would be on process parity with AMD with its next-generation GPUs, and with a brand new architecture too, we could see some truly disruptive performance. These are definitely some big specifications and numbers reported in the rumor for NVIDIA's next-generation GPUs, and while we would advise our readers to take them with a grain of salt, we can expect a full-blown 'official' announcement of the next-gen GPUs by NVIDIA at its GTC 2020 online keynote on the 22nd of March.