NVIDIA's next-generation Hopper GH100 GPU is shaping up to be a monster of a chip, judging by its rumored die size and transistor count.
NVIDIA Hopper GH100 GPU For Next-Gen Data Centers Rumored To Feature Over 140 Billion Transistors In A Monster 5nm Package
A few weeks ago, a rumor suggested that NVIDIA's flagship Hopper GH100 GPU would be built on a 5nm process node with a die size close to 900mm2. That would make it the largest GPU ever produced, on the 5nm process node or any other. But that's not all: a new rumor has now popped up over at Chiphell Forums which alleges that the GPU could feature over 140 Billion transistors.
So just how many is 140 Billion transistors? For comparison, the current flagship data center chips, AMD's Aldebaran for the Instinct MI200 series and NVIDIA's Ampere GA100 for the A100 accelerators, feature just 58.2 and 54.2 Billion transistors, respectively. That works out to roughly a 2.4x to 2.6x jump in transistor count for the Hopper GH100 GPU if the rumor holds true.
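The comparison can be checked with a few lines of Python. This is only a sketch using the figures quoted above: the GH100 count is rumored, and the names `RUMORED_GH100` and `transistor_ratio` are made up for this illustration.

```python
# Transistor counts in billions, as quoted in the article.
RUMORED_GH100 = 140.0  # rumored figure, not confirmed

current_flagships = {
    "AMD Aldebaran (MI200)": 58.2,
    "NVIDIA GA100 (A100)": 54.2,
}


def transistor_ratio(new_count: float, old_count: float) -> float:
    """Return how many times larger new_count is than old_count."""
    return new_count / old_count


ratios = {
    name: transistor_ratio(RUMORED_GH100, count)
    for name, count in current_flagships.items()
}

for name, ratio in ratios.items():
    print(f"GH100 vs {name}: {ratio:.2f}x")
```

The ratio comes out a little under 2.5x against Aldebaran and a little over 2.5x against GA100.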
In terms of density, the NVIDIA Ampere GA100 amounts to 65.6M transistors per mm2, while the Aldebaran GPU (based on its speculated die size of 790mm2) should have a density of 73.7M transistors per mm2. Assuming that the GH100 measures around 900mm2 and packs 140 Billion transistors, its density should cross 150M transistors per mm2. That would be more than double the density of either current flagship, made possible by the 5nm process node.
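The density figures are simply transistor count divided by die area. The sketch below reproduces them from the numbers quoted in this article (826mm2 for GA100 comes from the spec table further down). The Aldebaran die size is speculated and both GH100 values are rumored, so treat the output as an estimate; the helper name `density_mtr_per_mm2` is made up for this illustration.

```python
# (transistors in billions, die area in mm2), as quoted in the article.
# Aldebaran's die size is speculated; both GH100 values are rumored.
chips = {
    "GA100": (54.2, 826.0),
    "Aldebaran": (58.2, 790.0),
    "GH100 (rumored)": (140.0, 900.0),
}


def density_mtr_per_mm2(transistors_billion: float, die_area_mm2: float) -> float:
    """Return transistor density in millions of transistors per mm2."""
    return transistors_billion * 1000.0 / die_area_mm2


densities = {
    name: density_mtr_per_mm2(count, area)
    for name, (count, area) in chips.items()
}

for name, density in densities.items():
    print(f"{name}: {density:.1f}M transistors per mm2")

gh100 = densities["GH100 (rumored)"]
print(f"GH100 vs GA100: {gh100 / densities['GA100']:.2f}x")
print(f"GH100 vs Aldebaran: {gh100 / densities['Aldebaran']:.2f}x")
```

Under these assumptions the GH100 lands at roughly 156M transistors per mm2, more than twice the density of either current chip.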
But once again, these are all rumored figures, and they apply only to the monolithic GH100 Hopper GPU. According to rumors, the MCM GPU is an entirely separate chip that will arrive as the GH102 GPU. We don't know the exact specifications beyond what research papers and rumors have told us. All in all, though, the NVIDIA Hopper GPU, in both its monolithic and MCM forms, should offer a serious increase in transistor count and feature advanced 5nm packaging solutions.
NVIDIA Hopper GPU - Everything We Know So Far
Based on previous information, NVIDIA's GH100 accelerator is expected to use TSMC's 5nm process node. Hopper is rumored to have two next-gen GPU modules with 144 SM units each, so we are looking at 288 SM units in total.
We can't give a rundown on the core count yet since we don't know the number of cores featured in each SM, but if NVIDIA sticks to 64 cores per SM, then we get 18,432 cores, which is 2.25x the full GA100 GPU configuration. NVIDIA could also fit more FP64, FP16 & Tensor cores within its Hopper GPU, which would drive up performance immensely. That is going to be a necessity to rival Intel's Ponte Vecchio, which is expected to feature 1:1 FP64.
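The core-count estimate above is plain multiplication, sketched here in Python. Two values are assumptions rather than confirmed specs: 64 cores per SM is the article's "if it sticks to 64" scenario, and the 128-SM figure for the full GA100 die is not stated in this article but is what the 2.25x comparison implies.

```python
# Rumored Hopper layout: two GPU modules with 144 SM units each.
SMS_PER_MODULE = 144
GPU_MODULES = 2

# Assumption: Hopper keeps 64 FP32 cores per SM, as on GA100.
CORES_PER_SM = 64

# Full GA100 die: 128 SMs (not stated in the article; implied by the 2.25x figure).
GA100_FULL_SMS = 128

total_sms = SMS_PER_MODULE * GPU_MODULES
hopper_cores = total_sms * CORES_PER_SM
ga100_cores = GA100_FULL_SMS * CORES_PER_SM
ratio = hopper_cores / ga100_cores

print(f"Total SMs: {total_sms}")
print(f"Hopper cores: {hopper_cores}")
print(f"Full GA100 cores: {ga100_cores}")
print(f"Ratio: {ratio:.2f}x")
```

Changing `CORES_PER_SM` shows how sensitive the total is to that one unknown: at 128 cores per SM the same 288 SMs would give twice as many cores.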
The final configuration will likely ship with 134 of the 144 SM units enabled on each GPU module, and we are likely looking at a single GH100 die in action. It is unlikely, however, that NVIDIA would match the MI200's FP32 or FP64 FLOPs without using GPU Sparsity.
But NVIDIA may have a secret weapon up its sleeve: the COPA-based GPU implementation of Hopper. NVIDIA talks about two Domain-Specialized COPA-GPUs based on its next-generation architecture, one for the HPC segment and one for the DL segment. The HPC variant takes a fairly standard approach, consisting of an MCM GPU design and the respective HBM/MC+HBM (IO) chiplets, but the DL variant is where things start to get interesting. The DL variant houses a huge cache on an entirely separate die that is interconnected with the GPU modules.
|Architecture||LLC Capacity||DRAM BW||DRAM Capacity|
Several variants have been outlined with up to 960 / 1920 MB of LLC (Last-Level-Cache), HBM2e DRAM capacities of up to 233 GB, and bandwidth of up to 6.3 TB/s. These are all theoretical, but given that NVIDIA has discussed them now, we may see a Hopper variant with such a design during the full unveil at GTC 2022.
NVIDIA Hopper GH100 'Official Specs':
|NVIDIA Tesla Graphics Card||Tesla K40||Tesla M40||Tesla P100 (PCIe)||Tesla P100 (SXM2)||Tesla V100 (SXM2)||NVIDIA A100 (SXM4)||NVIDIA H100 (PCIe)||NVIDIA H100 (SXM5)|
|GPU||GK110 (Kepler)||GM200 (Maxwell)||GP100 (Pascal)||GP100 (Pascal)||GV100 (Volta)||GA100 (Ampere)||GH100 (Hopper)||GH100 (Hopper)|
|Transistors||7.1 Billion||8 Billion||15.3 Billion||15.3 Billion||21.1 Billion||54.2 Billion||80 Billion||80 Billion|
|GPU Die Size||551 mm2||601 mm2||610 mm2||610 mm2||815 mm2||826 mm2||814 mm2||814 mm2|
|FP32 CUDA Cores Per SM||192||128||64||64||64||64||128||128|
|FP64 CUDA Cores / SM||64||4||32||32||32||32||128||128|
|FP32 CUDA Cores||2880||3072||3584||3584||5120||6912||14592||16896|
|FP64 CUDA Cores||960||96||1792||1792||2560||3456||14592||16896|
|Boost Clock||875 MHz||1114 MHz||1329 MHz||1480 MHz||1530 MHz||1410 MHz||TBD||TBD|
|TOPs (DNN/AI)||N/A||N/A||N/A||N/A||125 TOPs||1248 TOPs (2496 TOPs with Sparsity)|
|FP16 Compute||N/A||N/A||18.7 TFLOPs||21.2 TFLOPs||30.4 TFLOPs||312 TFLOPs (624 TFLOPs with Sparsity)||1600 TFLOPs||2000 TFLOPs|
|FP32 Compute||5.04 TFLOPs||6.8 TFLOPs||10.0 TFLOPs||10.6 TFLOPs||15.7 TFLOPs||19.4 TFLOPs (156 TFLOPs with Sparsity)||800 TFLOPs||1000 TFLOPs|
|FP64 Compute||1.68 TFLOPs||0.2 TFLOPs||4.7 TFLOPs||5.30 TFLOPs||7.80 TFLOPs||19.5 TFLOPs (9.7 TFLOPs standard)||48 TFLOPs||60 TFLOPs|
|Memory Interface||384-bit GDDR5||384-bit GDDR5||4096-bit HBM2||4096-bit HBM2||4096-bit HBM2||6144-bit HBM2e||5120-bit HBM2e||5120-bit HBM3|
|Memory Size||12 GB GDDR5 @ 288 GB/s||24 GB GDDR5 @ 288 GB/s||16 GB HBM2 @ 732 GB/s or 12 GB HBM2 @ 549 GB/s||16 GB HBM2 @ 732 GB/s||16 GB HBM2 @ 900 GB/s||Up To 40 GB HBM2 @ 1.6 TB/s or Up To 80 GB HBM2 @ 1.6 TB/s||Up To 80 GB HBM2e @ 2.0 Gbps||Up To 80 GB HBM3 @ 3.0 Gbps|
|L2 Cache Size||1536 KB||3072 KB||4096 KB||4096 KB||6144 KB||40960 KB||51200 KB||51200 KB|
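The table does not list SM counts directly, but they can be derived by dividing the "FP32 CUDA Cores" row by the "FP32 CUDA Cores Per SM" row. The Python sketch below does that for each distinct column. The labels and the helper name `enabled_sms` are made up for this illustration, and the results are only as reliable as the table figures themselves.

```python
# (label, FP32 cores per SM, total FP32 cores), copied from the table above.
# The two GP100 columns list identical values, so GP100 appears once.
table_columns = [
    ("GK110 (Kepler)", 192, 2880),
    ("GM200 (Maxwell)", 128, 3072),
    ("GP100 (Pascal)", 64, 3584),
    ("GV100 (Volta)", 64, 5120),
    ("GA100 (Ampere)", 64, 6912),
    ("GH100 (H100 PCIe)", 128, 14592),
    ("GH100 (H100 SXM5)", 128, 16896),
]


def enabled_sms(cores_per_sm: int, total_cores: int) -> int:
    """Return the SM count implied by total cores and cores per SM."""
    sms, remainder = divmod(total_cores, cores_per_sm)
    if remainder != 0:
        raise ValueError("total cores is not a whole multiple of cores per SM")
    return sms


sm_counts = {
    label: enabled_sms(per_sm, total)
    for label, per_sm, total in table_columns
}

for label, sms in sm_counts.items():
    print(f"{label}: {sms} SMs")
```

On these figures the H100 SXM5 works out to 132 enabled SMs and the PCIe card to 114, slightly below the 134 SMs mentioned in the earlier rumor.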
News Source: HXL (@9550pro)