NVIDIA GH100 Hopper Flagship GPU To Measure About 1000mm² Making It The Largest GPU Ever Made

NVIDIA might be having some trouble filing the trademark for its next-gen Hopper GPUs, but that hasn't hindered development of the flagship GH100 die: the latest rumor from Kopite7kimi claims that the chip will measure around 1000mm².

NVIDIA GH100 GPU, The Next-Gen Flagship Data Center Chip, To Measure Around 1000mm²

Currently, the biggest GPU in production is the NVIDIA Ampere GA100, which measures 826mm². If the rumor is correct, NVIDIA's Hopper GH100 will become the largest GPU design ever conceived at around 1000mm², topping the current largest GPU by more than 100mm².

But that's not all. The die size in question is for a single GH100 GPU die, and we have heard rumors that Hopper will be NVIDIA's first MCM chip design. If we get at least two Hopper GH100 GPUs on the same interposer, the dies alone would measure 2000mm². The interposer would then be vastly bigger than anything we have seen so far, considering it will pack several HBM2e stacks and other connectivity on board. However, Greymon55 has stated that it is not the GH100 that will feature an MCM design but another chip, the GH102. So GH100 is likely to remain a monolithic design, while the Hopper family will still feature MCM parts.

NVIDIA Hopper GPU - Everything We Know So Far

From previous information, we know that NVIDIA's GH100 accelerator is expected to be based on TSMC's 5nm process node. Hopper is supposed to have two next-gen GPU modules with 144 SM units each, so we are looking at 288 SM units in total.

We can't give a rundown of the core count yet since we don't know the number of cores in each SM, but if it sticks to 64 cores per SM, we get 18,432 cores, which is 2.25x as many as the full GA100 GPU configuration. NVIDIA could also leverage more FP64, FP16 & Tensor cores within its Hopper GPU, which would drive up performance immensely. That is going to be a necessity to rival Intel's Ponte Vecchio, which is expected to feature 1:1 FP64.
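The core-count estimate above is simple multiplication, and a short sketch makes the inputs explicit. The 144 SMs per module, two modules, and 64 cores per SM are the rumored or assumed figures quoted in this article; the 128-SM count for the full GA100 die is not stated here and is brought in as an outside figure. All names in the snippet are illustrative.

```python
# Rumored figures quoted in this article.
SMS_PER_MODULE = 144      # SM units per GPU module
GPU_MODULES = 2           # rumored dual-module configuration
CORES_PER_SM = 64         # assumption: unchanged from GA100

# Not from the article: the full GA100 die has 128 SMs.
GA100_FULL_SMS = 128


def fp32_cores(sms: int, cores_per_sm: int) -> int:
    """Total FP32 cores for a given SM count."""
    return sms * cores_per_sm


hopper_sms = SMS_PER_MODULE * GPU_MODULES
hopper_cores = fp32_cores(hopper_sms, CORES_PER_SM)
ga100_cores = fp32_cores(GA100_FULL_SMS, CORES_PER_SM)
ratio = hopper_cores / ga100_cores

print(hopper_sms, hopper_cores, ga100_cores, ratio)
# prints: 288 18432 8192 2.25
```

Changing `CORES_PER_SM` to another value shows how sensitive the estimate is to that single assumption.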

The final configuration will likely come with 134 of the 144 SM units enabled on each GPU module and, as such, we are likely looking at a single GH100 die in action. However, NVIDIA is unlikely to match the FP32 or FP64 FLOPS of the MI200 without using GPU sparsity.

But NVIDIA may have a secret weapon up its sleeve: the COPA-based GPU implementation of Hopper. NVIDIA talks about two domain-specialized COPA-GPUs based on a next-generation architecture, one for the HPC segment and one for the DL segment. The HPC variant takes a very standard approach, consisting of an MCM GPU design and the respective HBM/MC+HBM (IO) chiplets, but the DL variant is where things start to get interesting. The DL variant houses a huge cache on an entirely separate die that is interconnected with the GPU modules.

| Architecture Configuration | LLC Capacity (MB) | DRAM BW (TB/s) | DRAM Capacity (GB) |
|---|---|---|---|
| GPU-N | 60 | 2.7 | 100 |
| COPA-GPU-1 | 960 | 2.7 | 100 |
| COPA-GPU-2 | 960 | 4.5 | 167 |
| COPA-GPU-3 | 1,920 | 2.7 | 100 |
| COPA-GPU-4 | 1,920 | 4.5 | 167 |
| COPA-GPU-5 | 1,920 | 6.3 | 233 |
| Perfect L2 | infinite | infinite | infinite |

Several variants are outlined, with 960 MB or 1,920 MB of LLC (last-level cache), HBM2e DRAM capacities of up to 233 GB, and bandwidth of up to 6.3 TB/s. These are all theoretical, but given that NVIDIA has discussed them now, we may see a Hopper variant with such a design during the full unveil at GTC 2022.
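To put the COPA configurations in perspective, the sketch below expresses each one as a multiple of the GPU-N baseline. The numbers are copied from the table above; the dictionary and function names are illustrative only, and the "Perfect L2" row is left out because its values are infinite.

```python
# (LLC capacity in MB, DRAM bandwidth in TB/s, DRAM capacity in GB)
COPA_CONFIGS = {
    "GPU-N": (60, 2.7, 100),
    "COPA-GPU-1": (960, 2.7, 100),
    "COPA-GPU-2": (960, 4.5, 167),
    "COPA-GPU-3": (1920, 2.7, 100),
    "COPA-GPU-4": (1920, 4.5, 167),
    "COPA-GPU-5": (1920, 6.3, 233),
}


def scale_vs_baseline(name: str, baseline: str = "GPU-N") -> tuple:
    """Return (LLC, bandwidth, capacity) as multiples of the baseline."""
    base = COPA_CONFIGS[baseline]
    cfg = COPA_CONFIGS[name]
    return tuple(round(c / b, 2) for c, b in zip(cfg, base))


for name in COPA_CONFIGS:
    llc, bw, cap = scale_vs_baseline(name)
    print(f"{name:<11} LLC x{llc}  BW x{bw}  capacity x{cap}")
```

Under these figures, the largest configuration (COPA-GPU-5) carries 32 times the last-level cache of GPU-N and roughly 2.3 times its DRAM bandwidth and capacity.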

NVIDIA Hopper GH100 'Official Specs':

| NVIDIA Tesla Graphics Card | Tesla K40 (PCI-Express) | Tesla M40 (PCI-Express) | Tesla P100 (PCI-Express) | Tesla P100 (SXM2) | Tesla V100 (SXM2) | NVIDIA A100 (SXM4) | NVIDIA H100 (PCIe) | NVIDIA H100 (SXM5) |
|---|---|---|---|---|---|---|---|---|
| GPU | GK110 (Kepler) | GM200 (Maxwell) | GP100 (Pascal) | GP100 (Pascal) | GV100 (Volta) | GA100 (Ampere) | GH100 (Hopper) | GH100 (Hopper) |
| Process Node | 28nm | 28nm | 16nm | 16nm | 12nm | 7nm | 4nm | 4nm |
| Transistors | 7.1 Billion | 8 Billion | 15.3 Billion | 15.3 Billion | 21.1 Billion | 54.2 Billion | 80 Billion | 80 Billion |
| GPU Die Size | 551 mm² | 601 mm² | 610 mm² | 610 mm² | 815 mm² | 826 mm² | 814 mm² | 814 mm² |
| SMs | 15 | 24 | 56 | 56 | 80 | 108 | 114 | 132 |
| TPCs | 15 | 24 | 28 | 28 | 40 | 54 | 57 | 66 |
| FP32 CUDA Cores Per SM | 192 | 128 | 64 | 64 | 64 | 64 | 128 | 128 |
| FP64 CUDA Cores / SM | 64 | 4 | 32 | 32 | 32 | 32 | 128 | 128 |
| FP32 CUDA Cores | 2880 | 3072 | 3584 | 3584 | 5120 | 6912 | 14592 | 16896 |
| FP64 CUDA Cores | 960 | 96 | 1792 | 1792 | 2560 | 3456 | 14592 | 16896 |
| Tensor Cores | N/A | N/A | N/A | N/A | 640 | 432 | 456 | 528 |
| Texture Units | 240 | 192 | 224 | 224 | 320 | 432 | 456 | 528 |
| Boost Clock | 875 MHz | 1114 MHz | 1329 MHz | 1480 MHz | 1530 MHz | 1410 MHz | TBD | TBD |
| TOPs (DNN/AI) | N/A | N/A | N/A | N/A | 125 TOPs | 1248 TOPs; 2496 TOPs with Sparsity | 1600 TOPs; 3200 TOPs | 2000 TOPs; 4000 TOPs |
| FP16 Compute | N/A | N/A | 18.7 TFLOPs | 21.2 TFLOPs | 30.4 TFLOPs | 312 TFLOPs; 624 TFLOPs with Sparsity | 1600 TFLOPs | 2000 TFLOPs |
| FP32 Compute | 5.04 TFLOPs | 6.8 TFLOPs | 10.0 TFLOPs | 10.6 TFLOPs | 15.7 TFLOPs | 19.4 TFLOPs; 156 TFLOPs with Sparsity | 800 TFLOPs | 1000 TFLOPs |
| FP64 Compute | 1.68 TFLOPs | 0.2 TFLOPs | 4.7 TFLOPs | 5.30 TFLOPs | 7.80 TFLOPs | 19.5 TFLOPs (9.7 TFLOPs standard) | 48 TFLOPs | 60 TFLOPs |
| Memory Interface | 384-bit GDDR5 | 384-bit GDDR5 | 4096-bit HBM2 | 4096-bit HBM2 | 4096-bit HBM2 | 6144-bit HBM2e | 5120-bit HBM2e | 5120-bit HBM3 |
| Memory Size | 12 GB GDDR5 @ 288 GB/s | 24 GB GDDR5 @ 288 GB/s | 16 GB HBM2 @ 732 GB/s; 12 GB HBM2 @ 549 GB/s | 16 GB HBM2 @ 732 GB/s | 16 GB HBM2 @ 900 GB/s | Up To 40 GB HBM2 @ 1.6 TB/s; Up To 80 GB HBM2 @ 1.6 TB/s | Up To 80 GB HBM2e @ 2.0 Gbps | Up To 80 GB HBM3 @ 3.0 Gbps |
| L2 Cache Size | 1536 KB | 3072 KB | 4096 KB | 4096 KB | 6144 KB | 40960 KB | 51200 KB | 51200 KB |
| TDP | 235W | 250W | 250W | 300W | 300W | 400W | 350W | 700W |
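As a quick sanity check on the table above, the listed FP32 core counts should equal the SM count multiplied by the FP32 cores per SM, and the transistor and die-size rows give a rough transistor density. The sketch below runs both calculations on the table's own figures; the tuple layout and function names are illustrative only.

```python
# (card, SMs, FP32 cores per SM, listed FP32 cores,
#  transistors in billions, die size in mm^2), all taken from the table.
CARDS = [
    ("Tesla K40 (PCIe)", 15, 192, 2880, 7.1, 551),
    ("Tesla M40 (PCIe)", 24, 128, 3072, 8.0, 601),
    ("Tesla P100 (PCIe)", 56, 64, 3584, 15.3, 610),
    ("Tesla P100 (SXM2)", 56, 64, 3584, 15.3, 610),
    ("Tesla V100 (SXM2)", 80, 64, 5120, 21.1, 815),
    ("NVIDIA A100 (SXM4)", 108, 64, 6912, 54.2, 826),
    ("NVIDIA H100 (PCIe)", 114, 128, 14592, 80.0, 814),
    ("NVIDIA H100 (SXM5)", 132, 128, 16896, 80.0, 814),
]


def cores_consistent(sms: int, per_sm: int, listed: int) -> bool:
    """True when SMs x cores-per-SM matches the listed core total."""
    return sms * per_sm == listed


def density_mtr_per_mm2(transistors_billion: float, die_mm2: float) -> float:
    """Transistor density in millions of transistors per mm^2."""
    return transistors_billion * 1000 / die_mm2


for name, sms, per_sm, listed, tr, die in CARDS:
    ok = cores_consistent(sms, per_sm, listed)
    dens = density_mtr_per_mm2(tr, die)
    print(f"{name:<19} cores ok: {ok}  density: {dens:.1f} MTr/mm^2")
```

On these figures every row is internally consistent, and GH100 works out to roughly 98 million transistors per mm² against roughly 66 million for GA100.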