NVIDIA has announced its brand-new H200 Hopper GPU, which comes equipped with the world's fastest HBM3e memory from Micron. In addition to the new AI platforms, NVIDIA also announced a major supercomputer win with its Grace Hopper Superchips, which will power the 1-Exaflop Jupiter supercomputer.
NVIDIA Continues To Build AI Momentum With Upgraded Hopper GPUs, Grace Hopper Superchips & Supercomputer Wins
NVIDIA's H100 GPUs are the most highly demanded AI chips in the industry so far, but the green team wants to offer even more performance to its customers. Enter the HGX H200, the latest HPC and AI computing platform, powered by H200 Tensor Core GPUs. These GPUs feature the latest Hopper optimizations on both the hardware and software fronts while delivering the world's fastest memory solution to date.

The NVIDIA H200 GPUs are equipped with Micron's HBM3e solution, offering memory capacities of up to 141 GB and up to 4.8 TB/s of bandwidth, which is 2.4x the bandwidth and nearly double the capacity of the NVIDIA A100. This new memory solution allows NVIDIA to nearly double the AI inference performance of its H100 GPUs in applications such as Llama 2 (a 70-billion-parameter LLM). Recent advancements in the TensorRT-LLM suite have also delivered huge performance gains across a vast number of AI applications.
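For a quick sanity check, both uplift figures fall straight out of the published memory specs. Here is a minimal sketch in plain Python, assuming the comparison point is the 80 GB A100 at roughly 2.0 TB/s (NVIDIA does not spell out the exact baseline):

```python
# Plausibility check of the published H200 memory uplifts.
# Illustrative arithmetic only; the A100 baseline is our assumption
# (80 GB SXM variant at ~2.0 TB/s).

H200_BANDWIDTH_TBPS = 4.8   # HBM3e, per the announcement
H200_CAPACITY_GB = 141
A100_BANDWIDTH_TBPS = 2.0   # assumption: A100 80 GB SXM
A100_CAPACITY_GB = 80

print(f"Bandwidth uplift: {H200_BANDWIDTH_TBPS / A100_BANDWIDTH_TBPS:.1f}x")  # -> 2.4x
print(f"Capacity uplift: {H200_CAPACITY_GB / A100_CAPACITY_GB:.2f}x")         # -> 1.76x, i.e. nearly double
```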
In terms of solutions, the NVIDIA H200 GPUs will be available in a wide range of HGX H200 servers with 4-way and 8-way GPU configurations. An 8-way configuration of H200 GPUs in an HGX system will provide up to 32 PetaFLOPs of FP8 compute performance and 1.1 TB of aggregate memory capacity.
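Those platform-level figures are simple per-GPU multiples. A minimal sketch, assuming the per-GPU FP8 rating is the ~3,958 TFLOPs with-sparsity number from the spec table further down:

```python
# How the 8-way HGX H200 aggregates fall out of per-GPU figures.
# Illustrative; the FP8 rating assumes the with-sparsity figure.

NUM_GPUS = 8
FP8_TFLOPS_PER_GPU = 3958   # assumption: H200 FP8 Tensor Core, with sparsity
HBM_GB_PER_GPU = 141

total_pflops = NUM_GPUS * FP8_TFLOPS_PER_GPU / 1000   # ~31.7 PFLOPs, quoted as "up to 32"
total_memory_tb = NUM_GPUS * HBM_GB_PER_GPU / 1024    # ~1.10 TB of aggregate HBM3e

print(f"FP8 compute: {total_pflops:.1f} PFLOPs, memory: {total_memory_tb:.2f} TB")
```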
NVIDIA H200 GPU: Supercharged With HBM3e Memory, Available In Q2 2024
The GPUs will also be compatible with existing HGX H100 systems, making it easier for customers to upgrade their platforms. NVIDIA partners such as ASUS, ASRock Rack, Dell, Eviden, GIGABYTE, Hewlett Packard Enterprise, Ingrasys, Lenovo, QCT, Wiwynn, Supermicro, and Wistron will offer updated solutions when the H200 GPUs become available in the second quarter of 2024.

NVIDIA Grace Hopper Superchips Power 1-Exaflop Jupiter Supercomputer
In addition to the H200 GPU announcement, NVIDIA has also announced a major supercomputer win powered by its Grace Hopper Superchips (GH200). The supercomputer, known as Jupiter, will be located at the Forschungszentrum Jülich facility in Germany as part of the EuroHPC Joint Undertaking, with the contract awarded to Eviden and ParTec. The supercomputer will be used for material science, climate research, drug discovery, and more. This is also the second supercomputer NVIDIA has announced this November, the first being Isambard-AI, which will offer up to 21 Exaflops of AI performance.

In terms of configuration, the Jupiter supercomputer is based on Eviden's BullSequana XH3000, which makes use of a fully liquid-cooled architecture. It boasts a total of 24,000 NVIDIA GH200 Grace Hopper Superchips, interconnected using the company's Quantum-2 InfiniBand. Considering that each Grace CPU packs 72 Arm Neoverse V2 cores (288 per quad-GH200 node), we are looking at roughly 1.7 million Arm cores on the CPU side alone for Jupiter (1,728,000 to be exact).
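The core-count arithmetic is straightforward; the sketch below just makes it explicit (the superchip count is the figure quoted in the announcement):

```python
# Sanity check of Jupiter's Arm core count.
# Illustrative; 24,000 is the superchip count from the announcement.

GH200_SUPERCHIPS = 24_000
CORES_PER_GRACE_CPU = 72   # Arm Neoverse V2 cores per Grace CPU

total_arm_cores = GH200_SUPERCHIPS * CORES_PER_GRACE_CPU
print(f"{total_arm_cores:,} Arm cores")   # -> 1,728,000
```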

Performance metrics include up to 90 Exaflops of AI training performance and 1 Exaflop of high-performance compute. The supercomputer is expected to be installed in 2024. Overall, these are some major updates from NVIDIA as it continues to lead the charge in the AI world with its powerful hardware and software technologies.
NVIDIA HPC / AI GPUs
| NVIDIA Tesla Graphics Card | NVIDIA B200 | NVIDIA H200 (SXM5) | NVIDIA H100 (SXM5) | NVIDIA H100 (PCIe) | NVIDIA A100 (SXM4) | NVIDIA A100 (PCIe4) | Tesla V100S (PCIe) | Tesla V100 (SXM2) | Tesla P100 (SXM2) | Tesla P100 (PCI-Express) | Tesla M40 (PCI-Express) | Tesla K40 (PCI-Express) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| GPU | B200 | H200 (Hopper) | H100 (Hopper) | H100 (Hopper) | A100 (Ampere) | A100 (Ampere) | GV100 (Volta) | GV100 (Volta) | GP100 (Pascal) | GP100 (Pascal) | GM200 (Maxwell) | GK110 (Kepler) |
| Process Node | 4nm | 4nm | 4nm | 4nm | 7nm | 7nm | 12nm | 12nm | 16nm | 16nm | 28nm | 28nm |
| Transistors | 208 Billion | 80 Billion | 80 Billion | 80 Billion | 54.2 Billion | 54.2 Billion | 21.1 Billion | 21.1 Billion | 15.3 Billion | 15.3 Billion | 8 Billion | 7.1 Billion |
| GPU Die Size | TBD | 814mm2 | 814mm2 | 814mm2 | 826mm2 | 826mm2 | 815mm2 | 815mm2 | 610 mm2 | 610 mm2 | 601 mm2 | 551 mm2 |
| SMs | 160 | 132 | 132 | 114 | 108 | 108 | 80 | 80 | 56 | 56 | 24 | 15 |
| TPCs | 80 | 66 | 66 | 57 | 54 | 54 | 40 | 40 | 28 | 28 | 24 | 15 |
| L2 Cache Size | TBD | 51200 KB | 51200 KB | 51200 KB | 40960 KB | 40960 KB | 6144 KB | 6144 KB | 4096 KB | 4096 KB | 3072 KB | 1536 KB |
| FP32 CUDA Cores Per SM | TBD | 128 | 128 | 128 | 64 | 64 | 64 | 64 | 64 | 64 | 128 | 192 |
| FP64 CUDA Cores / SM | TBD | 64 | 64 | 64 | 32 | 32 | 32 | 32 | 32 | 32 | 4 | 64 |
| FP32 CUDA Cores | TBD | 16896 | 16896 | 14592 | 6912 | 6912 | 5120 | 5120 | 3584 | 3584 | 3072 | 2880 |
| FP64 CUDA Cores | TBD | 8448 | 8448 | 7296 | 3456 | 3456 | 2560 | 2560 | 1792 | 1792 | 96 | 960 |
| Tensor Cores | TBD | 528 | 528 | 456 | 432 | 432 | 640 | 640 | N/A | N/A | N/A | N/A |
| Texture Units | TBD | 528 | 528 | 456 | 432 | 432 | 320 | 320 | 224 | 224 | 192 | 240 |
| Boost Clock | TBD | ~1850 MHz | ~1850 MHz | ~1650 MHz | 1410 MHz | 1410 MHz | 1601 MHz | 1530 MHz | 1480 MHz | 1329MHz | 1114 MHz | 875 MHz |
| TOPs (DNN/AI) | 20,000 TOPs | 3958 TOPs | 3958 TOPs | 3026 TOPs | 2496 TOPs | 2496 TOPs | 130 TOPs | 125 TOPs | N/A | N/A | N/A | N/A |
| FP16 Compute | 10,000 TFLOPs | 1979 TFLOPs | 1979 TFLOPs | 1513 TFLOPs | 624 TFLOPs | 624 TFLOPs | 32.8 TFLOPs | 30.4 TFLOPs | 21.2 TFLOPs | 18.7 TFLOPs | N/A | N/A |
| FP32 Compute | 90 TFLOPs | 67 TFLOPs | 67 TFLOPs | 51 TFLOPs | 156 TFLOPs (19.5 TFLOPs standard) | 156 TFLOPs (19.5 TFLOPs standard) | 16.4 TFLOPs | 15.7 TFLOPs | 10.6 TFLOPs | 10.0 TFLOPs | 6.8 TFLOPs | 5.04 TFLOPs |
| FP64 Compute | 45 TFLOPs | 34 TFLOPs | 34 TFLOPs | 26 TFLOPs | 19.5 TFLOPs (9.7 TFLOPs standard) | 19.5 TFLOPs (9.7 TFLOPs standard) | 8.2 TFLOPs | 7.80 TFLOPs | 5.30 TFLOPs | 4.7 TFLOPs | 0.2 TFLOPs | 1.68 TFLOPs |
| Memory Interface | 8192-bit HBM3e | 6144-bit HBM3e | 5120-bit HBM3 | 5120-bit HBM2e | 5120-bit HBM2e | 5120-bit HBM2e | 4096-bit HBM2 | 4096-bit HBM2 | 4096-bit HBM2 | 4096-bit HBM2 | 384-bit GDDR5 | 384-bit GDDR5 |
| Memory Size | Up To 192 GB HBM3e @ 8.0 Gbps | Up To 141 GB HBM3e @ 6.25 Gbps | Up To 80 GB HBM3 @ 5.2 Gbps | Up To 80 GB HBM2e @ 2.0 TB/s | Up To 40 GB HBM2 @ 1.6 TB/s Up To 80 GB HBM2e @ 2.0 TB/s | Up To 40 GB HBM2 @ 1.6 TB/s Up To 80 GB HBM2e @ 1.9 TB/s | 16 GB HBM2 @ 1134 GB/s | 16 GB HBM2 @ 900 GB/s | 16 GB HBM2 @ 732 GB/s | 16 GB HBM2 @ 732 GB/s 12 GB HBM2 @ 549 GB/s | 24 GB GDDR5 @ 288 GB/s | 12 GB GDDR5 @ 288 GB/s |
| TDP | 1000W | 700W | 700W | 350W | 400W | 250W | 250W | 300W | 300W | 250W | 250W | 235W |