NVIDIA Blackwell GPU Architecture Official: 208 Billion Transistors, 5x AI Performance, 192 GB HBM3e Memory, 8 TB/s Bandwidth

Mar 18, 2024 at 04:44pm EDT
NVIDIA Blackwell GPU Architecture Official: 208 Billion Transistors, 5x AI Performance, 192 GB HBM3e Memory, 8 TB/s Bandwidth 1

NVIDIA has officially unveiled its next-gen Blackwell GPU architecture which features up to a 5x performance increase versus Hopper H100 GPUs.

NVIDIA Blackwell GPUs Feature 5x Faster AI Performance Than Hopper H100, Leading The Charge of Next-Gen AI Computing

NVIDIA has gone official with the full details of its next-generation AI & Tensor Core GPU architecture codenamed Blackwell. As expected, the Blackwell GPUs are the first to feature NVIDIA's first MCM design which will incorporate two GPUs on the same die.

Related Story Scammers Now Selling Fake NVIDIA RTX Graphics Cards With Glued-Down Plastic GPUs & Scrap Memory

Diving into the details, the NVIDIA Blackwell GPU features a total of 104 Billion transistors on each compute die which is fabricated on the TSMC 4NP process node. Interestingly, both Synopsys and TSMC have utilized NVIDIA's CuLitho technology for the production of Blackwell GPUs which makes making Each chip accelerates the manufacturing of these next-gen AI accelerator chips. The B100 GPUs are equipped with a 10 TB/s high-bandwidth interface which allows super-fast chip-to-chip interconnect. These GPUs are unified as one chip on the same package, offering up to 208 Billion transistors and full GPU cache coherency.

Compared to the Hopper, the NVIDIA Blackwell GPU offers 128 Billion more transistors, 5x the AI performance which is boosted to 20 petaFlops per chip, and 4x the on-die memory. The GPU itself is coupled with 8 HBM3e stacks featuring the world's fastest memory solution, offering 8 TB/s of memory bandwidth across an 8192-bit bus interface and up to 192 GB HBM3e memory. To quickly sum up the performance figures versus Hopper, you are getting:

NVIDIA will be offering Blackwell GPUs as a full-on platform, combining two of these GPUs which is four compute dies with a singular Grace CPU (72 ARM Neoverse V2 CPU cores). The GPUs will be inter-connected to each other and the Grace CPUs using a 900 GB/s NVLINK protocol.

NVIDIA Blackwell B200 GPUs For 2024 - 192 GB HBM3e

First up, we have the NVIDIA Blackwell B200 GPU. This is the first of the two Blackwell chips that will be adopted into various designs ranging from SXM modules, PCIe AICs & Superchip platforms. The B200 GPU will be the first NVIDIA GPU to utilize a chiplet design, featuring two compute dies based on the TSMC 4nm process node.

MCM or Multi-Chip-Module has been a long coming on the NVIDIA side of things &it's finally here as the company tries to tackle challenges associated with next-gen process nodes such as yields and cost. Chiplets provide a viable alternative where NVIDIA can still achieve faster gen-over-gen performance without compromising its supply or costs and this is just a stepping stone in its chiplet journey.

The NVIDIA Blackwell B200 GPU will be a monster chip. It incorporates a total of 160 SMs for 20,480 cores. The GPU will feature the latest NVLINK interconnect technology, supporting the same 8 GPU architecture and a 400 GbE networking switch. It's also going to be very power-hungry with a 700W peak TDP though that's also the same as the H100 and H200 chips. Summing this chip up:

On the memory side, the Blackwell B200 GPU will pack up to 192 GB of HBM3e memory. This will be featured in eight stacks of 8-hi modules, each featuring 24 GB VRAM capacity across an 8192-bit wide bus interface. This will be a 2.4x increase over the H100 80 GB GPUs which allows the chip to run bigger LLMs.

The NVIDIA Blackwell B200 and its respective platforms will pave a new era of AI computing and offer brutal competition to AMD and Intel's latest chip offerings which are yet to see widespread adoption. With the unveiling of Blackwell, NVIDIA has once again cemented itself as the dominant force of the AI market.

NVIDIA HPC / AI GPUs

NVIDIA Tesla Graphics CardNVIDIA B200NVIDIA H200 (SXM5)NVIDIA H100 (SMX5)NVIDIA H100 (PCIe)NVIDIA A100 (SXM4)NVIDIA A100 (PCIe4)Tesla V100S (PCIe)Tesla V100 (SXM2)Tesla P100 (SXM2)Tesla P100
(PCI-Express)
Tesla M40
(PCI-Express)
Tesla K40
(PCI-Express)
GPUB200H200 (Hopper)H100 (Hopper)H100 (Hopper)A100 (Ampere)A100 (Ampere)GV100 (Volta)GV100 (Volta)GP100 (Pascal)GP100 (Pascal)GM200 (Maxwell)GK110 (Kepler)
Process Node4nm4nm4nm4nm7nm7nm12nm12nm16nm16nm28nm28nm
Transistors208 Billion80 Billion80 Billion80 Billion54.2 Billion54.2 Billion21.1 Billion21.1 Billion15.3 Billion15.3 Billion8 Billion7.1 Billion
GPU Die SizeTBD814mm2814mm2814mm2826mm2826mm2815mm2815mm2610 mm2610 mm2601 mm2551 mm2
SMs160132132114108108808056562415
TPCs806666575454404028282415
L2 Cache SizeTBD51200 KB51200 KB51200 KB40960 KB40960 KB6144 KB6144 KB4096 KB4096 KB3072 KB1536 KB
FP32 CUDA Cores Per SMTBD128128128646464646464128192
FP64 CUDA Cores / SMTBD128128128323232323232464
FP32 CUDA CoresTBD16896168961459269126912512051203584358430722880
FP64 CUDA CoresTBD16896168961459234563456256025601792179296960
Tensor CoresTBD528528456432432640640N/AN/AN/AN/A
Texture UnitsTBD528528456432432320320224224192240
Boost ClockTBD~1850 MHz~1850 MHz~1650 MHz1410 MHz1410 MHz1601 MHz1530 MHz1480 MHz1329MHz1114 MHz875 MHz
TOPs (DNN/AI)20,000 TOPs3958 TOPs3958 TOPs3200 TOPs2496 TOPs2496 TOPs130 TOPs125 TOPsN/AN/AN/AN/A
FP16 Compute10,000 TFLOPs1979 TFLOPs1979 TFLOPs1600 TFLOPs624 TFLOPs624 TFLOPs32.8 TFLOPs30.4 TFLOPs21.2 TFLOPs18.7 TFLOPsN/AN/A
FP32 Compute90 TFLOPs67 TFLOPs67 TFLOPs800 TFLOPs156 TFLOPs
(19.5 TFLOPs standard)
156 TFLOPs
(19.5 TFLOPs standard)
16.4 TFLOPs15.7 TFLOPs10.6 TFLOPs10.0 TFLOPs6.8 TFLOPs5.04 TFLOPs
FP64 Compute45 TFLOPs34 TFLOPs34 TFLOPs48 TFLOPs19.5 TFLOPs
(9.7 TFLOPs standard)
19.5 TFLOPs
(9.7 TFLOPs standard)
8.2 TFLOPs7.80 TFLOPs5.30 TFLOPs4.7 TFLOPs0.2 TFLOPs1.68 TFLOPs
Memory Interface8192-bit HBM45120-bit HBM3e5120-bit HBM35120-bit HBM2e6144-bit HBM2e6144-bit HBM2e4096-bit HBM24096-bit HBM24096-bit HBM24096-bit HBM2384-bit GDDR5384-bit GDDR5
Memory SizeUp To 192 GB HBM3 @ 8.0 GbpsUp To 141 GB HBM3e @ 6.5 GbpsUp To 80 GB HBM3 @ 5.2 GbpsUp To 94 GB HBM2e @ 5.1 GbpsUp To 40 GB HBM2 @ 1.6 TB/s
Up To 80 GB HBM2 @ 1.6 TB/s
Up To 40 GB HBM2 @ 1.6 TB/s
Up To 80 GB HBM2 @ 2.0 TB/s
16 GB HBM2 @ 1134 GB/s16 GB HBM2 @ 900 GB/s16 GB HBM2 @ 732 GB/s16 GB HBM2 @ 732 GB/s
12 GB HBM2 @ 549 GB/s
24 GB GDDR5 @ 288 GB/s12 GB GDDR5 @ 288 GB/s
TDP700W700W700W350W400W250W250W300W300W250W250W235W

About the author: A Software Engineer by training and a PC enthusiast by passion, Hassan Mujtaba serves as Wccftech's Senior Editor for hardware section. With years of experience in the industry, he specializes in deep-dive technical analysis of next-generation CPU and GPU architectures, motherboards, and cooling solutions. His work involves not only breaking news on upcoming technologies but also extensive hands-on reviews and benchmarking.

Follow Wccftech on Google to get more of our news coverage in your feeds.