NVIDIA Blackwell Ultra “GB300” GPU, The Fastest AI Chip, Detailed: Dual Reticle GPU With Over 20K Cores, 288 GB HBM3e Memory at 8 TB/s & 50% Faster Than GB200

Aug 25, 2025 at 01:55am EDT

NVIDIA has provided an in-depth breakdown of its fastest AI chip, the Blackwell Ultra GB300, which is 50% faster than GB200 and packs 288 GB of memory.

NVIDIA's Blackwell Ultra "GB300" Is The Miracle Chip For AI, 50% Faster Than GB200 And Packs 288 GB of Memory

A few days ago, NVIDIA published an article breaking down its latest and greatest AI chip, the GB300 Blackwell Ultra. The chip is now in full production and is already shipping to key customers. While it builds on the existing Blackwell design, it offers a significant upgrade in both performance and features.


Just as NVIDIA's Super series offers improved versions of its original RTX gaming cards, the Ultra series is an enhanced version of the AI chips that were initially introduced. Earlier lineups such as Hopper and Volta did not carry the Ultra branding, but they did receive enhanced versions of their own, such as Hopper's H200. And while Ultra chips are better at the hardware level, software updates and optimizations also deliver substantial gains on non-Ultra chips.

So, what is Blackwell Ultra GB300? As mentioned above, it is an enhanced version of Blackwell that uses two reticle-sized dies connected by NVIDIA's NV-HBI high-bandwidth interface so that they appear as a single GPU. The GPU is quite dense: it is built on TSMC's 4NP node (a 5nm-class process optimized for NVIDIA) and houses a total of 208 billion transistors. NV-HBI provides 10 TB/s of bandwidth between the two dies, allowing them to function as a single chip.

The NVIDIA Blackwell Ultra GB300 GPU packs 160 SMs. Each SM contains 128 CUDA cores, four 5th Gen Tensor Cores supporting FP8, FP6, and NVFP4 precision compute, 256 KB of Tensor Memory (TMEM), and SFUs (special function units). That adds up to 20,480 CUDA cores, 640 Tensor Cores, and 40 MB of TMEM.
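The chip-wide totals follow directly from the per-SM figures. Here is a quick Python sketch of that arithmetic; it assumes binary units (1 MB = 1,024 KB) for TMEM, and the constant and function names are illustrative only:

```python
# Per-SM resources for Blackwell Ultra, as quoted in the article
NUM_SMS = 160
CUDA_CORES_PER_SM = 128
TENSOR_CORES_PER_SM = 4
TMEM_KB_PER_SM = 256


def chip_totals(num_sms: int) -> dict:
    """Multiply per-SM resources out to whole-GPU totals (illustrative helper)."""
    return {
        "cuda_cores": num_sms * CUDA_CORES_PER_SM,
        "tensor_cores": num_sms * TENSOR_CORES_PER_SM,
        # Assumes binary units: 1 MB = 1,024 KB
        "tmem_mb": num_sms * TMEM_KB_PER_SM / 1024,
    }


totals = chip_totals(NUM_SMS)
print(totals)  # {'cuda_cores': 20480, 'tensor_cores': 640, 'tmem_mb': 40.0}
```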

| Feature | Hopper | Blackwell | Blackwell Ultra |
| --- | --- | --- | --- |
| Manufacturing process | TSMC 4N | TSMC 4NP | TSMC 4NP |
| Transistors | 80B | 208B | 208B |
| Dies per GPU | 1 | 2 | 2 |
| NVFP4 dense / sparse performance | N/A | 10 / 20 PetaFLOPS | 15 / 20 PetaFLOPS |
| FP8 dense / sparse performance | 2 / 4 PetaFLOPS | 5 / 10 PetaFLOPS | 5 / 10 PetaFLOPS |
| Attention acceleration (SFU EX2) | 4.5 TeraExponentials/s | 5 TeraExponentials/s | 10.7 TeraExponentials/s |
| Max HBM capacity | 80 GB HBM (H100), 141 GB HBM3E (H200) | 192 GB HBM3E | 288 GB HBM3E |
| Max HBM bandwidth | 3.35 TB/s (H100), 4.8 TB/s (H200) | 8 TB/s | 8 TB/s |
| NVLink bandwidth | 900 GB/s | 1,800 GB/s | 1,800 GB/s |
| Max power (TGP) | Up to 700W | Up to 1,200W | Up to 1,400W |
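To put the Blackwell and Blackwell Ultra columns side by side, the sketch below computes generation-over-generation ratios for a few rows of the table. The dictionary keys are hypothetical labels, and only the dense NVFP4 figure is compared since sparse NVFP4 stays at 20 PetaFLOPS:

```python
# Selected per-GPU figures from the spec table (dense NVFP4 only)
blackwell = {"nvfp4_dense_pflops": 10, "sfu_ex2_texp_per_s": 5, "hbm_gb": 192, "tgp_w": 1200}
blackwell_ultra = {"nvfp4_dense_pflops": 15, "sfu_ex2_texp_per_s": 10.7, "hbm_gb": 288, "tgp_w": 1400}


def uplift(new: dict, old: dict) -> dict:
    """Ratio of each metric in `new` to the same metric in `old`, rounded to 2 places."""
    return {key: round(new[key] / old[key], 2) for key in old}


ratios = uplift(blackwell_ultra, blackwell)
print(ratios)
# {'nvfp4_dense_pflops': 1.5, 'sfu_ex2_texp_per_s': 2.14, 'hbm_gb': 1.5, 'tgp_w': 1.17}
```

The 1.5x dense NVFP4 ratio is the source of the 50% uplift, while the roughly 1.17x TGP increase is worth keeping in mind when comparing efficiency.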

The 5th Gen Tensor Cores are where all the magic happens, as they are responsible for all the AI compute operations. NVIDIA has delivered major innovations in each generation of Tensor Cores for its GPUs, such as:

Blackwell Ultra also brings a huge memory upgrade, offering 288 GB of HBM3e capacity versus a maximum of 192 GB on the previous Blackwell GB200 solutions. This extra capacity is what will allow NVIDIA to support multi-trillion-parameter AI models. The memory is arranged as 8 stacks behind sixteen 512-bit memory controllers (an 8,192-bit-wide interface) and delivers 8 TB/s per GPU. The memory enables:
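The memory figures can be cross-checked with simple arithmetic. This sketch assumes decimal units (8 TB/s = 8 × 10^12 bytes/s) and that bandwidth is spread evenly across every data pin; the implied per-pin rate is a derived estimate, not an NVIDIA-published spec:

```python
# HBM3e configuration quoted in the article
CONTROLLERS = 16
BITS_PER_CONTROLLER = 512
STACKS = 8
CAPACITY_GB = 288
BANDWIDTH_BYTES_PER_S = 8e12  # 8 TB/s, assuming decimal terabytes

bus_width_bits = CONTROLLERS * BITS_PER_CONTROLLER  # total interface width
bits_per_stack = bus_width_bits // STACKS           # interface width per HBM stack
gb_per_stack = CAPACITY_GB / STACKS                 # capacity per HBM stack
# Derived estimate: aggregate bits per second divided across all data pins
per_pin_gbps = BANDWIDTH_BYTES_PER_S * 8 / bus_width_bits / 1e9

print(bus_width_bits, bits_per_stack, gb_per_stack, per_pin_gbps)
# 8192 1024 36.0 7.8125
```

The 1,024-bit width per stack is consistent with the standard HBM3E stack interface.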

Blackwell Ultra uses the same interconnects as Blackwell: NVLink via the NVLink Switch, NVLink-C2C, and a PCIe Gen6 x16 interface for connecting to host CPUs. Below are the NVLink 5 and host-side connectivity specs:

| Interconnect (GB/s) | Hopper GPU | Blackwell GPU | Blackwell Ultra GPU |
| --- | --- | --- | --- |
| NVLink (GPU-GPU) | 900 | 1,800 | 1,800 |
| NVLink-C2C (CPU-GPU) | 900 | 900 | 900 |
| PCIe interface | 128 (Gen 5) | 256 (Gen 6) | 256 (Gen 6) |
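The PCIe figures line up with raw x16 link rates counted in both directions. This sketch assumes 32 GT/s per lane for PCIe Gen 5 and 64 GT/s for Gen 6 and ignores encoding and FLIT overhead, so real-world throughput will be somewhat lower:

```python
# Per-lane signaling rates in GT/s (standard PCIe Gen 5 / Gen 6 values)
GT_PER_S = {5: 32, 6: 64}


def pcie_bidirectional_gb_s(gen: int, lanes: int = 16) -> float:
    """Raw bidirectional bandwidth in GB/s, ignoring encoding/FLIT overhead."""
    # Treat each transfer as one bit per lane, then convert bits to bytes
    per_direction = GT_PER_S[gen] * lanes / 8
    return 2 * per_direction


print(pcie_bidirectional_gb_s(5), pcie_bidirectional_gb_s(6))  # 128.0 256.0
```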

The result is that NVIDIA's Blackwell Ultra GB300 platform delivers a 50% increase in dense low-precision compute using the new NVFP4 format (15 versus 10 PetaFLOPS dense). NVFP4 delivers near-FP8 accuracy, with differences often under 1%, while reducing the memory footprint by about 1.8x versus FP8 and 3.5x versus FP16.
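The footprint savings can be reproduced from the way NVFP4 stores data. The sketch below assumes NVIDIA's published NVFP4 layout (4-bit E2M1 values plus one 8-bit FP8 E4M3 scale shared by each 16-value block) and ignores the per-tensor FP32 scale, which is negligible for large tensors. It also estimates how many weights alone would fit in 288 GB, ignoring KV cache, activations, and runtime overhead:

```python
# Effective bits stored per value for each format
FORMAT_BITS = {
    "FP16": 16.0,
    "FP8": 8.0,
    # Assumed NVFP4 layout: 4-bit values plus one 8-bit scale per 16-value block
    "NVFP4": 4 + 8 / 16,  # = 4.5 bits per value
}
HBM_BYTES = 288e9  # 288 GB, assuming decimal gigabytes


def max_params_billion(capacity_bytes: float, bits_per_value: float) -> float:
    """Weights-only upper bound: ignores KV cache, activations and runtime overhead."""
    return capacity_bytes * 8 / bits_per_value / 1e9


nvfp4 = FORMAT_BITS["NVFP4"]
print(f"NVFP4 vs FP8:  {FORMAT_BITS['FP8'] / nvfp4:.2f}x smaller")   # 1.78x
print(f"NVFP4 vs FP16: {FORMAT_BITS['FP16'] / nvfp4:.2f}x smaller")  # 3.56x
for name, bits in FORMAT_BITS.items():
    print(f"{name}: ~{max_params_billion(HBM_BYTES, bits):.0f}B parameters fit in 288 GB")
```

The 1.78x and 3.56x results are in line with the roughly 1.8x and 3.5x savings NVIDIA quotes, and the weights-only bound grows from about 144B parameters at FP16 to about 512B at NVFP4 per GPU.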

Blackwell Ultra also brings advanced scheduling management and new enterprise-grade security features, such as:

Performance efficiency is another area where Blackwell Ultra GB300 takes the lead, offering higher tokens per second per megawatt (TPS/MW) than Blackwell GB200, as shown in the chart below:
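TPS/MW depends on both per-GPU throughput and power draw. As a rough sketch that counts only the GPU TGP from the spec table (ignoring CPUs, networking, and cooling), the helper below converts per-GPU throughput into TPS/MW. Throughput is normalized so that Blackwell = 1.0; the 1.5x Ultra value is a hypothetical that assumes throughput scales with dense NVFP4 compute, not a measured result:

```python
def tps_per_mw(tokens_per_s_per_gpu: float, gpu_power_w: float) -> float:
    """Tokens/s per megawatt, counting GPU board power only (simplifying assumption)."""
    gpus_per_mw = 1_000_000 / gpu_power_w
    return tokens_per_s_per_gpu * gpus_per_mw


# Normalized throughput: Blackwell = 1.0; Ultra = 1.5 is a hypothetical, not a benchmark
blackwell_eff = tps_per_mw(1.0, 1200)
ultra_eff = tps_per_mw(1.5, 1400)

# Per-GPU speedup Ultra needs just to match Blackwell's TPS/MW at its higher TGP
break_even_speedup = 1400 / 1200

print(round(ultra_eff / blackwell_eff, 2), round(break_even_speedup, 2))  # 1.29 1.17
```

Under these simplifying assumptions, any per-GPU gain above about 1.17x translates into higher TPS/MW despite the higher 1,400W TGP.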

All this shows that NVIDIA sits firmly at the top of the AI ladder with engineering marvels such as Blackwell and Blackwell Ultra. Its in-depth software support and optimizations have been key to that success, and its annual hardware cadence plus increased R&D spending look set to keep it there for several years.

About the author: A Software Engineer by training and a PC enthusiast by passion, Hassan Mujtaba serves as Wccftech's Senior Editor for the hardware section. With years of experience in the industry, he specializes in deep-dive technical analysis of next-generation CPU and GPU architectures, motherboards, and cooling solutions. His work involves not only breaking news on upcoming technologies but also extensive hands-on reviews and benchmarking.
