NVIDIA Blackwell Ultra “GB300” GPU, The Fastest AI Chip, Detailed: Dual Reticle GPU With Over 20K Cores, 288 GB HBM3e Memory at 8 TB/s & 50% Faster Than GB200

• Aug 25, 2025 at 01:55am EDT


NVIDIA has provided an in-depth breakdown of its fastest chip for AI, the Blackwell Ultra GB300, which is 50% faster than GB200 & packs 288 GB memory.

NVIDIA's Blackwell Ultra "GB300" Is The Miracle Chip For AI, 50% Faster Than GB200 And Packs 288 GB of Memory

A few days ago, NVIDIA published an article giving a breakdown of its latest and greatest AI chip, the GB300 Blackwell Ultra. This chip is now in full production and has already shipped to key customers. While the chip is an extension of the Blackwell solution, it does offer a significant upgrade in terms of performance and features.

Just as NVIDIA's Super series improves on the original RTX gaming cards, the Ultra series is an enhanced version of the AI chips that were initially introduced. Previous lineups such as Hopper and Volta carried no Ultra branding, though they too received enhanced versions. And while Ultra chips are better at the hardware level, software updates and optimizations also deliver substantial gains on non-Ultra chips.

So, what is Blackwell Ultra GB300? As noted above, it is an enhanced version that uses two reticle-sized dies connected by NVIDIA's NV-HBI high-bandwidth interface so that they appear as a single GPU. The GPU is quite dense: it is built on the TSMC 4NP node (a 5nm-class process optimized for NVIDIA) and houses a total of 208 billion transistors. NV-HBI provides 10 TB/s of bandwidth between the two GPU dies, which function as a single chip.

The NVIDIA Blackwell Ultra GB300 GPU packs a total of 160 SMs, each with 128 CUDA cores, four 5th Gen Tensor Cores with FP8, FP6, and NVFP4 precision compute, 256 KB of Tensor memory (TMEM), and SFUs. This adds up to a total of 20,480 CUDA cores and 640 Tensor Cores, plus 40 MB of TMEM.
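The totals follow directly from the per-SM figures. As a quick sanity check, here is a minimal Python sketch of that arithmetic; the constant and function names are illustrative only, and the inputs are the numbers quoted above.

```python
# Per-SM figures quoted in the article for Blackwell Ultra GB300.
SM_COUNT = 160
CUDA_CORES_PER_SM = 128
TENSOR_CORES_PER_SM = 4
TMEM_KB_PER_SM = 256


def gpu_totals(sm_count=SM_COUNT):
    """Return (CUDA cores, Tensor cores, TMEM in MB) for a given SM count."""
    cuda_cores = sm_count * CUDA_CORES_PER_SM
    tensor_cores = sm_count * TENSOR_CORES_PER_SM
    tmem_mb = sm_count * TMEM_KB_PER_SM / 1024  # 1 MB = 1024 KB
    return cuda_cores, tensor_cores, tmem_mb


print(gpu_totals())  # (20480, 640, 40.0)
```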

Feature	Hopper	Blackwell	Blackwell Ultra
Manufacturing process	TSMC 4N	TSMC 4NP	TSMC 4NP
Transistors	80B	208B	208B
Dies per GPU	1	2	2
NVFP4 dense | sparse performance	–	10 | 20 PetaFLOPS	15 | 20 PetaFLOPS
FP8 dense | sparse performance	2 | 4 PetaFLOPS	5 | 10 PetaFLOPS	5 | 10 PetaFLOPS
Attention acceleration (SFU EX2)	4.5 TeraExponentials/s	5 TeraExponentials/s	10.7 TeraExponentials/s
Max HBM capacity	80 GB HBM (H100) 141 GB HBM3E (H200)	192 GB HBM3E	288 GB HBM3E
Max HBM bandwidth	3.35 TB/s (H100) 4.8 TB/s (H200)	8 TB/s	8 TB/s
NVLink bandwidth	900 GB/s	1,800 GB/s	1,800 GB/s
Max power (TGP)	Up to 700W	Up to 1,200W	Up to 1,400W
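To see where Blackwell Ultra actually moves ahead of Blackwell, the table rows that differ can be turned into ratios. The sketch below only divides numbers copied from the table; the dictionary keys are made-up labels, not NVIDIA terminology.

```python
# Rows from the table above where Blackwell and Blackwell Ultra differ.
blackwell = {
    "nvfp4_dense_pflops": 10,
    "sfu_ex2_tera_exp_per_s": 5,
    "hbm_capacity_gb": 192,
    "tgp_watts": 1200,
}
blackwell_ultra = {
    "nvfp4_dense_pflops": 15,
    "sfu_ex2_tera_exp_per_s": 10.7,
    "hbm_capacity_gb": 288,
    "tgp_watts": 1400,
}


def uplift(metric):
    """Ratio of the Blackwell Ultra figure to the Blackwell figure."""
    return blackwell_ultra[metric] / blackwell[metric]


for metric in blackwell:
    print(f"{metric}: {uplift(metric):.2f}x")
```

Dense NVFP4 compute and HBM capacity both come out at 1.5x, attention-layer SFU throughput at roughly 2.14x, and the power ceiling at about 1.17x.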

The 5th Gen Tensor Cores are where all the magic happens, as they are responsible for all the AI compute operations. NVIDIA has delivered major innovations in each generation of Tensor Cores for its GPUs, such as:

NVIDIA Volta: 8-thread MMA units, FP16 with FP32 accumulation for training.
NVIDIA Ampere: Full warp-wide MMA, BF16, and TensorFloat-32 formats.
NVIDIA Hopper: Warp-group MMA across 128 threads, Transformer Engine with FP8 support.
NVIDIA Blackwell: 2nd Gen Transformer Engine with FP8, FP6, and NVFP4 compute, plus TMEM.

Blackwell Ultra also brings a huge upgrade to memory, offering 288 GB of HBM3e capacity versus a maximum of 192 GB on the previous Blackwell GB200 solutions. This upgrade is what lets NVIDIA support multi-trillion-parameter AI models. The memory comes in 8 stacks with sixteen 512-bit controllers (an 8,192-bit wide interface) and operates at 8 TB/s per GPU. The memory enables:

Complete model residence: 300B+ parameter models without memory offloading.
Extended context lengths: Larger KV cache capacity for transformer models.
Improved compute efficiency: Higher compute-to-memory ratios for diverse workloads.
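As a rough illustration of the "complete model residence" point, the sketch below estimates the weight footprint of a 300B-parameter model at different precisions and compares it with the 288 GB capacity. The bytes-per-parameter values are simplifying assumptions (weights only, with no scale metadata, KV cache, or activations), so treat the output as a ballpark rather than a sizing guide.

```python
HBM_CAPACITY_GB = 288  # per-GPU capacity quoted in the article

# Assumed storage cost per parameter, weights only (illustrative values;
# scale metadata, KV cache, and activations are ignored).
BYTES_PER_PARAM = {"FP16": 2.0, "FP8": 1.0, "4-bit": 0.5}


def weights_gb(params_billion, fmt):
    """Weight footprint in GB (1 GB = 1e9 bytes) for a model of the given size."""
    return params_billion * BYTES_PER_PARAM[fmt]


def fits_on_one_gpu(params_billion, fmt, capacity_gb=HBM_CAPACITY_GB):
    """True if the weights alone fit within a single GPU's HBM."""
    return weights_gb(params_billion, fmt) <= capacity_gb


for fmt in BYTES_PER_PARAM:
    print(fmt, weights_gb(300, fmt), fits_on_one_gpu(300, fmt))
```

Under these assumptions a 300B-parameter model needs about 600 GB at FP16 and 300 GB at FP8, both above 288 GB, but only about 150 GB at 4-bit precision, which leaves headroom for KV cache.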

Blackwell Ultra uses the same NVLink interconnect as Blackwell for GPU-to-GPU communication over the NVLink Switch, NVLink-C2C for the link to the Grace CPU, and a PCIe Gen6 x16 interface for connection to host CPUs. The NVLink 5 and host-side connectivity specs are as follows:

Per-GPU Bandwidth: 1.8 TB/s bidirectional (18 links x 100 GB/s)
Performance Scaling: 2x improvement over NVLink 4 (Hopper GPU)
Maximum Topology: 576 GPUs in non-blocking compute fabric
Rack-Scale Integration: 72-GPU NVL72 configurations with 130 TB/s aggregate bandwidth

PCIe Interface: Gen6 × 16 lanes (256 GB/s bidirectional)
NVLink-C2C: Grace CPU-GPU communication with memory coherency (900 GB/s)

Interconnect (GB/s)	Hopper GPU	Blackwell GPU	Blackwell Ultra GPU
NVLink (GPU-GPU)	900	1,800	1,800
NVLink-C2C (CPU-GPU)	900	900	900
PCIe Interface	128 (Gen 5)	256 (Gen 6)	256 (Gen 6)
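The NVLink figures above are consistent with one another, which a few lines of Python can confirm. This is only arithmetic on the quoted numbers; the constant names are illustrative.

```python
# NVLink 5 figures quoted in the article.
NVLINK5_LINKS = 18
GB_S_PER_LINK = 100
HOPPER_NVLINK_GB_S = 900  # NVLink 4 per-GPU bandwidth from the table
NVL72_GPUS = 72

# Per-GPU bidirectional bandwidth: links times per-link bandwidth.
per_gpu_gb_s = NVLINK5_LINKS * GB_S_PER_LINK

# Generational scaling over Hopper's NVLink 4.
scaling_vs_hopper = per_gpu_gb_s / HOPPER_NVLINK_GB_S

# Aggregate bandwidth across a 72-GPU rack, converted from GB/s to TB/s.
rack_tb_s = NVL72_GPUS * per_gpu_gb_s / 1000

print(per_gpu_gb_s, scaling_vs_hopper, rack_tb_s)
```

The rack-level result is 129.6 TB/s, which the article rounds to 130 TB/s.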

The result is that NVIDIA's Blackwell Ultra GB300 platform achieves a 50% increase in dense low-precision compute over Blackwell using the new NVFP4 format. The format delivers near-FP8 accuracy, with differences often below 1%. It also reduces the memory footprint by 1.8x versus FP8 and 3.5x versus FP16.
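The memory footprint reductions quoted above are smaller than the naive 2x and 4x one would expect from a 4-bit format, and scale-factor overhead is a plausible reason. The sketch below assumes a block-scaled layout in which every 16 four-bit values share one 8-bit scale factor; that layout is an assumption made here for illustration and is not stated in the article.

```python
VALUE_BITS = 4   # 4-bit element
BLOCK_SIZE = 16  # assumed: number of values sharing one scale factor
SCALE_BITS = 8   # assumed: one 8-bit scale factor per block


def effective_bits_per_value(value_bits=VALUE_BITS,
                             scale_bits=SCALE_BITS,
                             block_size=BLOCK_SIZE):
    """Average storage per value once the shared scale is amortized."""
    return value_bits + scale_bits / block_size


def footprint_reduction(baseline_bits):
    """How many times smaller the block-scaled format is than a baseline."""
    return baseline_bits / effective_bits_per_value()


print(effective_bits_per_value())          # 4.5
print(f"{footprint_reduction(8):.2f}x")    # 1.78x versus FP8
print(f"{footprint_reduction(16):.2f}x")   # 3.56x versus FP16
```

Under that assumption each value costs 4.5 bits on average, giving reductions of about 1.78x versus FP8 and 3.56x versus FP16, in line with the roughly 1.8x and 3.5x figures quoted.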

Blackwell Ultra also adds advanced scheduling and new enterprise-grade security features, such as:

Enhanced GigaThread Engine: Next-generation work scheduler providing improved context switching performance and optimized workload distribution across all 160 SMs.
Multi-Instance GPU (MIG): Blackwell Ultra GPUs can be partitioned into different-sized MIG instances. For example, an administrator can create two instances with 140 GB of memory each, four instances with 70 GB each, or seven instances with 34 GB each, enabling secure multi-tenancy with predictable performance isolation.
Confidential computing and secure AI: Secure and performant protection for sensitive AI models and data, extending hardware-based Trusted Execution Environment (TEE) to GPUs with industry-first TEE-I/O capabilities in the Blackwell architecture and inline NVLink protection for near-identical throughput when compared to unencrypted modes.
Advanced NVIDIA Reliability, Availability, and Serviceability (RAS) engine: AI-powered reliability system monitoring thousands of parameters to predict failures, optimize maintenance schedules, and maximize system uptime in large-scale deployments.
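For the MIG example in the list above, it can help to see how much of the 288 GB each quoted layout allocates. The sketch below is plain arithmetic on the article's numbers and does not use any MIG tooling or API; the names are illustrative.

```python
HBM_CAPACITY_GB = 288  # per-GPU capacity quoted in the article

# MIG layouts quoted in the article: instance count -> memory per instance (GB).
MIG_LAYOUTS = {2: 140, 4: 70, 7: 34}


def layout_summary(instances, gb_each, capacity_gb=HBM_CAPACITY_GB):
    """Return (allocated GB, GB left outside the instances) for one layout."""
    allocated = instances * gb_each
    return allocated, capacity_gb - allocated


for instances, gb_each in MIG_LAYOUTS.items():
    allocated, remaining = layout_summary(instances, gb_each)
    print(f"{instances} x {gb_each} GB: {allocated} GB allocated, {remaining} GB remaining")
```

The two- and four-instance layouts each allocate 280 GB, while the seven-instance layout allocates 238 GB.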

Performance efficiency is another area where Blackwell Ultra GB300 pulls ahead, delivering more tokens per second per megawatt (TPS/MW) than Blackwell GB200, as shown in the chart below:

Chart: Impact of architecture on inference performance and user experience along the Pareto frontier.

All this shows that NVIDIA sits at the top of the AI ladder with engineering marvels such as Blackwell and Blackwell Ultra. Its in-depth software support and optimizations are a large part of that lead, and the annual hardware cadence plus increased R&D should keep it there for several years.

About the author: A Software Engineer by training and a PC enthusiast by passion, Hassan Mujtaba serves as Wccftech's Senior Editor for the hardware section. With years of experience in the industry, he specializes in deep-dive technical analysis of next-generation CPU and GPU architectures, motherboards, and cooling solutions. His work involves not only breaking news on upcoming technologies but also extensive hands-on reviews and benchmarking.
