
Here’s How NVIDIA’s Blackwell Ultra GB300 AI Racks Are Dominating Long-Context DeepSeek Workloads, Delivering Impressive Gains Versus GB200

Muhammad Zuhair

• Feb 21, 2026 at 01:51pm EST

A person stands next to a large NVIDIA data center server rack with multiple GPUs and visible branding. — Image Credits: NVIDIA

NVIDIA's GB300 NVL72 AI racks have been tested across DeepSeek's latest open-source models, and with fine-tuning and optimized inference, the results are promising.

NVIDIA's Blackwell Ultra Scores Up to a 1.5x Lead Over GB200 NVL72 In Latency-Sensitive Workloads

With GB300, NVIDIA's primary focus has been long-context performance, aimed at capitalizing on the agentic AI wave. In a recent post, we discussed how Blackwell Ultra delivers a 50x increase in throughput per megawatt compared to Hopper GPUs through its extreme co-design approach. Now, the Large Model Systems Organization (LMSYS) has tested GB300 NVL72 for long-context inference, and the results look promising. The testing includes infrastructure-level software routing, which we'll discuss next.

Because long-context workloads shift pressure toward GPU VRAM, the LMSYS team integrated PD (Prefill-Decode) Disaggregation, a widely used technique for serving large token contexts at scale. In simple terms, PD Disaggregation splits work across different hardware "nodes" to avoid bottlenecks. The prefill phase (prompt processing) and the decode phase (token generation) run on separate nodes, so each can be optimized on its own, which improves throughput at scale.
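To make the prefill/decode split concrete, here is a minimal Python sketch. It is a toy model under stated assumptions, not the LMSYS setup: the `Request`, `PrefillWorker`, `DecodeWorker`, and `serve` names are hypothetical, the "KV cache" is a plain list standing in for real tensors, and the round-robin routing is only one possible policy.

```python
from dataclasses import dataclass, field


@dataclass
class Request:
    request_id: int
    prompt_tokens: list[int]
    max_new_tokens: int
    # Stand-in for real KV tensors: one entry per cached token.
    kv_cache: list[int] = field(default_factory=list)
    output_tokens: list[int] = field(default_factory=list)


class PrefillWorker:
    """Hypothetical worker that only processes prompts and builds the KV cache."""

    def run(self, req: Request) -> Request:
        req.kv_cache = list(req.prompt_tokens)
        return req


class DecodeWorker:
    """Hypothetical worker that only generates tokens from a handed-off KV cache."""

    def run(self, req: Request) -> Request:
        for _ in range(req.max_new_tokens):
            # Toy next-token rule (not a real model): depends on cache length only.
            next_token = len(req.kv_cache) % 100
            req.output_tokens.append(next_token)
            req.kv_cache.append(next_token)
        return req


def serve(requests, prefill_pool, decode_pool):
    """Send each request to a prefill worker, then hand it off to a decode worker."""
    finished = []
    for i, req in enumerate(requests):
        prefill = prefill_pool[i % len(prefill_pool)]  # assumed round-robin routing
        decode = decode_pool[i % len(decode_pool)]
        finished.append(decode.run(prefill.run(req)))
    return finished


if __name__ == "__main__":
    reqs = [Request(0, [1, 2, 3, 4, 5], max_new_tokens=3)]
    done = serve(reqs, [PrefillWorker()], [DecodeWorker(), DecodeWorker()])
    print(done[0].output_tokens)
```

The point of the separation is that the two pools can be sized and tuned independently. In a real deployment the hand-off step would involve moving the KV cache between nodes, which this sketch does not model.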

The LMSYS team also employed several other optimization techniques, including dynamic chunking to keep prompt processing responsive under long-context windows, and effective KV capacity translation. For generational improvements, the team focused on three primary areas: throughput, capacity, and latency ratio.
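The article does not describe how LMSYS sizes its chunks, so the following Python sketch only illustrates the general idea of chunked prefill with a load-dependent chunk size. The `plan_chunks` function, its parameters, and the sizing rule (per-step token budget minus current decode load, with a floor) are all assumptions made for illustration.

```python
def plan_chunks(prompt_len, step_budget, decode_load_per_step, min_chunk=256):
    """Split a long prompt into (start, end) prefill chunks.

    Assumed sizing rule: each step has a fixed token budget, and whatever is
    not consumed by in-flight decode tokens goes to the next prefill chunk,
    never dropping below min_chunk. The last load value is reused once the
    list of per-step loads runs out.
    """
    if not decode_load_per_step:
        raise ValueError("decode_load_per_step must not be empty")

    chunks = []
    start = 0
    step = 0
    while start < prompt_len:
        load = decode_load_per_step[min(step, len(decode_load_per_step) - 1)]
        size = max(min_chunk, step_budget - load)
        end = min(prompt_len, start + size)
        chunks.append((start, end))
        start = end
        step += 1
    return chunks


if __name__ == "__main__":
    # A 10,000-token prompt, a 4,096-token step budget, and rising decode load.
    for chunk in plan_chunks(10_000, 4_096, [0, 2_048, 4_000]):
        print(chunk)
```

Chunks shrink as decode traffic grows, so a long prompt does not stall token generation for requests that are already running.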

NVIDIA's GB300 NVL72 vs GB200 NVL72:

1.53x Peak Throughput: 226.2 TPS/GPU (Tokens Per Second)
1.87x User Speed: Massive jump in TPS/User via MTP (Multi-Token Prediction).
1.58x Latency Win
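
As a quick sanity check on the first figure, the quoted ratio can be inverted to get the GB200 baseline it implies. This is a derived number, not one reported in the article, and it assumes that 226.2 TPS/GPU is the GB300 result and that the 1.53x ratio refers to the same metric.

```python
# Figures quoted in the list above.
gb300_peak_tps_per_gpu = 226.2  # assumed to be the GB300 NVL72 result
peak_throughput_ratio = 1.53    # GB300 NVL72 vs GB200 NVL72

# Implied GB200 baseline: derived here, not a figure reported in the article.
implied_gb200_tps_per_gpu = gb300_peak_tps_per_gpu / peak_throughput_ratio

print(f"Implied GB200 peak: {implied_gb200_tps_per_gpu:.1f} TPS/GPU")
```

Under those assumptions, the GB200 NVL72 peak works out to roughly 148 TPS/GPU.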

According to the LMSYS team, GB300 secures an average lead of 1.4x to 1.5x over GB200, especially in latency-sensitive scenarios, which leaves Blackwell Ultra well positioned to capitalize on agentic workloads. While Blackwell Ultra looks dominant in latency and throughput, we haven't yet seen total cost of ownership (TCO) figures discussed in the industry, which matters because deployment costs have risen in parallel with GB300.

NVIDIA's approach with each generation appears to focus not just on architectural advancements but also on addressing industry-specific constraints, and in Blackwell Ultra's case, latency figures have seen significant improvements. This is one of the reasons why, in agentic environments, GB300 is emerging as a leading choice for hyperscalers and neoclouds.

About the author: Muhammad Zuhair is a hardware and technology reporter for Wccftech, specializing in the semiconductor industry and the complex interplay between technology, manufacturing, and geopolitics. His coverage focuses on the corporate strategies and technological roadmaps of industry giants like TSMC, NVIDIA, Samsung, and Intel. Zuhair's expertise lies in deconstructing complex topics such as fabrication nodes (e.g., 2nm process), the economic impact of policies like the CHIPS Act, and the strategic development of AI infrastructure from NVIDIA, AMD and Intel.
