NVIDIA GB300 Dominates Agentic AI Workloads With 20x Performance Leap Over Hopper As Rubin Nears Launch


Jun 14, 2026 at 08:25am EDT


NVIDIA's Blackwell GB300 has posted record performance in AA-AgentPerf, a new benchmark that measures Agentic AI workflows.

NVIDIA Blackwell Ultra GB300 is 20 Times Faster Than Hopper In Agentic AI, Records Highest Performance In Latest Benchmarks

Artificial Analysis has released a new benchmark, AA-AgentPerf, that measures how many active agents an inference deployment can support under realistic workloads. Its key characteristics include:

Real agentic trajectories — multi-turn coding sessions with interleaved reasoning, tool calls, and variable context lengths (not synthetic uniform prompts).
Sustained concurrent load — simulated agents maintain continuous in-flight requests, stressing KV cache reuse, speculative decoding, and scheduler behavior.
Market-derived SLO tiers — performance thresholds based on Artificial Analysis serverless API benchmarking data, reflecting quality-of-service levels observed across providers.
Continuously updated — results are updated on an ongoing basis as new hardware, software stacks, and model versions become available.
Production-ready — models are tested with realistic optimizations enabled and production-scale deployment topologies.
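
To make the idea of "how many agents a deployment can support" under an SLO tier concrete, here is a minimal Python sketch. It is not the AA-AgentPerf methodology: the `Measurement` and `SloTier` types, the threshold values, and the sweep numbers are all hypothetical and chosen only for illustration.

```python
from typing import Dict, NamedTuple


class Measurement(NamedTuple):
    """Hypothetical summary of one run at a fixed number of concurrent agents."""
    ttft_s: float        # observed time to first token, in seconds
    output_speed: float  # observed per-request output tokens per second


class SloTier(NamedTuple):
    """Hypothetical SLO tier: a latency ceiling and a speed floor."""
    max_ttft_s: float
    min_output_speed: float


def max_agents_within_slo(sweep: Dict[int, Measurement], tier: SloTier) -> int:
    """Return the largest tested concurrency that meets the tier, or 0 if none do."""
    passing = [
        agents
        for agents, m in sweep.items()
        if m.ttft_s <= tier.max_ttft_s and m.output_speed >= tier.min_output_speed
    ]
    return max(passing, default=0)


# Made-up sweep: latency rises and per-request speed falls as load increases.
toy_sweep = {
    8: Measurement(0.4, 90.0),
    16: Measurement(0.7, 70.0),
    32: Measurement(1.5, 45.0),
    64: Measurement(4.0, 20.0),
}

relaxed = SloTier(max_ttft_s=2.0, min_output_speed=40.0)
strict = SloTier(max_ttft_s=0.5, min_output_speed=80.0)

print(max_agents_within_slo(toy_sweep, relaxed))  # 32
print(max_agents_within_slo(toy_sweep, strict))   # 8
```

The point of the sketch is that a tighter tier lowers the number of agents the same hardware can be credited with, which is why results are reported per tier rather than as a single figure.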

AA-AgentPerf reports three key metrics that underpin modern AI deployments:

Time to First Token (TTFT): Per-request latency from sending the request to receiving the first output token.
Output Speed: Per-request output tokens per second, measured after the first token is received.
System Output Throughput: Aggregate output tokens per second across all concurrent agents.
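
The three definitions above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Artificial Analysis's code: the `RequestTrace` record is hypothetical, and measuring system throughput over the window from the first request sent to the last token received is one reasonable reading of "aggregate".

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RequestTrace:
    """Hypothetical per-request record; all times are seconds on one shared clock."""
    sent_at: float
    token_times: List[float]  # arrival time of each output token, in order


def ttft(trace: RequestTrace) -> float:
    """Time to First Token: request sent until the first output token arrives."""
    return trace.token_times[0] - trace.sent_at


def output_speed(trace: RequestTrace) -> float:
    """Per-request tokens per second, counted after the first token is received."""
    tokens_after_first = len(trace.token_times) - 1
    elapsed = trace.token_times[-1] - trace.token_times[0]
    return tokens_after_first / elapsed if elapsed > 0 else 0.0


def system_output_throughput(traces: List[RequestTrace]) -> float:
    """Aggregate output tokens per second across all concurrent requests."""
    total_tokens = sum(len(t.token_times) for t in traces)
    start = min(t.sent_at for t in traces)
    end = max(t.token_times[-1] for t in traces)
    return total_tokens / (end - start)


# Two made-up concurrent requests, both sent at t = 0.
traces = [
    RequestTrace(sent_at=0.0, token_times=[0.5, 0.75, 1.0, 1.25, 1.5]),
    RequestTrace(sent_at=0.0, token_times=[1.0, 1.5, 2.0]),
]

print(ttft(traces[0]))                   # 0.5
print(output_speed(traces[0]))           # 4.0
print(system_output_throughput(traces))  # 4.0
```

Note that the first two metrics describe what a single agent experiences, while the third describes what the whole deployment delivers; a system can score well on one and poorly on the other.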

NVIDIA is now publishing its first AA-AgentPerf results, measured using DeepSeek V4 Pro on its GB300 NVL72 platform. This model is representative of the frontier models that power today's agents and are widely used for AI.

In the first round of benchmarks, NVIDIA recorded the fastest performance with its GB300 hardware, posting a lead of more than 20x per megawatt (MW) over its older HGX H200 platform. GB300 can sustain more than 60,000 concurrent agents per MW, a massive leap over Hopper.

Metric	What it measures	NVIDIA GB300 NVL72	NVIDIA H200
Concurrent agents per MW	Energy efficiency: How many active agents a system can support for a given power budget	61.4K	2.6K
Concurrent agents per GPU	Hardware efficiency: How much serving capacity is achieved per GPU	57.5	1.4
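
The ratios behind the headline figure can be checked directly from the table. The short Python sketch below uses only the four published numbers; the "implied watts per GPU" figure is a derived estimate that assumes both rows describe the same deployment, and it is not a number NVIDIA or Artificial Analysis has stated.

```python
# Figures copied from the table above.
AGENTS_PER_MW = {"GB300 NVL72": 61_400, "H200": 2_600}
AGENTS_PER_GPU = {"GB300 NVL72": 57.5, "H200": 1.4}


def advantage(metric: dict) -> float:
    """How many times higher the GB300 NVL72 figure is than the H200 figure."""
    return metric["GB300 NVL72"] / metric["H200"]


def implied_watts_per_gpu(system: str) -> float:
    """Assumption: agents/MW divided by agents/GPU gives GPUs per MW."""
    gpus_per_mw = AGENTS_PER_MW[system] / AGENTS_PER_GPU[system]
    return 1_000_000 / gpus_per_mw


print(f"Per-MW advantage:  {advantage(AGENTS_PER_MW):.1f}x")   # 23.6x
print(f"Per-GPU advantage: {advantage(AGENTS_PER_GPU):.1f}x")  # 41.1x
for name in AGENTS_PER_MW:
    print(f"{name}: ~{implied_watts_per_gpu(name):.0f} W per GPU (implied)")
```

On these numbers the per-MW lead works out to roughly 23.6x, in line with the "20x" headline, while the per-GPU lead is larger at about 41x. The gap between the two suggests that each GB300 GPU accounts for more power than each H200 GPU, so part of the per-GPU gain is spent on a higher power budget.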

NVIDIA states that the results highlight the ability of its Blackwell-based GB300 NVL72 to run large-scale agentic coding workloads while keeping the GPUs fully utilized across multiple concurrent agent sessions.

Looking ahead, NVIDIA's Rubin is on the horizon and is expected to extend these leads with a new AI architecture that will offer 50 PFLOPs of NVFP4 compute. With the Vera CPU alongside it, LLM tool calls and end-to-end workloads are expected to see major performance and efficiency gains.

About the author: A Software Engineer by training and a PC enthusiast by passion, Hassan Mujtaba serves as Wccftech's Senior Editor for the hardware section. With years of experience in the industry, he specializes in deep-dive technical analysis of next-generation CPU and GPU architectures, motherboards, and cooling solutions. His work involves not only breaking news on upcoming technologies but also extensive hands-on reviews and benchmarking.
