NVIDIA GB300 Dominates Agentic AI Workloads With 20x Performance Leap Over Hopper As Rubin Nears Launch

Hassan Mujtaba

NVIDIA's Blackwell GB300 has posted record performance in AA-AgentPerf, a new benchmark that measures Agentic AI workflows.

NVIDIA Blackwell Ultra GB300 is 20 Times Faster Than Hopper In Agentic AI, Records Highest Performance In Latest Benchmarks

Artificial Analysis has released a new benchmark called AA-AgentPerf, which measures how many active agents an inference deployment can support under realistic workloads. Its key characteristics include:

  • Real agentic trajectories — multi-turn coding sessions with interleaved reasoning, tool calls, and variable context lengths (not synthetic uniform prompts).
  • Sustained concurrent load — simulated agents maintain continuous in-flight requests, stressing KV cache reuse, speculative decoding, and scheduler behavior.
  • Market-derived SLO tiers — performance thresholds based on Artificial Analysis serverless API benchmarking data, reflecting quality-of-service levels observed across providers.
  • Continuously updated — results are updated on an ongoing basis as new hardware, software stacks, and model versions become available.
  • Production-ready — models are tested with realistic optimizations enabled and production-scale deployment topologies.

The AA-AgentPerf benchmark reports three key metrics that underpin modern AI deployments:

  • Time to First Token (TTFT): Per-request latency from sending the request to receiving the first output token.
  • Output Speed: Per-request output tokens per second, measured after the first token is received.
  • System Output Throughput: Aggregate output tokens per second across all concurrent agents.
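To make the three definitions concrete, here is a minimal sketch of how they could be computed from per-request timestamps. The `RequestTrace` record, its field names, and the sample numbers are hypothetical and are not taken from Artificial Analysis; in particular, excluding the first token from the output-speed count is an assumption about what "measured after the first token is received" means.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RequestTrace:
    """Hypothetical per-request timing record (all times in seconds)."""
    sent_at: float         # when the request was sent
    first_token_at: float  # when the first output token arrived
    finished_at: float     # when the last output token arrived
    output_tokens: int     # total output tokens produced


def ttft(trace: RequestTrace) -> float:
    """Time to First Token: latency from send to first output token."""
    return trace.first_token_at - trace.sent_at


def output_speed(trace: RequestTrace) -> float:
    """Per-request tokens per second, counted after the first token.

    Assumption: the first token itself is excluded from the count.
    """
    decode_time = trace.finished_at - trace.first_token_at
    if decode_time <= 0:
        return 0.0
    return (trace.output_tokens - 1) / decode_time


def system_output_throughput(traces: List[RequestTrace]) -> float:
    """Aggregate output tokens per second across all concurrent requests."""
    window = max(t.finished_at for t in traces) - min(t.sent_at for t in traces)
    return sum(t.output_tokens for t in traces) / window


# Two made-up concurrent requests, for illustration only.
traces = [
    RequestTrace(sent_at=0.0, first_token_at=0.5, finished_at=2.5, output_tokens=101),
    RequestTrace(sent_at=0.0, first_token_at=1.0, finished_at=5.0, output_tokens=201),
]

for t in traces:
    print(f"TTFT={ttft(t):.2f}s  output speed={output_speed(t):.1f} tok/s")
print(f"System output throughput={system_output_throughput(traces):.1f} tok/s")
```

The first two metrics describe what a single agent experiences, while the third describes the deployment as a whole, which is why a system can post high aggregate throughput while individual requests remain slow.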

NVIDIA is now publishing its first AA-AgentPerf results, measured using DeepSeek V4 Pro on its GB300 NVL72 platform. This model represents the type of frontier model that powers agents today and is widely used for AI.

In the first round of benchmarks, NVIDIA has recorded the fastest performance with its GB300 hardware, posting a 20x lead per megawatt over its older HGX H200 platform. GB300 can sustain more than 60,000 concurrent agents per MW, a massive leap over Hopper.

Benchmark | Value of metric | NVIDIA GB300 NVL72 | NVIDIA H200
Concurrent agents per MW | Energy efficiency: How many active agents a system can support for a given power budget | 61.4K | 2.6K
Concurrent agents per GPU | Hardware efficiency: How much serving capacity is achieved per GPU | 57.5 | 1.4
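As a quick sanity check on the headline figure, the ratios implied by the table can be worked out directly. The sketch below uses only the rounded values published in the table, so the results are approximate; the variable names and the derived watts-per-agent figure are illustrative and are not numbers reported by NVIDIA.

```python
# Rounded values as published in the table above.
gb300 = {"agents_per_mw": 61_400, "agents_per_gpu": 57.5}
h200 = {"agents_per_mw": 2_600, "agents_per_gpu": 1.4}

# Generation-over-generation ratios implied by the table.
per_mw_ratio = gb300["agents_per_mw"] / h200["agents_per_mw"]
per_gpu_ratio = gb300["agents_per_gpu"] / h200["agents_per_gpu"]

# Derived figure: power budget per concurrent agent, in watts (1 MW = 1,000,000 W).
gb300_watts_per_agent = 1_000_000 / gb300["agents_per_mw"]
h200_watts_per_agent = 1_000_000 / h200["agents_per_mw"]

print(f"Agents per MW:  {per_mw_ratio:.1f}x")   # 23.6x
print(f"Agents per GPU: {per_gpu_ratio:.1f}x")  # 41.1x
print(f"Watts per agent: GB300 {gb300_watts_per_agent:.1f} W, "
      f"H200 {h200_watts_per_agent:.1f} W")     # 16.3 W vs 384.6 W
```

On these rounded figures, the per-megawatt advantage works out to a little over 20x, while the per-GPU advantage is roughly twice that.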

NVIDIA states that the results highlight the ability of its Blackwell-based GB300 NVL72 to run large-scale agentic coding workloads while keeping the GPUs fully utilized across many concurrent agent sessions.

Looking forward, NVIDIA's Rubin is on the horizon and is expected to extend these leads with a new AI architecture that offers 50 PFLOPs of NVFP4 compute. Paired with the Vera CPU, it is expected to bring major performance and efficiency gains to LLM tool calls and end-to-end workloads.


About the author: A software engineer by training and a PC enthusiast by passion, Hassan Mujtaba serves as Wccftech's Senior Editor for the hardware section. With years of experience in the industry, he specializes in deep-dive technical analysis of next-generation CPU and GPU architectures, motherboards, and cooling solutions. His work involves not only breaking news on upcoming technologies but also extensive hands-on reviews and benchmarking.
