NVIDIA's Blackwell Ultra GB300 has posted record performance in AA-AgentPerf, a new benchmark that measures agentic AI workflows.
NVIDIA Blackwell Ultra GB300 is 20 Times Faster Than Hopper In Agentic AI, Records Highest Performance In Latest Benchmarks
Artificial Analysis has released a new benchmark, AA-AgentPerf, which measures how many active agents an inference deployment can support under realistic workloads. Its key characteristics include:
- Real agentic trajectories — multi-turn coding sessions with interleaved reasoning, tool calls, and variable context lengths (not synthetic uniform prompts).
- Sustained concurrent load — simulated agents maintain continuous in-flight requests, stressing KV cache reuse, speculative decoding, and scheduler behavior.
- Market-derived SLO tiers — performance thresholds based on Artificial Analysis serverless API benchmarking data, reflecting quality-of-service levels observed across providers.
- Continuously updated — results are updated on an ongoing basis as new hardware, software stacks, and model versions become available.
- Production-ready — models are tested with realistic optimizations enabled and production-scale deployment topologies.

The AA-AgentPerf benchmark measures three key metrics that underpin modern AI deployments:
- Time to First Token (TTFT): Per-request latency from sending the request to receiving the first output token.
- Output Speed: Per-request output tokens per second, measured after the first token is received.
- System Output Throughput: Aggregate output tokens per second across all concurrent agents.
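
To make the three definitions concrete, here is a minimal Python sketch that computes them from per-request timestamps. The `RequestTrace` record, its field names, and the sample numbers are hypothetical and only illustrate the definitions; this is not Artificial Analysis's harness. In particular, excluding the first token from the output-speed count and dividing total tokens by the wall-clock window for system throughput are assumptions about how the definitions could be applied.

```python
from dataclasses import dataclass


@dataclass
class RequestTrace:
    """Hypothetical per-request record; all times are in seconds."""
    sent_at: float         # when the request was sent
    first_token_at: float  # when the first output token arrived
    finished_at: float     # when the last output token arrived
    output_tokens: int     # total output tokens generated


def ttft(trace: RequestTrace) -> float:
    """Time to First Token: latency from send to first output token."""
    return trace.first_token_at - trace.sent_at


def output_speed(trace: RequestTrace) -> float:
    """Per-request output tokens per second after the first token.

    Assumption: the first token is excluded from the count because the
    clock starts when it arrives.
    """
    decode_time = trace.finished_at - trace.first_token_at
    if decode_time <= 0:
        return 0.0
    return (trace.output_tokens - 1) / decode_time


def system_output_throughput(traces: list[RequestTrace]) -> float:
    """Aggregate output tokens per second across all concurrent requests.

    Assumption: the window runs from the earliest send to the latest finish.
    """
    window = max(t.finished_at for t in traces) - min(t.sent_at for t in traces)
    return sum(t.output_tokens for t in traces) / window


# Two made-up concurrent requests, for illustration only.
a = RequestTrace(sent_at=0.0, first_token_at=0.5, finished_at=10.5, output_tokens=501)
b = RequestTrace(sent_at=0.0, first_token_at=1.0, finished_at=11.0, output_tokens=401)

print(ttft(a), output_speed(a))            # 0.5 50.0
print(ttft(b), output_speed(b))            # 1.0 40.0
print(system_output_throughput([a, b]))    # 82.0
```

Note that the two per-request speeds (50 and 40 tokens per second) do not simply add up to the system figure, since system throughput also absorbs the time each request spends waiting for its first token.
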
NVIDIA has now published its first AA-AgentPerf results, using DeepSeek V4 Pro on its GB300 NVL72 platform. The model is representative of the frontier models that power today's AI agents.

In the first round of benchmarks, NVIDIA's GB300 hardware recorded the fastest performance, posting a lead of more than 20x per megawatt (MW) over the older HGX H200 platform. GB300 can sustain roughly 61,400 concurrent agents per MW, compared with about 2,600 for H200, a massive leap over Hopper.

| Metric | What it measures | NVIDIA GB300 NVL72 | NVIDIA H200 |
| --- | --- | --- | --- |
| Concurrent agents per MW | Energy efficiency: how many active agents a system can support for a given power budget | 61.4K | 2.6K |
| Concurrent agents per GPU | Hardware efficiency: how much serving capacity is achieved per GPU | 57.5 | 1.4 |

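
The generational ratios implied by the table can be checked with a few lines of Python. The dictionary keys below are made-up labels; the numbers are the table values, with 61.4K and 2.6K written out in full.

```python
# Table values: concurrent agents per MW and per GPU.
gb300 = {"agents_per_mw": 61_400, "agents_per_gpu": 57.5}
h200 = {"agents_per_mw": 2_600, "agents_per_gpu": 1.4}

# Ratio of GB300 NVL72 to H200 for each metric.
ratios = {metric: gb300[metric] / h200[metric] for metric in gb300}

for metric, ratio in ratios.items():
    print(f"{metric}: {ratio:.1f}x")
# agents_per_mw: 23.6x
# agents_per_gpu: 41.1x
```

On these figures the per-MW advantage works out to about 23.6x, consistent with the "20x" headline claim, while the per-GPU advantage is larger at about 41x.
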
NVIDIA states that the results highlight the ability of the GB300 NVL72 and the Blackwell architecture to run large-scale agentic coding workloads while keeping the GPUs fully utilized across several concurrent agent sessions.
Looking ahead, NVIDIA's Rubin is on the horizon and is expected to extend these leads with an AI architecture that offers 50 PFLOPs of NVFP4 compute. Paired with the Vera CPU, it is expected to deliver major performance and efficiency gains in LLM tool calls and end-to-end workloads.





