NVIDIA’s 96GB RTX PRO 6000 Matches Four RTX 5090s on a 230B AI Model While Drawing a Quarter the Power

Apr 19, 2026 at 03:40am EDT

NVIDIA's RTX PRO 6000 Blackwell shows why a single large-memory GPU can be a better choice than a mainstream multi-GPU setup for running large AI models, keeping pace with four RTX 5090s.

A Single RTX PRO 6000 Blackwell GPU Runs 230B AI Model At Quarter The Power of Four RTX 5090s

Steveibe on X has shared benchmarks from his test suite, exploring whether it is possible to run large AI models at home. For the demonstration, he used MiniMax M2.7, a 230B model, and ran inference on four different test setups, all powered by NVIDIA hardware. A context size of 32k and a max token length of 4096 were used for evaluation.


The user states that he went with IQ3_XXS, a GGUF quantization format aimed at hardware with limited VRAM. It is also the largest quant that fits in the 96 GB of VRAM on the RTX PRO 6000. The same quant was used across all four test setups, and the results are below:

In terms of token generation speed, a single NVIDIA RTX PRO 6000 Blackwell GPU produced 118.74 tokens/s. For comparison, four RTX 5090 GPUs with a total VRAM capacity of 128 GB produced 120.54 tokens/s, and four of the older-generation RTX 4090s (4 x 24 GB) produced 71.52 tokens/s. The DGX Spark mini AI PC, which features 128 GB of memory, produced 24.41 tokens/s.
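As a rough sanity check on the memory side, the weight footprint of a quantized model can be estimated from its parameter count and bits per weight. The sketch below assumes about 3.06 bits per weight for IQ3_XXS (a nominal figure; real GGUF files mix tensor types, so actual sizes differ) and ignores KV cache and runtime overhead. Neither assumption comes from the original benchmark.

```python
# Rough weight-footprint estimate for a quantized model.
# ASSUMPTION: ~3.06 bits per weight for IQ3_XXS (nominal; real files vary).
PARAMS = 230e9
BITS_PER_WEIGHT = 3.06

# Memory capacities in GB, as listed in the article.
capacity_gb = {
    "1x RTX PRO 6000": 96,
    "4x RTX 5090": 128,
    "4x RTX 4090": 96,
    "DGX Spark": 128,
}

# bits -> bytes -> gigabytes
weights_gb = PARAMS * BITS_PER_WEIGHT / 8 / 1e9

# Whatever is left must hold the KV cache and runtime buffers.
headroom_gb = {name: cap - weights_gb for name, cap in capacity_gb.items()}

print(f"Estimated weights: {weights_gb:.1f} GB")
for name, spare in headroom_gb.items():
    print(f"{name:16s} headroom: {spare:.1f} GB")
```

Under these assumptions the weights take roughly 88 GB, which leaves only single-digit gigabytes on the 96 GB configurations for the 32k context.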

While the four RTX 5090s are on par with a single RTX PRO 6000 Blackwell, that is only half the story, as token generation speed should not be the only metric to go by. We also have to factor in power and price.

When comparing power consumption, we see a bigger difference. The quad-GPU setups consume 1800W (four RTX 4090s) and 2300W (four RTX 5090s). A single RTX PRO 6000 Blackwell GPU sips just 600W.

That's roughly a quarter of the power of the four RTX 5090s and a third of the power of the four RTX 4090s. The DGX Spark draws 240W of total system power, so it is a decent machine given its much lower consumption and full-system package, which is described as "prefill-friendly" and runs from a standard wall socket.
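To put speed and power on a single scale, the reported figures can be turned into tokens per second per watt. This is a minimal sketch using only the numbers quoted above. Note that the DGX Spark figure is total system power, while the other figures appear to cover the GPUs alone, so the comparison is approximate.

```python
# (tokens/s, watts) as reported in the article; not measured here.
setups = {
    "1x RTX PRO 6000": (118.74, 600),
    "4x RTX 5090": (120.54, 2300),
    "4x RTX 4090": (71.52, 1800),
    "DGX Spark": (24.41, 240),  # total system power, not GPU-only
}

def tokens_per_watt(tokens_per_s, watts):
    """Throughput delivered per watt of power drawn."""
    return tokens_per_s / watts

efficiency = {name: tokens_per_watt(t, w) for name, (t, w) in setups.items()}

# Print the setups from most to least efficient.
for name, eff in sorted(efficiency.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:16s} {eff:.3f} tokens/s per watt")
```

On these figures the single RTX PRO 6000 delivers close to four times the throughput per watt of the four RTX 5090s.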

Now we have to talk about pricing. A single RTX PRO 6000 Blackwell retails for around $9,500 US, while a single RTX 5090 retails for around $3,500, so four RTX 5090s end up at $14,000 US. Meanwhile, the DGX Spark retails for $4,699 US following a price hike.
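The same reported numbers give a rough cost per unit of throughput. The sketch below divides each quoted retail price by the measured generation speed. It covers hardware price only (no host system, power supply, or electricity costs), and the four RTX 4090s are left out because the article quotes no price for them.

```python
# Retail prices in USD as quoted in the article.
prices_usd = {
    "1x RTX PRO 6000": 9500,
    "4x RTX 5090": 4 * 3500,
    "DGX Spark": 4699,
}

# Token generation speed in tokens/s as reported in the article.
speeds = {
    "1x RTX PRO 6000": 118.74,
    "4x RTX 5090": 120.54,
    "DGX Spark": 24.41,
}

# Dollars paid per token/s of generation throughput (lower is better).
cost_per_tps = {name: prices_usd[name] / speeds[name] for name in prices_usd}

for name, cost in sorted(cost_per_tps.items(), key=lambda kv: kv[1]):
    print(f"{name:16s} ${cost:,.2f} per token/s")
```

By this measure the RTX PRO 6000 comes out cheapest per token/s, with the DGX Spark the most expensive despite its lower sticker price.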

While AI models can be split across multiple GPUs to pool their memory capacity, such configurations carry overheads, and those can be seen here. A single RTX PRO 6000 Blackwell 96 GB avoids them and delivers comparable performance at a lower price and with far higher efficiency.

About the author: A Software Engineer by training and a PC enthusiast by passion, Hassan Mujtaba serves as Wccftech's Senior Editor for the hardware section. With years of experience in the industry, he specializes in deep-dive technical analysis of next-generation CPU and GPU architectures, motherboards, and cooling solutions. His work involves not only breaking news on upcoming technologies but also extensive hands-on reviews and benchmarking.
