NVIDIA's DGX Spark was a monumental release for the firm, but AMD has meanwhile been shaping its APU lineup into a platform for on-device AI, and the Strix Halo APU is now claimed to perform better than NVIDIA's GB10 chip in various AI workloads.
NVIDIA's DGX Spark Is a Solid Option For Throughput, But Strix Halo Is the Go-To Platform For the Best Price-to-Perf Figures
Well, the DGX Spark system from NVIDIA is a 'one-of-a-kind' offering, considering that it is the company's first venture into compact devices for AI workloads, and it also marks the arrival of NVIDIA's custom chip, the GB10. Despite its impressive performance, consumers have pushed back on the $4,000 price tag, saying it makes the DGX Spark less attractive. However, GMKtec, one of the more reputable makers of AMD-based mini-PCs, offers an alternative to the DGX Spark for almost half the price.
In an official blog post, GMKtec has challenged NVIDIA's DGX Spark mini-supercomputer, claiming that the company's EVO-X2 mini-PC, which features AMD's Strix Halo APU, offers better token generation speeds and response times. The manufacturer tested the EVO-X2 against the DGX Spark across multiple workloads, deploying open-source models such as Llama 3.3 70B, Qwen3 Coder, GPT-OSS 20B, and Qwen3 0.6B, and here are the results:
| Test Model | Metric | EVO-X2 | NVIDIA GB10 | Winner |
|---|---|---|---|---|
| Llama 3.3 70B | Generation Speed (tok/sec) | 4.9 | 4.67 | AMD |
| Llama 3.3 70B | First Token Response Time (s) | 0.86 | 0.53 | NVIDIA |
| Qwen3 Coder | Generation Speed (tok/sec) | 35.13 | 38.03 | NVIDIA |
| Qwen3 Coder | First Token Response Time (s) | 0.13 | 0.42 | AMD |
| GPT-OSS 20B | Generation Speed (tok/sec) | 64.69 | 60.33 | AMD |
| GPT-OSS 20B | First Token Response Time (s) | 0.19 | 0.44 | AMD |
| Qwen3 0.6B | Generation Speed (tok/sec) | 163.78 | 174.29 | NVIDIA |
| Qwen3 0.6B | First Token Response Time (s) | 0.02 | 0.03 | AMD |
Based on the company's internal testing, the Strix Halo processor, specifically the Ryzen AI Max+ 395, edges out the GB10 chip in generation speed on Llama 3.3 70B and GPT-OSS 20B, while the GB10 is ahead on Qwen3 Coder and on the small Qwen3 0.6B model. The findings indicate that when it comes to first token response time, AMD wins in three of the four tests, mainly because the CPU+GPU+NPU layout brings lower latency for starting output, and the XDNA 2 engine delivers impressive AI performance.
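To put the size of those gaps in perspective, here is a minimal Python sketch that recomputes the winner and the relative difference for each row. The figures are copied from GMKtec's table; the names `RESULTS`, `winner`, `relative_gap`, and `summarize` are made up for this illustration, and the only assumption is the usual one that higher is better for generation speed and lower is better for first token response time.

```python
# Figures copied from GMKtec's table, as (EVO-X2, DGX Spark GB10) pairs.
RESULTS = {
    "Llama 3.3 70B": {"gen_tok_s": (4.9, 4.67), "ttft_s": (0.86, 0.53)},
    "Qwen3 Coder": {"gen_tok_s": (35.13, 38.03), "ttft_s": (0.13, 0.42)},
    "GPT-OSS 20B": {"gen_tok_s": (64.69, 60.33), "ttft_s": (0.19, 0.44)},
    "Qwen3 0.6B": {"gen_tok_s": (163.78, 174.29), "ttft_s": (0.02, 0.03)},
}


def winner(metric, amd, nvidia):
    """Higher wins for generation speed; lower wins for first-token time."""
    if metric == "gen_tok_s":
        return "AMD" if amd > nvidia else "NVIDIA"
    return "AMD" if amd < nvidia else "NVIDIA"


def relative_gap(amd, nvidia):
    """Signed difference of the EVO-X2 figure relative to the GB10, in percent."""
    return (amd - nvidia) / nvidia * 100.0


def summarize(results):
    """Return one (model, metric, winner, gap in percent) tuple per table row."""
    rows = []
    for model, metrics in results.items():
        for metric, (amd, nvidia) in metrics.items():
            gap = round(relative_gap(amd, nvidia), 1)
            rows.append((model, metric, winner(metric, amd, nvidia), gap))
    return rows


if __name__ == "__main__":
    for row in summarize(RESULTS):
        print(row)
```

Run this way, the generation-speed gaps all land within single-digit percentages in either direction, while the first-token gaps are proportionally much larger, so the headline "wins" are not all of the same weight.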
In the cases where NVIDIA took the lead, it comes down to the tested model favoring throughput over memory latency. The following are NVIDIA's own numbers for the DGX Spark:
Inference (ISL|OSL = 2048|128, BS = 1)

| Model | Precision | Backend | Prompt processing throughput (tokens/sec) | Token generation throughput (tokens/sec) |
|---|---|---|---|---|
| Qwen3 14B | NVFP4 | TRT-LLM | 5928.95 | 22.71 |
| GPT-OSS-20B | MXFP4 | llama.cpp | 3670.42 | 82.74 |
| GPT-OSS-120B | MXFP4 | llama.cpp | 1725.47 | 55.37 |
| Llama 3.1 8B | NVFP4 | TRT-LLM | 10256.9 | 38.65 |
| Qwen2.5-VL-7B-Instruct | NVFP4 | TRT-LLM | 65831.77 | 41.71 |
| Qwen3 235B (on dual DGX Spark) | NVFP4 | TRT-LLM | 23477.03 | 11.73 |
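For a rough sense of what these throughput figures mean for a single request, the sketch below converts them into an estimated time for NVIDIA's stated test shape of 2,048 input tokens and 128 output tokens. This is a simplified back-of-the-envelope model and not NVIDIA's methodology: it assumes the prompt is processed at the quoted prompt-processing rate, the output is generated at the quoted generation rate, and everything else (model load, scheduling, network) is ignored. The names `NVIDIA_NUMBERS` and `estimate_seconds` are hypothetical.

```python
# (prompt-processing tok/s, generation tok/s) copied from NVIDIA's table.
NVIDIA_NUMBERS = {
    "Qwen3 14B": (5928.95, 22.71),
    "GPT-OSS-20B": (3670.42, 82.74),
    "GPT-OSS-120B": (1725.47, 55.37),
    "Llama 3.1 8B": (10256.9, 38.65),
    "Qwen2.5-VL-7B-Instruct": (65831.77, 41.71),
    "Qwen3 235B (dual DGX Spark)": (23477.03, 11.73),
}

# Input and output sequence lengths stated in the table caption.
ISL, OSL = 2048, 128


def estimate_seconds(prefill_tok_s, gen_tok_s, isl=ISL, osl=OSL):
    """Return (prefill, decode, total) seconds under the simplified model."""
    prefill = isl / prefill_tok_s  # time to read the whole prompt
    decode = osl / gen_tok_s       # time to emit all output tokens
    return prefill, decode, prefill + decode


if __name__ == "__main__":
    for model, (prefill_rate, gen_rate) in NVIDIA_NUMBERS.items():
        prefill, decode, total = estimate_seconds(prefill_rate, gen_rate)
        print(f"{model}: prefill {prefill:.2f}s, decode {decode:.2f}s, total {total:.2f}s")
```

Under these assumptions, the decode phase takes longer than prompt processing for every model in the table, which is why token generation speed tends to matter most for how responsive a local model feels.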
There's no doubt that the DGX Spark is the ideal option if consumers are looking for a "high-throughput, large model" configuration, since the GB10 Superchip onboard promises 1 PFLOP at FP4. But when it comes to real-time inference workloads, the AMD platform offers "similar" performance at a much lower cost, especially if your workload is latency-sensitive.
To back this up, the EVO-X2 from GMKtec costs $2,199 (against an actual MSRP of $2,800) for the top-tier (128 GB + 2 TB) configuration, while the DGX Spark retails for $4,000, with a few models priced close to $3,000. If you are interested in deploying models locally at a price that doesn't 'break the bank', workstations like the EVO-X2 are a solid option. Either way, it undercuts the DGX Spark options on the market.
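The price-to-performance argument can be made concrete with a small Python sketch that divides GMKtec's reported generation speeds by the prices quoted here, giving tokens per second per $1,000 spent. It is only an illustration under stated assumptions: it uses the $2,199 and $4,000 figures, looks at generation speed alone, and ignores first-token latency, power draw, and software support. The names `PRICES_USD`, `GEN_TOK_S`, `tok_s_per_kusd`, and `price_perf_table` are invented for this example.

```python
# Prices quoted in this article, in US dollars.
PRICES_USD = {"EVO-X2": 2199, "DGX Spark": 4000}

# Generation speeds (tok/sec) copied from GMKtec's comparison table.
GEN_TOK_S = {
    "Llama 3.3 70B": {"EVO-X2": 4.9, "DGX Spark": 4.67},
    "Qwen3 Coder": {"EVO-X2": 35.13, "DGX Spark": 38.03},
    "GPT-OSS 20B": {"EVO-X2": 64.69, "DGX Spark": 60.33},
    "Qwen3 0.6B": {"EVO-X2": 163.78, "DGX Spark": 174.29},
}


def tok_s_per_kusd(tok_s, price_usd):
    """Generation speed bought per $1,000 of hardware."""
    return tok_s / price_usd * 1000.0


def price_perf_table(gen, prices):
    """Map each model to {system: tokens/sec per $1,000}."""
    return {
        model: {
            system: tok_s_per_kusd(speed, prices[system])
            for system, speed in systems.items()
        }
        for model, systems in gen.items()
    }


if __name__ == "__main__":
    for model, systems in price_perf_table(GEN_TOK_S, PRICES_USD).items():
        line = ", ".join(f"{name}: {value:.2f}" for name, value in systems.items())
        print(f"{model} -> {line} tok/s per $1,000")
```

By this measure the EVO-X2 comes out ahead on all four models, including the two where the DGX Spark posts the higher raw generation speed, because the raw gaps are small next to the price difference. Swapping in the $2,800 MSRP or a roughly $3,000 DGX Spark variant narrows the margin, so the result depends heavily on which prices you plug in.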
