QNAP has introduced its new AI NAS, which packs a 16-Core AMD EPYC "Zen 2" CPU & can be paired with up to an RTX PRO 6000 Blackwell GPU.
Old Meets New In QNAP's "QAI-h1290FX" AI NAS: 16 Zen 2 Cores & 96 GB RTX PRO 6000 GPU
The "QAI-h1290FX" is QNAP's latest Edge AI offering that combines two distinct hardware components. But before we get into the details, it should be mentioned that QNAP's latest AI NAS is designed for LLM, RAG, and various GenAI applications.
Two components power the server, the first is the AMD EPYC 7302P CPU, which features 16 cores and 32 threads. This chip is based on the Zen 2 core architecture and offers enough performance to handle AI inference tasks at edge.
The second component is the GPU, which QNAP offers two options: either the 32 GB RTX PRO 4500 Blackwell or NVIDIA's flagship 96 GB RTX PRO 6000 Blackwell. Both offer absolutely behemoth compute capabilities for AI, with the PRO 4500 aimed for up to ~30B LLMs, while the PRO 6000 is ideal for 70B+ AI LLMs.
Besides the CPU/GPU, the QNAP QAI-h1290FX features support for 12 U.2 NVMe/SATA SSDs, and there is also high-speed networking support in the form of dual 25GbE and dual 2.5GbE LAN ports. The PCIe slots can also carry additional 100GbE AICs, though those are sold separately. The NAS is also compatible with QNAP's JBOD expansion enclosures for large-scale AI data storage.
The main features of the NAS include:
- All-Flash Storage Architecture: Twelve U.2 NVMe/SATA SSD slots enable ultra-fast I/O for high-frequency AI model execution and data streaming.
- 16-core AMD EPYC 7302P Processor: Provides 32 threads of server-class compute power—ideal for AI inference, virtualization, and heavy parallel workloads.
- GPU-ready Architecture: Supports optional NVIDIA RTX PRO 6000 Blackwell Max-Q Workstation GPU, featuring up to 96GB of GPU memory and support for CUDA, Tensor, and Transformer Engine acceleration—significantly boosting performance for on-prem LLM inference, image generation, and deep learning workloads.
- Containerized AI Environment & GPU Resource Management: Supports Docker and LXD with intuitive GPU allocation. Users can quickly launch AI tools via the built-in AI app center and assign GPU resources without command-line configuration.
- Fully Local Deployment with No Cloud Dependency: Run AI-powered chat assistants, document search engines, or knowledge bases fully on-premises. Keep sensitive data in-house while accelerating AI workflows.
- High-speed Networking and Scalable Architecture: Comes with dual 25GbE and dual 2.5GbE ports. PCIe slots support optional 100GbE upgrades. Compatible with QNAP JBOD expansion enclosures for large-scale AI data storage.
QNAP also shares some real-world performance of its new AI NAS using the NVIDIA RTX PRO 6000 96 GB Blackwell GPU. The tests include various models of different sizes, offering up to 172 Tokens/second. The results can be seen below:
| Mode | Token/sec | VRAM Usage |
|---|---|---|
| gpt-oss:120b (MXFP4) | 90 Token/sec | ~63GB |
| deepseek-r1:70b (q4_K_M) | 24 Token/sec | ~41GB |
| qwen3:32b (q4_K_M) | 46 Token/sec | ~21GB |
| gemma3:27b (q4_K_M) | 54 Token/sec | ~19GB |
| deepseek-r1:8b (q4_K_M) | 140 Token/sec | ~7GB |
| qwen3:8b (q4_K_M) | 172 Token/sec | ~7GB |
Besides the LLMs run natively through Ollama, QNAP is sharing the vLLM concurrent inference tests for the same configuration with the results provided below:
Tested Large Language Model: deepseek-ai/DeepSeek-R1-Distill-Qwen-7B (Hugging Face)
| Thread | Total Token/sec | avg Token/Thread/Sec |
|---|---|---|
| 1 | 79 Token/sec | 79 Token/sec |
| 2 | 166 Token/sec | 83 Token/sec |
| 5 | 410 Token/sec | 82 Token/sec |
| 10 | 688 Token/sec | 68.8 Token/sec |
| 20 | 810 Token/sec | 40.5 Token/sec |
| 50 | 850 Token/sec | 17 Token/sec |
Tested Large Language Model: openai/gpt-oss-20b (Hugging Face)
| Thread | Total Token/sec | avg Token/Thread/Sec |
|---|---|---|
| 1 | 218 Token/sec | 218 Token/sec |
| 2 | 340 Token/sec | 170 Token/sec |
| 5 | 1045 Token/sec | 209 Token/sec |
| 10 | 880 Token/sec | 88 Token/sec |
| 20 | 600 Token/sec | 30 Token/sec |
QNAP offers a wide range of storage, network, and interface expansion cards, which can be bought separately to expand the capabilities of the AI server. RAM is also sold separately with options ranging from 8 GB DDR4-3200 modules up to 64 GB DDR4-3200 kits. The system comes with a 5-year warranty, and is priced at $8999 for the 64 GB, $13,499 for the 128 GB, and $15,999 for the 256 GB variant.
Follow Wccftech on Google to get more of our news coverage in your feeds.
