QNAP Pairs a 6-Year-Old Zen 2 EPYC With NVIDIA’s 96GB RTX PRO 6000 Blackwell in Its New Edge AI NAS

Hassan Mujtaba
Two NVIDIA GPUs are placed on a QNAP storage unit, with a monitor displaying a desert scene in the background.

QNAP has introduced its new AI NAS, which packs a 16-Core AMD EPYC "Zen 2" CPU & can be paired with up to an RTX PRO 6000 Blackwell GPU.

Old Meets New In QNAP's "QAI-h1290FX" AI NAS: 16 Zen 2 Cores & 96 GB RTX PRO 6000 GPU

The "QAI-h1290FX" is QNAP's latest Edge AI offering that combines two distinct hardware components. But before we get into the details, it should be mentioned that QNAP's latest AI NAS is designed for LLM, RAG, and various GenAI applications.

Related Story Computex 2026 Will Be NVIDIA’s Biggest Event Of The Year. Here’s What To Expect

Two components power the server, the first is the AMD EPYC 7302P CPU, which features 16 cores and 32 threads. This chip is based on the Zen 2 core architecture and offers enough performance to handle AI inference tasks at edge.

The second component is the GPU, which QNAP offers two options: either the 32 GB RTX PRO 4500 Blackwell or NVIDIA's flagship 96 GB RTX PRO 6000 Blackwell. Both offer absolutely behemoth compute capabilities for AI, with the PRO 4500 aimed for up to ~30B LLMs, while the PRO 6000 is ideal for 70B+ AI LLMs.

Besides the CPU/GPU, the QNAP QAI-h1290FX features support for 12 U.2 NVMe/SATA SSDs, and there is also high-speed networking support in the form of dual 25GbE and dual 2.5GbE LAN ports. The PCIe slots can also carry additional 100GbE AICs, though those are sold separately. The NAS is also compatible with QNAP's JBOD expansion enclosures for large-scale AI data storage.

The main features of the NAS include:

  • All-Flash Storage Architecture: Twelve U.2 NVMe/SATA SSD slots enable ultra-fast I/O for high-frequency AI model execution and data streaming.
  • 16-core AMD EPYC 7302P Processor: Provides 32 threads of server-class compute power—ideal for AI inference, virtualization, and heavy parallel workloads.
  • GPU-ready Architecture: Supports optional NVIDIA RTX PRO 6000 Blackwell Max-Q Workstation GPU, featuring up to 96GB of GPU memory and support for CUDA, Tensor, and Transformer Engine acceleration—significantly boosting performance for on-prem LLM inference, image generation, and deep learning workloads.
  • Containerized AI Environment & GPU Resource Management: Supports Docker and LXD with intuitive GPU allocation. Users can quickly launch AI tools via the built-in AI app center and assign GPU resources without command-line configuration.
  • Fully Local Deployment with No Cloud Dependency: Run AI-powered chat assistants, document search engines, or knowledge bases fully on-premises. Keep sensitive data in-house while accelerating AI workflows.
  • High-speed Networking and Scalable Architecture: Comes with dual 25GbE and dual 2.5GbE ports. PCIe slots support optional 100GbE upgrades. Compatible with QNAP JBOD expansion enclosures for large-scale AI data storage.

QNAP also shares some real-world performance of its new AI NAS using the NVIDIA RTX PRO 6000 96 GB Blackwell GPU. The tests include various models of different sizes, offering up to 172 Tokens/second. The results can be seen below:

ModeToken/secVRAM Usage
gpt-oss:120b (MXFP4)90 Token/sec~63GB
deepseek-r1:70b (q4_K_M)24 Token/sec~41GB
qwen3:32b (q4_K_M)46 Token/sec~21GB
gemma3:27b (q4_K_M)54 Token/sec~19GB
deepseek-r1:8b (q4_K_M)140 Token/sec~7GB
qwen3:8b (q4_K_M)172 Token/sec~7GB

Besides the LLMs run natively through Ollama, QNAP is sharing the vLLM concurrent inference tests for the same configuration with the results provided below:

Tested Large Language Model: deepseek-ai/DeepSeek-R1-Distill-Qwen-7B (Hugging Face)

ThreadTotal Token/secavg Token/Thread/Sec
179 Token/sec79 Token/sec
2166 Token/sec83 Token/sec
5410 Token/sec82 Token/sec
10688 Token/sec68.8 Token/sec
20810 Token/sec40.5 Token/sec
50850 Token/sec17 Token/sec

Tested Large Language Model: openai/gpt-oss-20b (Hugging Face)

ThreadTotal Token/secavg Token/Thread/Sec
1218 Token/sec218 Token/sec
2340 Token/sec170 Token/sec
51045 Token/sec209 Token/sec
10880 Token/sec88 Token/sec
20600 Token/sec30 Token/sec

QNAP offers a wide range of storage, network, and interface expansion cards, which can be bought separately to expand the capabilities of the AI server. RAM is also sold separately with options ranging from 8 GB DDR4-3200 modules up to 64 GB DDR4-3200 kits. The system comes with a 5-year warranty, and is priced at $8999 for the 64 GB, $13,499 for the 128 GB, and $15,999 for the 256 GB variant.

Hassan Mujtaba Photo

About the author: A Software Engineer by training and a PC enthusiast by passion, Hassan Mujtaba serves as Wccftech's Senior Editor for hardware section. With years of experience in the industry, he specializes in deep-dive technical analysis of next-generation CPU and GPU architectures, motherboards, and cooling solutions. His work involves not only breaking news on upcoming technologies but also extensive hands-on reviews and benchmarking.

Follow Wccftech on Google to get more of our news coverage in your feeds.

Button