QNAP Pairs a 6-Year-Old Zen 2 EPYC With NVIDIA’s 96GB RTX PRO 6000 Blackwell in Its New Edge AI NAS

•

May 2, 2026 at 02:00pm EDT

Two NVIDIA GPUs are placed on a QNAP storage unit, with a monitor displaying a desert scene in the background.

QNAP has introduced its new AI NAS, which packs a 16-Core AMD EPYC "Zen 2" CPU & can be paired with up to an RTX PRO 6000 Blackwell GPU.

Old Meets New In QNAP's "QAI-h1290FX" AI NAS: 16 Zen 2 Cores & 96 GB RTX PRO 6000 GPU

The "QAI-h1290FX" is QNAP's latest Edge AI offering that combines two distinct hardware components. But before we get into the details, it should be mentioned that QNAP's latest AI NAS is designed for LLM, RAG, and various GenAI applications.

Two components power the server, the first is the AMD EPYC 7302P CPU, which features 16 cores and 32 threads. This chip is based on the Zen 2 core architecture and offers enough performance to handle AI inference tasks at edge.

The second component is the GPU, which QNAP offers two options: either the 32 GB RTX PRO 4500 Blackwell or NVIDIA's flagship 96 GB RTX PRO 6000 Blackwell. Both offer absolutely behemoth compute capabilities for AI, with the PRO 4500 aimed for up to ~30B LLMs, while the PRO 6000 is ideal for 70B+ AI LLMs.

Besides the CPU/GPU, the QNAP QAI-h1290FX features support for 12 U.2 NVMe/SATA SSDs, and there is also high-speed networking support in the form of dual 25GbE and dual 2.5GbE LAN ports. The PCIe slots can also carry additional 100GbE AICs, though those are sold separately. The NAS is also compatible with QNAP's JBOD expansion enclosures for large-scale AI data storage.

The main features of the NAS include:

All-Flash Storage Architecture: Twelve U.2 NVMe/SATA SSD slots enable ultra-fast I/O for high-frequency AI model execution and data streaming.
16-core AMD EPYC 7302P Processor: Provides 32 threads of server-class compute power—ideal for AI inference, virtualization, and heavy parallel workloads.
GPU-ready Architecture: Supports optional NVIDIA RTX PRO 6000 Blackwell Max-Q Workstation GPU, featuring up to 96GB of GPU memory and support for CUDA, Tensor, and Transformer Engine acceleration—significantly boosting performance for on-prem LLM inference, image generation, and deep learning workloads.
Containerized AI Environment & GPU Resource Management: Supports Docker and LXD with intuitive GPU allocation. Users can quickly launch AI tools via the built-in AI app center and assign GPU resources without command-line configuration.
Fully Local Deployment with No Cloud Dependency: Run AI-powered chat assistants, document search engines, or knowledge bases fully on-premises. Keep sensitive data in-house while accelerating AI workflows.
High-speed Networking and Scalable Architecture: Comes with dual 25GbE and dual 2.5GbE ports. PCIe slots support optional 100GbE upgrades. Compatible with QNAP JBOD expansion enclosures for large-scale AI data storage.

QNAP also shares some real-world performance of its new AI NAS using the NVIDIA RTX PRO 6000 96 GB Blackwell GPU. The tests include various models of different sizes, offering up to 172 Tokens/second. The results can be seen below:

Mode	Token/sec	VRAM Usage
gpt-oss:120b (MXFP4)	90 Token/sec	~63GB
deepseek-r1:70b (q4_K_M)	24 Token/sec	~41GB
qwen3:32b (q4_K_M)	46 Token/sec	~21GB
gemma3:27b (q4_K_M)	54 Token/sec	~19GB
deepseek-r1:8b (q4_K_M)	140 Token/sec	~7GB
qwen3:8b (q4_K_M)	172 Token/sec	~7GB

Besides the LLMs run natively through Ollama, QNAP is sharing the vLLM concurrent inference tests for the same configuration with the results provided below:

Tested Large Language Model: deepseek-ai/DeepSeek-R1-Distill-Qwen-7B (Hugging Face)

Thread	Total Token/sec	avg Token/Thread/Sec
1	79 Token/sec	79 Token/sec
2	166 Token/sec	83 Token/sec
5	410 Token/sec	82 Token/sec
10	688 Token/sec	68.8 Token/sec
20	810 Token/sec	40.5 Token/sec
50	850 Token/sec	17 Token/sec

Tested Large Language Model: openai/gpt-oss-20b (Hugging Face)

Thread	Total Token/sec	avg Token/Thread/Sec
1	218 Token/sec	218 Token/sec
2	340 Token/sec	170 Token/sec
5	1045 Token/sec	209 Token/sec
10	880 Token/sec	88 Token/sec
20	600 Token/sec	30 Token/sec

QNAP offers a wide range of storage, network, and interface expansion cards, which can be bought separately to expand the capabilities of the AI server. RAM is also sold separately with options ranging from 8 GB DDR4-3200 modules up to 64 GB DDR4-3200 kits. The system comes with a 5-year warranty, and is priced at $8999 for the 64 GB, $13,499 for the 128 GB, and $15,999 for the 256 GB variant.

About the author: A Software Engineer by training and a PC enthusiast by passion, Hassan Mujtaba serves as Wccftech's Senior Editor for hardware section. With years of experience in the industry, he specializes in deep-dive technical analysis of next-generation CPU and GPU architectures, motherboards, and cooling solutions. His work involves not only breaking news on upcoming technologies but also extensive hands-on reviews and benchmarking.

Follow Wccftech on Google to get more of our news coverage in your feeds.

QNAP Pairs a 6-Year-Old Zen 2 EPYC With NVIDIA’s 96GB RTX PRO 6000 Blackwell in Its New Edge AI NAS

Old Meets New In QNAP's "QAI-h1290FX" AI NAS: 16 Zen 2 Cores & 96 GB RTX PRO 6000 GPU

Related Story OpenAI’s First Custom Chip Is As Hot As A Jalapeño For AI, As The Firm Calls It The “Best Inference Platform” for LLMs

Further Reading

AMD's Instinct MI355X Snags a $350 Million Customer as TensorWave Doubles Down Ahead of the MI455X vs Vera Rubin Showdown

Computex 2026 Will Be NVIDIA's Biggest Event Of The Year. Here's What To Expect

Micron Doubles Down on AI Memory With 256 GB DDR5 RDIMMs Hitting 9200 MT/s, a 40% Leap Over Today's Modules

AMD 3D V-Cache Turns Ryzen Into a Surprise RAG AI Weapon, With An 88% Boost Over Non-X3D CPUs