This PCIe AI Accelerator Card Can Run 700B LLMs Locally With 384 GB Memory at Just 240W, Less Than Half The Power of RTX PRO 6000 Blackwell


May 7, 2026 at 11:15am EDT

A circuit board labeled HTX301 Evaluation Platform features an HTX301 chip in the center.

A Taiwanese company has announced a new PCIe AI accelerator card that it says can run 700B LLMs locally at just 240W, pitching it as an alternative to large GPU clusters.

Taiwanese Company Unveils Its PCIe AI Accelerator That Devalues Large-Scale AI Installations By Running 700B LLMs on A Single Card

Skymizer, a Taiwan-based company specializing in AI software and hardware, has announced its brand-new solution, the HTX301. The HTX301 is designed for on-prem AI: it comes as a PCIe add-in card and, according to the company, delivers large-scale AI performance at a sub-250W TDP.

Some of the highlights of the card include:

Run 700B-parameter model inference on a single PCIe card.
Purpose-built decode acceleration paired with unified prefill/decode orchestration.
On-prem AI with data sovereignty, deterministic latency, and fixed infrastructure cost.

The company says that the HTX301 PCIe AI accelerator is its first inference chip that is built upon the HyperThought platform, which features its next-generation LPU IP. The platform is purpose-built for LLMs with optimized performance and power efficiency in mind.

The HTX301 board shown looks like a standard PCIe card, with a single chip surrounded by memory. The company explains that each board will feature six HTX301 chips. Despite being based on an older 28 nm process, the design is claimed to deliver exceptional results, such as 30 tokens/second with just 0.5 TOPS at 100 GB/s of bandwidth. The LPU is also highly scalable, which allows for various design options.
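The 30 tokens/second at 100 GB/s figure can be sanity-checked with a common rule of thumb: if decode is limited purely by memory bandwidth, each generated token requires reading all the weights once, so tokens/second is roughly bandwidth divided by weight size. The sketch below is only that rule of thumb, not Skymizer's methodology. The source does not say which model the 30 tokens/second figure refers to, so the 7B-parameter, 4-bits-per-weight case is a purely hypothetical comparison point, and all function names are illustrative.

```python
# Back-of-the-envelope check, not Skymizer's methodology.
# Assumption: decode is purely memory-bandwidth-bound, so every generated
# token streams the full set of weights from DRAM exactly once.


def implied_gb_per_token(bandwidth_gb_s: float, tokens_per_s: float) -> float:
    """GB of memory traffic per token implied by a bandwidth and a token rate."""
    return bandwidth_gb_s / tokens_per_s


def model_weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Weight footprint in decimal GB for a given parameter count and precision."""
    return params_billion * bits_per_weight / 8.0


def bandwidth_bound_tokens_per_s(bandwidth_gb_s: float, weights_gb: float) -> float:
    """Upper bound on decode rate when each token reads all weights once."""
    return bandwidth_gb_s / weights_gb


# Figures quoted in the article: 30 tokens/second at 100 GB/s.
traffic = implied_gb_per_token(100.0, 30.0)
print(f"Implied traffic: {traffic:.2f} GB per token")

# Hypothetical comparison point (an assumption, not stated by the source):
# a 7B-parameter model stored at 4 bits per weight.
hypothetical_gb = model_weights_gb(7.0, 4.0)
bound = bandwidth_bound_tokens_per_s(100.0, hypothetical_gb)
print(f"7B @ 4-bit: {hypothetical_gb:.1f} GB, bound of about {bound:.1f} tokens/s")
```

Under these assumptions, the quoted figure implies roughly 3.3 GB of weight traffic per token, which is in the same range as a 7B model quantized to around 4 bits per weight.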

The Octa-Core LPU achieves 240 tokens/second in Llama2 7B prefill, and the company can connect multiple chips together for up to 1200 tokens/second on the same model, with support for models of up to 700B parameters.

The PCIe card also features up to 384 GB of memory. It uses standard LPDDR4 and LPDDR5 DRAM rather than anything more exotic such as LPDDR5X, HBM, or GDDR6/7. The design is aimed at reducing parameter size and DRAM bandwidth requirements. Skymizer's HTX301 architecture also employs efficient compression techniques such as:

Weight (long-term memory) compression that outperforms open-source llama.cpp by 9% to 17.8%.
KV cache (short-term memory) compression with minimal perplexity loss (less than 0.06% to 3.52%).
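To see why compression matters for the 700B claim, it helps to work out how many bits per parameter fit in 384 GB. The sketch below is a rough capacity budget, assuming decimal gigabytes and, optimistically, that all 384 GB is available for weights. In practice the KV cache and activations also need memory, so the real weight budget would be smaller. Function names are illustrative.

```python
# Rough capacity budget. Assumptions: decimal GB (1 GB = 1e9 bytes) and the
# entire memory pool used for weights, which ignores KV cache and activations.


def bits_per_parameter(memory_gb: float, params_billion: float) -> float:
    """Average bits available per parameter if all memory holds weights."""
    return memory_gb * 8.0 / params_billion


def weight_footprint_gb(params_billion: float, bits_per_weight: float) -> float:
    """Weight footprint in decimal GB at a given average precision."""
    return params_billion * bits_per_weight / 8.0


budget = bits_per_parameter(384.0, 700.0)
print(f"Budget: {budget:.2f} bits per parameter")

for bits in (16, 8, 4):
    size = weight_footprint_gb(700.0, bits)
    verdict = "fits" if size <= 384.0 else "does not fit"
    print(f"700B at {bits}-bit: {size:.0f} GB, {verdict} in 384 GB")
```

With these assumptions, a 700B model has a budget of about 4.4 bits per parameter, so it only fits at roughly 4-bit precision or below, which leaves about 34 GB for everything else.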

Power characteristics are also a standout, with the card consuming just 240W, less than half the 600W of leading PCIe AI accelerators such as the NVIDIA RTX PRO 6000 Blackwell and the AMD Instinct MI350P.
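The power comparison is simple arithmetic and can be checked directly. The sketch below uses only the 240W and 600W figures quoted above. The yearly energy numbers assume continuous operation at the rated power for 8,760 hours, which is an illustrative assumption rather than a measured workload.

```python
# Power comparison using the article's 240W and 600W figures.
# Assumption: both cards run continuously at their rated power all year.

HOURS_PER_YEAR = 24 * 365  # 8,760 hours, ignoring leap years


def power_ratio(card_w: float, reference_w: float) -> float:
    """Card power as a fraction of the reference card's power."""
    return card_w / reference_w


def yearly_energy_kwh(power_w: float) -> float:
    """Energy in kWh for one year of continuous operation at power_w watts."""
    return power_w * HOURS_PER_YEAR / 1000.0


ratio = power_ratio(240.0, 600.0)
print(f"240W is {ratio:.0%} of 600W")
print(f"240W card: {yearly_energy_kwh(240.0):.1f} kWh per year")
print(f"600W card: {yearly_energy_kwh(600.0):.1f} kWh per year")
```

At 40% of the reference power, the "less than half" description holds for the rated figures. It says nothing about performance per watt, which would require throughput numbers for the same workload on both cards.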

Skymizer is claiming some big numbers and will be previewing the HTX301 at Computex this year, so we will definitely visit its booth and see if the claims hold up. Overall, this sounds like an impressive AI solution on paper, and one that could prompt entry-level enterprises to stick with local servers instead of investing in the cloud for their AI needs.

About the author: A Software Engineer by training and a PC enthusiast by passion, Hassan Mujtaba serves as Wccftech's Senior Editor for the hardware section. With years of experience in the industry, he specializes in deep-dive technical analysis of next-generation CPU and GPU architectures, motherboards, and cooling solutions. His work involves not only breaking news on upcoming technologies but also extensive hands-on reviews and benchmarking.


