
Tensordyne’s 3nm Napier AI Chip Promises 13x Higher Token Throughput Than Blackwell & Blazes Past Rubin With 1000 Tokens/s In Multi-Trillion Parameter Models

Hassan Mujtaba • Jun 15, 2026 at 01:40pm EDT

A person wearing gloves holds a TensorDyne TDN AIP chip, with visible circuitry and labeled sections.

US-based AI company Tensordyne has announced the successful tape-out of its Napier chip, which it claims will outperform NVIDIA's Blackwell and Rubin chips in token throughput and efficiency.

Tensordyne’s new Napier AI Chip arrives with one clear mission: to make NVIDIA’s Blackwell and Rubin chips look considerably less impressive

The Napier chip will be the core component of the Tensordyne Napier TDN system, which is designed in collaboration with Broadcom and HPE Juniper Networks. The Napier platform has one goal: to unify AI through novel logarithmic AI math, a tightly integrated memory architecture, and a high-performance scale-up interconnect that drives higher token throughput at low power.

Napier is built on TSMC's 3nm process, and with its successful tape-out, the chip is now in production. With this milestone reached, Tensordyne is working towards beta deployment and a broader infrastructure plan that represents over $200 million in forecasted Napier system demand. The key area of focus is AI inference.

The chip packs 138 billion transistors, 144 GB of HBM3E memory, and 256 MB of SRAM, and delivers 2.1 PFLOPS of peak AI compute in dense FP8. It has a 300W TDP.

Current AI infrastructure is constrained by power consumption, and solutions proposed to tackle this, such as 800V DC power delivery, will incur a huge deployment cost. Power and cooling infrastructure alone makes up 50% of the cost of major AI deployments. To address this, Tensordyne has come up with a new inference stack across math, compute, memory, and networking:

TDN Math (Logarithmic Mathematics)

TDN replaces large-scale multiplication operations with simplified addition-based computation, significantly improving performance-per-watt efficiency across frontier AI models.
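Tensordyne has not published implementation details, so the following is only a minimal sketch of the general idea behind logarithmic arithmetic: since log2(a·b) = log2(a) + log2(b), a multiplication becomes an addition once values are stored as logarithms. The function names are hypothetical, and the sketch does not cover accumulation, which is the harder operation in a log representation.

```python
import math

def to_log(x: float) -> tuple[int, float]:
    """Encode a nonzero real as (sign, log2 of its magnitude)."""
    if x == 0:
        raise ValueError("zero needs a special encoding in a log number system")
    return (1 if x > 0 else -1, math.log2(abs(x)))

def log_mul(a: tuple[int, float], b: tuple[int, float]) -> tuple[int, float]:
    """Multiply two log-encoded values: combine the signs, add the logs."""
    # The sign product only ever involves +1 or -1 (an XOR in hardware).
    return (a[0] * b[0], a[1] + b[1])

def from_log(v: tuple[int, float]) -> float:
    """Decode a (sign, log2 magnitude) pair back to a real number."""
    sign, exponent = v
    return sign * 2.0 ** exponent

# 6.0 * -0.5 computed with one addition in the log domain.
product = from_log(log_mul(to_log(6.0), to_log(-0.5)))
print(product)
```

The multiply itself never touches a multiplier circuit for the magnitudes; the cost moves to the conversions in and out of the log domain, which is why such schemes try to stay in that domain for as long as possible.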

TDN AIP (Artificial Intelligence Processor)

Each TDN processor tightly integrates substantial fast SRAM alongside HBM memory, minimizing idle compute cycles and supporting efficient execution of the industry’s largest models.

TDN Link (Any-to-Any Scale-Up Interconnect)

Tensordyne’s proprietary scale-up fabric delivers sub-microsecond communication latency between processors, maximizing compute utilization and minimizing interconnect bottlenecks.

All of this is brought together in Tensordyne's TDN72 Inference Pod and Rack system. Each Pod is fitted with 72 Napier AI chips, similar to NVIDIA's NVL72 racks, which each feature 72 Blackwell or Rubin GPUs. According to Tensordyne, the system requires far less infrastructure capacity, and a Napier Rack combines four TDN72 Pods to deliver:

17x more tokens per watt (vs NVIDIA Blackwell)
13x more tokens per second (vs NVIDIA Blackwell)
Up to $33 million more annual revenue per rack
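The first two figures can be related to each other with simple arithmetic. The sketch below assumes that both ratios were measured on the same workload and the same pair of systems, which the announcement does not confirm; under that assumption, power equals throughput divided by efficiency, so the ratios divide the same way.

```python
# Vendor-claimed ratios versus NVIDIA Blackwell, taken from the list above.
tokens_per_second_ratio = 13.0
tokens_per_watt_ratio = 17.0

# power = throughput / efficiency, so the ratios divide the same way.
# Assumption: both ratios describe the same workload and system pair.
implied_power_ratio = tokens_per_second_ratio / tokens_per_watt_ratio

print(f"Implied power draw relative to Blackwell: {implied_power_ratio:.2f}x")
```

Under these assumptions, the claimed throughput would come at roughly three quarters of the comparison system's power draw.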

A comparison between the Nvidia Blackwell NVL72 GB300 and the Tensordyne Napier TDN72 highlights their performance, with the text 'Matching performance 4x smaller. 5x less energy' displayed.

A presentation slide titled Tensordyne Napier TDN features a TDN rack with '4x TDN72 Pods' and specifications, alongside a TDN72 pod with '72x TDN AIP Chips', highlighting components like 'Logarithmic Mathematics', 'Artificial Intelligence Processor', 'AI Compute Tray', and 'Scale-up Interconnect'.

Each 72-chip Napier pod will offer 10 TB of HBM capacity and can sustain models of up to 10T parameters with FP4. The full air-cooled rack will feature a total of 288 chips (72 per pod), for 608 PFLOPS of FP8 compute, 74 GB of SRAM, 42 TB of HBM3E memory, and a rated power of 120kW.
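These totals can be cross-checked against the per-chip specifications quoted earlier. The sketch below simply multiplies the per-chip figures by the chip counts; it assumes decimal units (1 TB = 1000 GB) and the variable names are illustrative.

```python
# Per-chip figures and chip counts as quoted in the article.
CHIPS_PER_POD = 72
PODS_PER_RACK = 4
HBM_GB_PER_CHIP = 144
SRAM_MB_PER_CHIP = 256
FP8_PFLOPS_PER_CHIP = 2.1
TDP_W_PER_CHIP = 300

chips_per_rack = CHIPS_PER_POD * PODS_PER_RACK

# Decimal units are assumed throughout (1 TB = 1000 GB, 1 GB = 1000 MB).
pod_hbm_tb = CHIPS_PER_POD * HBM_GB_PER_CHIP / 1000
rack_hbm_tb = chips_per_rack * HBM_GB_PER_CHIP / 1000
rack_sram_gb = chips_per_rack * SRAM_MB_PER_CHIP / 1000
rack_fp8_pflops = chips_per_rack * FP8_PFLOPS_PER_CHIP
rack_chip_power_kw = chips_per_rack * TDP_W_PER_CHIP / 1000

print(f"chips per rack:      {chips_per_rack}")
print(f"HBM per pod:         {pod_hbm_tb:.1f} TB")
print(f"HBM per rack:        {rack_hbm_tb:.1f} TB")
print(f"SRAM per rack:       {rack_sram_gb:.1f} GB")
print(f"FP8 per rack:        {rack_fp8_pflops:.1f} PFLOPS")
print(f"chip TDP per rack:   {rack_chip_power_kw:.1f} kW")
```

The memory and SRAM totals line up with the quoted 10 TB, 42 TB, and 74 GB after rounding. The compute total comes out at about 605 PFLOPS rather than 608, which suggests the 2.1 PFLOPS per-chip figure is itself rounded. The chips' combined TDP is about 86 kW; the article does not break down the rest of the 120kW rack rating.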

Tensordyne doesn't stop at the Blackwell comparison; it also compares the Napier solution against NVIDIA's upcoming Rubin platform. The company claims that its platform supports multi-trillion-parameter models with a throughput of 1000 tokens/s per user in a single-rack configuration. To do the same, it says, NVIDIA would require nine Rubin + Groq LPX racks.
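To put the multi-trillion-parameter claim in context, here is a rough sketch of how much memory the weights alone occupy at a given precision. It counts weights only; KV cache, activations, and any replication across chips are ignored, so real requirements will be higher. The helper name is hypothetical.

```python
def weight_footprint_tb(params_trillions: float, bits_per_param: int) -> float:
    """Terabytes (decimal) needed to hold the model weights alone."""
    bytes_total = params_trillions * 1e12 * bits_per_param / 8
    return bytes_total / 1e12

# A 10-trillion-parameter model at two precisions.
fp4_10t = weight_footprint_tb(10, 4)
fp8_10t = weight_footprint_tb(10, 8)

print(f"10T params at FP4: {fp4_10t:.1f} TB")
print(f"10T params at FP8: {fp8_10t:.1f} TB")
```

At FP4, a 10T-parameter model needs about 5 TB for its weights, which leaves roughly half of a pod's quoted 10 TB of HBM for everything else. At FP8 the weights alone would take about 10 TB.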

Tensordyne’s Napier platform is pitched as a bold leap forward in AI inference. If it delivers the claimed 17× more tokens per watt and 13× higher throughput than NVIDIA Blackwell, while matching the performance of nine Rubin-based racks in a single compact footprint, it would break the traditional speed-versus-cost and power-versus-performance trade-offs.

With dramatically lower infrastructure demands, up to $33 million more annual revenue per rack, and efficient scaling for multi-trillion parameter models, Napier doesn’t just compete with NVIDIA’s Blackwell & Rubin; it redefines what’s possible for next-generation AI deployment.

About the author: A Software Engineer by training and a PC enthusiast by passion, Hassan Mujtaba serves as Wccftech's Senior Editor for the hardware section. With years of experience in the industry, he specializes in deep-dive technical analysis of next-generation CPU and GPU architectures, motherboards, and cooling solutions. His work involves not only breaking news on upcoming technologies but also extensive hands-on reviews and benchmarking.
