FuriosaAI Ditches GPU Playbook For 2nm Broadcom-Built Inference Chip, Claims HBM4/E Bandwidth Beats Even The Most Efficient GPUs


May 27, 2026 at 11:40am EDT


FuriosaAI and Broadcom have partnered to build a high-performance AI accelerator chip featuring next-gen HBM4/E memory.

FuriosaAI's Next-Gen AI Accelerator Features 2nm Chiplet Architecture, HBM4/E Memory Support for Massive AI Compute Clusters

FuriosaAI has announced its third-generation AI accelerator, the successor to its second-generation RNGD platform, which is currently in mass production on TSMC's 5nm process technology. RNGD is a 180W PCIe-based design aimed at LLM and agentic AI workloads. The next-generation design goes all-in on AI inference as demand for agentic AI continues to surge.

The third-generation AI accelerator from FuriosaAI has the following highlights:

The platform pairs 2nm compute technology with HBM4/4E memory, designed to enable high-bandwidth, rack-scale networking across massive AI compute clusters.
The architecture is optimized for demanding inference workloads with a focus on high-bandwidth data movement that delivers higher performance-per-watt and greater token density than even the most efficient GPUs.
It builds on Furiosa’s current-generation RNGD chip, now in mass production. Customers include Samsung SDS and LG AI Research.

Starting with the details FuriosaAI has shared, the platform will use advanced 2nm compute silicon paired with HBM4/E memory. The company is working with Broadcom to take advantage of its advanced packaging capabilities, which allow multiple silicon dies to be integrated into a single high-performance AI chip (system-on-chip).

In the teaser shot, the company shows the third-generation chip with 12 HBM4/E memory sites, two large 2nm compute chiplets, and two I/O controllers. That works out to 432 GB of memory if Furiosa uses 12-Hi stacks with 36 GB each.
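The 432 GB figure is simple multiplication, but it helps to see where the 36 GB per stack comes from. The sketch below assumes 12-Hi stacks built from 24 Gb DRAM dies; that die density and the helper names are hypothetical illustrations, not a confirmed Furiosa specification. Only the 12 memory sites come from the teaser image.

```python
# Minimal sketch of the article's capacity arithmetic. The 12 memory sites
# come from Furiosa's teaser image; the 12-Hi / 24 Gb-per-die / 36 GB-per-stack
# configuration is an assumption, not a confirmed Furiosa specification.

def stack_capacity_gb(dies_per_stack: int, gbit_per_die: int) -> float:
    """Capacity of one HBM stack in gigabytes (8 gigabits = 1 gigabyte)."""
    return dies_per_stack * gbit_per_die / 8


def total_hbm_capacity_gb(num_stacks: int, gb_per_stack: float) -> float:
    """Total on-package HBM capacity, assuming every stack is identical."""
    if num_stacks < 0 or gb_per_stack < 0:
        raise ValueError("stack count and capacity must be non-negative")
    return num_stacks * gb_per_stack


HBM_SITES = 12        # memory sites visible in the teaser shot
DIES_PER_STACK = 12   # assumed 12-Hi stacks
GBIT_PER_DIE = 24     # assumed 24 Gb DRAM dies (hypothetical for this chip)

if __name__ == "__main__":
    per_stack = stack_capacity_gb(DIES_PER_STACK, GBIT_PER_DIE)
    total = total_hbm_capacity_gb(HBM_SITES, per_stack)
    print(f"{per_stack:g} GB per stack, {total:g} GB total")  # prints "36 GB per stack, 432 GB total"
```

Swapping in a different stack height or die density (for example, if Furiosa ships 16-Hi stacks) only requires changing the two assumed constants.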

Beyond the compute architecture, FuriosaAI will also leverage Broadcom's Ethernet and PCIe IP, enabling higher-bandwidth, rack-scale networking across massive AI compute clusters. The chip is optimized for demanding real-world AI workloads such as post-training sampling, and because high bandwidth is a key focus, the company is adopting the latest HBM4/E memory standard.

The company claims that prioritizing bandwidth over thread management (which GPUs require) will let it deliver higher efficiency and higher token throughput than modern GPU designs. It also says its software stack lets developers deploy new AI models quickly while still meeting throughput and latency requirements.
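Furiosa has not published bandwidth figures, but a rough illustration shows why memory bandwidth, rather than raw compute, tends to cap token throughput in LLM inference. The sketch below is a simplified roofline-style bound, assuming decode is purely memory-bound: weights are streamed once per step and shared by the batch, each sequence reads its own KV cache, and compute and interconnect costs are ignored. The model size, bandwidth, and function names are hypothetical, not FuriosaAI or GPU specifications.

```python
# Simplified roofline-style bound for memory-bound LLM decode.
# Assumptions (hypothetical, not Furiosa or GPU specs): decode is purely
# bandwidth-bound, weights are streamed once per step and shared by the batch,
# each sequence reads its own KV cache, and compute/interconnect costs are ignored.
from dataclasses import dataclass


@dataclass
class DecodeWorkload:
    params: float                # number of model parameters
    bytes_per_param: float       # e.g. 2 for BF16, 1 for FP8
    kv_cache_bytes: float = 0.0  # KV-cache bytes read per sequence per step


def max_decode_tokens_per_s(bandwidth_bytes_per_s: float,
                            workload: DecodeWorkload,
                            batch_size: int = 1) -> float:
    """Upper bound on aggregate decode throughput (tokens/s) from bandwidth alone."""
    if bandwidth_bytes_per_s <= 0 or batch_size <= 0:
        raise ValueError("bandwidth and batch size must be positive")
    weight_bytes = workload.params * workload.bytes_per_param
    # Bytes moved per decode step: shared weights plus one KV cache per sequence.
    bytes_per_step = weight_bytes + batch_size * workload.kv_cache_bytes
    steps_per_s = bandwidth_bytes_per_s / bytes_per_step
    # Each step emits one token per sequence in the batch.
    return steps_per_s * batch_size


if __name__ == "__main__":
    model = DecodeWorkload(params=70e9, bytes_per_param=1.0)  # hypothetical 70B FP8 model
    bw = 7e12                                                 # hypothetical 7 TB/s
    print(max_decode_tokens_per_s(bw, model))                 # 100.0
    print(max_decode_tokens_per_s(bw, model, batch_size=8))   # 800.0
```

Under these assumptions, doubling memory bandwidth doubles the throughput ceiling, which is the intuition behind a bandwidth-first design. Once KV-cache traffic is included, larger batches still raise aggregate throughput, but with diminishing returns because per-sequence cache reads grow with batch size.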

Furiosa’s SDK leverages a general compiler that automatically maps high-level PyTorch code to silicon. For developers requiring more granular control, Furiosa’s Virtual ISA offers a declarative programming model that provides hardware control without the nondeterministic complexity of traditional GPU programming.

“Bringing together Broadcom’s infrastructure capabilities and Furiosa’s Tensor Contraction Processor architecture and its industry-defining software stack allows us to move beyond the chip level and deliver a comprehensive solution for the token factory era,” said Furiosa Cofounder and CEO June Paik.

As for availability, the third-generation FuriosaAI accelerator is expected to begin sampling in the first half of 2028, targeting the compute requirements of next-generation AI data centers.

About the author: A software engineer by training and a PC enthusiast by passion, Hassan Mujtaba serves as Wccftech's Senior Editor for the hardware section. With years of industry experience, he specializes in deep-dive technical analysis of next-generation CPU and GPU architectures, motherboards, and cooling solutions. His work covers not only breaking news on upcoming technologies but also extensive hands-on reviews and benchmarking.
