FuriosaAI Ditches GPU Playbook For 2nm Broadcom-Built Inference Chip, Claims HBM4/E Bandwidth Beats Even The Most Efficient GPUs

May 27, 2026 at 11:40am EDT
FuriosaAI and Broadcom have partnered to build a high-performance AI accelerator chip featuring next-gen HBM4/E memory.

FuriosaAI's Next-Gen AI Accelerator Features 2nm Chiplet Architecture, HBM4/E Memory Support for Massive AI Compute Clusters

FuriosaAI has announced its third-generation AI accelerator, the successor to its 2nd Gen RNGD platform, which is currently in mass production on TSMC's 5nm process technology. RNGD is a 180W PCIe-based design aimed at LLM and Agentic AI workloads. The next-generation design goes all-in on the AI inference segment as demand for Agentic AI continues to grow.

The third-generation AI accelerator from FuriosaAI has the following highlights:

Starting with the details shared by FuriosaAI, the chip platform will use an advanced 2nm compute die and the HBM4/E memory standard. The firm is working with Broadcom to harness its advanced packaging capabilities, which allow multiple silicon dies to be integrated into a single high-performance AI chip (System-on-Chip).

In the teaser shot, the company shows the 3rd Gen AI chip with 12 HBM4/E memory sites, two massive 2nm compute chiplets, and two IO controllers. That adds up to 432 GB of memory if Furiosa uses 12-Hi stacks with 36 GB each (12 x 36 GB).
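
The capacity estimate above is simple multiplication, and a minimal sketch makes the assumptions explicit. The function name `total_hbm_capacity_gb` is hypothetical and used only for illustration. The 12 memory sites come from the teaser shot, while the 36 GB per stack is an assumption, not a specification confirmed by FuriosaAI. The sketch also assumes every site is populated with an identical stack.

```python
def total_hbm_capacity_gb(stack_sites: int, gb_per_stack: int) -> int:
    """Total on-package HBM capacity, assuming every site holds an identical stack."""
    if stack_sites < 0 or gb_per_stack < 0:
        raise ValueError("stack count and per-stack capacity must be non-negative")
    return stack_sites * gb_per_stack


if __name__ == "__main__":
    # 12 sites are visible in the teaser shot; 36 GB per 12-Hi stack is an assumption.
    print(total_hbm_capacity_gb(12, 36))  # 432
```

Changing the per-stack value shows how the total would scale if Furiosa ships denser stacks than assumed here.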

Besides the compute architecture, FuriosaAI will also leverage Broadcom's Ethernet and PCIe IP, enabling higher-bandwidth, rack-scale networking across massive AI compute clusters. The AI chip is optimized for demanding real-world AI workloads such as post-training sampling. High bandwidth is a key focus, which is why the company is going with the latest HBM4/E standards.

The company claims that its focus on bandwidth, rather than the thread management required by GPUs, will help it deliver higher efficiency and higher token throughput than modern GPU designs. It also says its software stack lets developers deploy new AI models quickly while meeting throughput and latency requirements.

Furiosa’s SDK leverages a general compiler that automatically maps high-level PyTorch code to silicon. For developers requiring more granular control, Furiosa’s Virtual ISA offers a declarative programming model that provides hardware control without the nondeterministic complexity of traditional GPU programming.

“Bringing together Broadcom’s infrastructure capabilities and Furiosa’s Tensor Contraction Processor architecture and its industry-defining software stack allows us to move beyond the chip level and deliver a comprehensive solution for the token factory era,” said Furiosa Cofounder and CEO June Paik.

As for availability, the 3rd Gen FuriosaAI accelerator is expected to begin sampling by the first half of 2028 and will be ready to meet compute requirements for next-gen AI data centers.

About the author: A Software Engineer by training and a PC enthusiast by passion, Hassan Mujtaba serves as Wccftech's Senior Editor for the hardware section. With years of experience in the industry, he specializes in deep-dive technical analysis of next-generation CPU and GPU architectures, motherboards, and cooling solutions. His work involves not only breaking news on upcoming technologies but also extensive hands-on reviews and benchmarking.
