FuriosaAI and Broadcom have partnered to build a high-performance AI accelerator chip featuring next-gen HBM4/E memory.
FuriosaAI's Next-Gen AI Accelerator Features 2nm Chiplet Architecture, HBM4/E Memory Support for Massive AI Compute Clusters
FuriosaAI has announced its third-generation AI accelerator, the successor to its 2nd Gen RNGD platform, which is currently in mass production on TSMC's 5nm process technology. The 2nd Gen RNGD AI platform is a 180W PCIe-based design aimed at LLM & Agentic AI workloads. The next-generation design goes all-in on AI inference as demand for Agentic AI continues to grow.
The third-generation AI accelerator from FuriosaAI has the following highlights:
- The platform pairs 2nm compute technology with HBM4/4E memory, designed to enable high-bandwidth, rack-scale networking across massive AI compute clusters.
- The architecture is optimized for demanding inference workloads with a focus on high-bandwidth data movement, which Furiosa says delivers higher performance-per-watt and greater token density than even the most efficient GPUs.
- It builds on Furiosa’s current-generation RNGD chip, now in mass production. Customers include Samsung SDS and LG AI Research.
Starting with the details FuriosaAI has shared, the chip platform will use advanced 2nm compute dies and the HBM4/E memory standard. The firm is working with Broadcom to harness advanced packaging capabilities, allowing it to integrate multiple silicon dies into a single, high-performance AI chip (system-on-chip).
In the teaser shot, the company shows the 3rd Gen AI chip with 12 HBM4/E memory sites, two massive 2nm compute chiplets, and two I/O controllers. That works out to 432 GB of memory if Furiosa uses 12-Hi, 36 GB stacks.
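The capacity figure is simple multiplication: sites × dies per stack × capacity per die. The short sketch below reproduces it under the same assumption as above (12 sites, each holding a 12-Hi stack of 3 GB / 24 Gb DRAM dies, i.e. 36 GB per stack). The helper names are hypothetical, and the 16-Hi line is purely illustrative; Furiosa has not announced stack heights.

```python
def stack_capacity_gb(dies_per_stack: int, gb_per_die: int) -> int:
    """Capacity of one HBM stack: number of stacked DRAM dies times capacity per die."""
    return dies_per_stack * gb_per_die


def package_capacity_gb(sites: int, dies_per_stack: int, gb_per_die: int) -> int:
    """Total on-package HBM capacity, assuming every site holds an identical stack."""
    return sites * stack_capacity_gb(dies_per_stack, gb_per_die)


# Assumed configuration: 12 sites, 12-Hi stacks of 36 GB (12 dies x 3 GB each).
print(stack_capacity_gb(12, 3))        # 36
print(package_capacity_gb(12, 12, 3))  # 432

# Hypothetical alternative, not announced by Furiosa: 16-Hi stacks with the same die density.
print(package_capacity_gb(12, 16, 3))  # 576
```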
Beyond the compute architecture, FuriosaAI will also leverage Broadcom's Ethernet and PCIe IP, enabling higher-bandwidth, rack-scale networking across massive AI compute clusters. The AI chip is optimized for demanding real-world AI workloads such as post-training sampling, and high bandwidth is a key focus, which is why the company is adopting the latest HBM4/E standards.
The company claims that focusing on bandwidth rather than on the thread management GPUs require will let it deliver higher efficiency and higher token throughput than modern GPU designs. Furiosa also says its software stack allows developers to deploy new AI models quickly while meeting throughput and latency requirements.
Furiosa’s SDK leverages a general compiler that automatically maps high-level PyTorch code to silicon. For developers requiring more granular control, Furiosa’s Virtual ISA offers a declarative programming model that provides hardware control without the nondeterministic complexity of traditional GPU programming.
“Bringing together Broadcom’s infrastructure capabilities and Furiosa’s Tensor Contraction Processor architecture and its industry-defining software stack allows us to move beyond the chip level and deliver a comprehensive solution for the token factory era,” said Furiosa Cofounder and CEO June Paik.
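For readers unfamiliar with the term in the architecture's name: a tensor contraction multiplies tensors and sums over their shared indices. Matrix multiplication is the most familiar case, C[i][j] = Σₖ A[i][k] · B[k][j]. The pure-Python sketch below evaluates a two-operand contraction written in einsum-style index notation (e.g. "ik,kj->ij"), which is a compact way to state what to compute without saying how to schedule it. It only illustrates the math; the function names are hypothetical, and it makes no claim about how Furiosa's hardware, compiler, or Virtual ISA actually work.

```python
from itertools import product


def shape_of(t):
    """Return the shape of a rectangular nested list (a scalar has shape [])."""
    shape = []
    while isinstance(t, list):
        shape.append(len(t))
        t = t[0]
    return shape


def get(t, idx):
    """Read one element of a nested list given a sequence of indices."""
    for i in idx:
        t = t[i]
    return t


def contract(spec, a, b):
    """Evaluate a two-operand contraction such as 'ik,kj->ij'.

    Index letters that appear in the inputs but not in the output are summed over.
    The result is a dict mapping output index tuples to values ('()' for a scalar).
    """
    inputs, out = spec.split("->")
    ia, ib = inputs.split(",")
    sizes = {}
    for letters, t in ((ia, a), (ib, b)):
        for letter, n in zip(letters, shape_of(t)):
            if sizes.setdefault(letter, n) != n:
                raise ValueError(f"size mismatch for index {letter!r}")
    summed = [letter for letter in sizes if letter not in out]
    result = {}
    for out_idx in product(*(range(sizes[l]) for l in out)):
        env = dict(zip(out, out_idx))
        total = 0
        for sum_idx in product(*(range(sizes[l]) for l in summed)):
            env.update(zip(summed, sum_idx))
            total += get(a, [env[l] for l in ia]) * get(b, [env[l] for l in ib])
        result[out_idx] = total
    return result


A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(contract("ik,kj->ij", A, B))  # matrix multiplication
print(contract("i,i->", [1, 2, 3], [4, 5, 6]))  # dot product
```

The same function covers matrix multiplication, dot products, and transposed products simply by changing the index string, which is why contractions are a useful common abstraction for AI workloads.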
As for availability, the 3rd Gen FuriosaAI accelerator is expected to begin sampling in the first half of 2028, targeting the compute requirements of next-gen AI data centers.
