NVIDIA’s Next-Gen Feynman GPUs Could See the Inclusion of Groq’s LPU Units By 2028, Stacked as Separate Dies Similar to AMD’s X3D Approach

Dec 28, 2025 at 01:19pm EST

NVIDIA aims to dominate the inference stack with its next-gen Feynman chips, and the firm could integrate LPU units into the architecture to get there.

NVIDIA Could Use Hybrid Bonding With SRAM Dies For Inference, But There Are Several Implications

Team Green's IP licensing agreement for Groq's LPU technology might sound like a moderate development given the scope of the deal and the revenue figures involved, but in reality, NVIDIA intends to take the lead in the inference segment through LPUs, especially as the industry shifts toward cost-per-million-tokens as its key metric. We have already discussed this in our extensive coverage. As for how NVIDIA plans to integrate LPUs, various propositions have surfaced; however, GPU expert AGF believes that LPU units might be stacked on next-gen Feynman GPUs using TSMC's hybrid bonding technology.
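Cost per million tokens is, at its core, a ratio between what a system costs to run and how many tokens it produces. A minimal sketch, assuming an all-in hourly cost and a sustained throughput figure as the only inputs; both values here are hypothetical placeholders, not NVIDIA or Groq numbers.

```python
def cost_per_million_tokens(hourly_cost_usd: float, tokens_per_second: float) -> float:
    """Cost in USD to generate one million tokens on a given system.

    hourly_cost_usd: all-in cost of running the system for one hour
        (hardware amortization, power, hosting); a hypothetical input.
    tokens_per_second: sustained output throughput of that system.
    """
    if tokens_per_second <= 0:
        raise ValueError("throughput must be positive")
    tokens_per_hour = tokens_per_second * 3600
    return hourly_cost_usd / tokens_per_hour * 1_000_000


# Placeholder numbers for illustration only, not measured figures.
print(round(cost_per_million_tokens(hourly_cost_usd=36.0, tokens_per_second=10_000), 4))  # -> 1.0
```

Doubling throughput at the same hourly cost halves the result, which is why per-token efficiency, not just raw compute, drives this metric.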


The expert believes the implementation could resemble what AMD has done with its X3D CPUs, which use TSMC's SoIC hybrid bonding technology to bond 3D V-Cache tiles to the main compute die. AGF argues that integrating SRAM into the monolithic Feynman die may not be the right move, since SRAM scaling is limited; building large SRAM arrays on advanced nodes would waste high-end silicon and dramatically increase the cost per wafer area. Instead, AGF believes NVIDIA will stack LPU units onto the Feynman compute die.
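AGF's cost argument can be expressed as simple pro-rated wafer arithmetic. If SRAM bit cells barely shrink on a new node, an SRAM array occupies roughly the same area either way, so its cost rises in step with the wafer price. The sketch assumes the stacked SRAM die could be built on a cheaper, older node (the article does not name one), and every price and area in it is a hypothetical placeholder.

```python
import math

# 300 mm wafer; gross area ignoring edge loss and defects (a simplification).
WAFER_AREA_MM2 = math.pi * (300 / 2) ** 2


def silicon_cost_usd(area_mm2: float, wafer_cost_usd: float) -> float:
    """Cost of a block of silicon, pro-rated by its share of a wafer's area.

    Ignores yield and edge loss, so it understates real costs; the point is
    only the relative comparison between nodes.
    """
    return area_mm2 * wafer_cost_usd / WAFER_AREA_MM2


# Hypothetical inputs: the same SRAM array takes about the same area on both
# nodes because SRAM scaling is limited, while the leading-edge wafer costs more.
sram_area_mm2 = 200.0
leading_edge_wafer_usd = 30_000.0  # placeholder, not a quoted TSMC price
mature_node_wafer_usd = 15_000.0   # placeholder

on_leading_edge = silicon_cost_usd(sram_area_mm2, leading_edge_wafer_usd)
on_mature_node = silicon_cost_usd(sram_area_mm2, mature_node_wafer_usd)
print(f"{on_leading_edge / on_mature_node:.1f}x")  # -> 2.0x
```

Under these assumptions the SRAM costs exactly as much more as the wafer does, because none of the node's density gains apply to it; the logic die, by contrast, does benefit from the shrink.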

The approach sounds sensible: a leading-edge process such as TSMC's A16 (1.6nm) could be used for the main Feynman die, which contains the compute blocks (tensor units, control logic, etc.), while separate LPU dies would hold large SRAM banks. To connect these dies, TSMC's hybrid bonding technology would be crucial, as it enables a wide interface and lower energy per bit compared to off-package memory. On top of that, since A16 features backside power delivery, the front side would be freed up for vertical SRAM connections, enabling low-latency decode responses.
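Why energy per bit matters for decode can be shown with back-of-the-envelope arithmetic: token generation is typically memory-bound, so the data-movement energy of a decode step scales with the bytes moved times the cost of moving each bit. The per-bit figures and the 1 GB workload in this sketch are hypothetical placeholders, not measurements of TSMC's hybrid bonding or of any NVIDIA or Groq product.

```python
def data_movement_energy_mj(bytes_moved: float, picojoules_per_bit: float) -> float:
    """Energy in millijoules spent only on moving data across an interface."""
    bits = bytes_moved * 8
    return bits * picojoules_per_bit * 1e-9  # 1 pJ = 1e-9 mJ


# Placeholder per-bit costs, chosen only to show the shape of the comparison;
# they are NOT published figures for hybrid bonding or any memory product.
interface_pj_per_bit = {
    "off-package memory (placeholder)": 5.0,
    "hybrid-bonded SRAM die (placeholder)": 0.5,
}

# Hypothetical workload: stream 1 GB of weights once, the way a memory-bound
# decode step reads weights for each generated token (compute energy and
# KV-cache traffic are ignored).
bytes_per_pass = 1e9

for name, pj in interface_pj_per_bit.items():
    print(f"{name}: {data_movement_energy_mj(bytes_per_pass, pj):.1f} mJ")
```

With these placeholder values the stacked path spends a tenth of the energy on data movement; real ratios depend on the interface and process, which the article does not specify.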

However, this technique raises concerns about how NVIDIA will manage thermal limits, since stacking dies on a process with such high compute density is already a challenge. And because LPUs are built for sustained throughput, those thermal constraints could become a bottleneck. More importantly, the execution-level implications of such an approach would be significant: LPUs rely on a fixed, statically scheduled execution order, which creates a conflict between determinism and flexibility.

Even if NVIDIA resolves the hardware-level constraints, the primary concern is how CUDA would behave under LPU-style execution, which requires explicit memory placement, whereas CUDA kernels are designed around hardware abstraction. Integrating SRAM into AI architectures won't be easy for Team Green, as optimizing a combined LPU-GPU environment would take an engineering marvel. However, this might be a price NVIDIA is willing to pay if it wants to lead the inference segment.
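To make "explicit memory placement" concrete, here is a minimal sketch of a planner that pins every tensor to a fixed SRAM bank and offset before execution starts, the kind of decision an LPU-style toolchain settles ahead of time. The bank count, bank size, tensor names, and first-fit strategy are all hypothetical; this is not Groq's or NVIDIA's actual compiler.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Placement:
    tensor: str
    bank: int
    offset_kib: int  # start address inside the bank
    size_kib: int


def plan_static_placement(
    tensors: list[tuple[str, int]], num_banks: int, bank_kib: int
) -> list[Placement]:
    """Pin every tensor to a fixed (bank, offset) before anything runs.

    Uses first-fit over banks in order. If a tensor does not fit, planning fails
    outright: in this model there is no cache or runtime allocator to fall back on.
    """
    used = [0] * num_banks  # KiB already allocated in each bank
    plan: list[Placement] = []
    for name, size in tensors:
        for bank in range(num_banks):
            if used[bank] + size <= bank_kib:
                plan.append(Placement(name, bank, used[bank], size))
                used[bank] += size
                break
        else:  # no bank had room for this tensor
            raise MemoryError(f"{name} ({size} KiB) does not fit in on-chip SRAM")
    return plan


# Hypothetical tensors (sizes in KiB) and SRAM geometry, for illustration only.
layer = [("wq", 48), ("wk", 48), ("wv", 48), ("kv_cache", 80)]
for placement in plan_static_placement(layer, num_banks=2, bank_kib=128):
    print(placement)
```

Here `wv` lands in bank 1 because bank 0 has only 32 KiB left after `wq` and `wk`. In a more abstracted programming model, caches and the runtime absorb much of this decision; an LPU-style flow has to fix it in advance, which is the mismatch described above.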

About the author: Muhammad Zuhair is a hardware and technology reporter for Wccftech, specializing in the semiconductor industry and the complex interplay between technology, manufacturing, and geopolitics. His coverage focuses on the corporate strategies and technological roadmaps of industry giants like TSMC, NVIDIA, Samsung, and Intel. Zuhair's expertise lies in deconstructing complex topics such as fabrication nodes (e.g., 2nm process), the economic impact of policies like the CHIPS Act, and the strategic development of AI infrastructure from NVIDIA, AMD and Intel.
