NVIDIA’s Rubin CPX Is Off the Roadmap, Replaced by Groq LPUs for Inference, But May Return With Feynman in 2028

Mar 19, 2026 at 09:22am EDT
A presentation slide titled 'NVIDIA Extreme Co-Design Delivering X-Factors Every Year' features future chip architecture timelines labeled Blackwell, Rubin, and Feynman with components like 'Blackwell Ultra HBM4e' and 'BlueField-5,' alongside a speaker on stage pointing at the display.

NVIDIA's Rubin CPX chip was conspicuously absent from GTC, and according to a new update, the solution has been delayed and is now positioned for the Feynman generation.

NVIDIA's CPX Chip Is Now Slated for Feynman, As Groq Fills the Inference Gap

For those unaware, NVIDIA has been releasing dedicated inference solutions since ASICs gained traction around Q3 of last year, and one of those launches was the Rubin CPX chip. It was one of the first rack-focused solutions to feature GDDR7 memory on board, and the idea was to target prefill workloads in inference. However, at this year's GTC, Rubin CPX was nowhere to be seen when Jensen showcased the Rubin lineup, suggesting the solution might have been canceled or delayed. Now NVIDIA's VP Ian Buck has an update (via ComputerBase).


While discussing NVIDIA's roadmap, Buck revealed that Rubin CPX has been pushed back, but the idea hasn't been dropped. Instead, we could expect a similar solution to debut with Feynman, which is scheduled for 2028. It appears that a CPX chip is currently not a fit for NVIDIA, given that workload demands have shifted from long-context processing to prioritizing time to first token (TTFT). As a result, the Rubin LPX tray, featuring Groq's LPUs, has gained greater significance, as it focuses on the decode stage of an inference workload.
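For readers unfamiliar with the prefill/decode split, the sketch below shows the usual way request latency is decomposed: prefill determines the time to first token, and every later token is paid for in the decode stage. The function names and the example timings are hypothetical and chosen only for illustration; they are not NVIDIA or Groq figures.

```python
def request_latency_s(ttft_s: float, time_per_output_token_s: float,
                      output_tokens: int) -> float:
    """End-to-end latency of one request, in seconds.

    ttft_s covers prefill (processing the prompt and emitting the first
    token); each remaining token costs one decode step.
    """
    if output_tokens < 1:
        raise ValueError("output_tokens must be at least 1")
    return ttft_s + (output_tokens - 1) * time_per_output_token_s


def decode_share(ttft_s: float, time_per_output_token_s: float,
                 output_tokens: int) -> float:
    """Fraction of total latency spent in the decode stage."""
    total = request_latency_s(ttft_s, time_per_output_token_s, output_tokens)
    return (total - ttft_s) / total


if __name__ == "__main__":
    # Hypothetical timings: 0.5 s to first token, 20 ms per later token.
    total = request_latency_s(0.5, 0.02, 501)
    share = decode_share(0.5, 0.02, 501)
    print(f"total latency: {total:.2f} s, decode share: {share:.1%}")
```

With long outputs, the decode term dominates the total, which is one way to read the article's point about the decode-focused LPX tray gaining significance.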

NVIDIA has been hyper-focused on what the Groq partnership brings to the table, and based on what we are seeing, the firm is keen on achieving impressive inference throughput. Because LPUs rely on on-chip SRAM, per-chip bandwidth scales up to 150 TB/s, and the rack as a whole delivers 640 TB/s of scale-up bandwidth, which is why NVIDIA decided to stick with the LPX tray rather than CPX. There were also reports that Team Green was revising the CPX design, looking to replace GDDR7 with HBM, suggesting that the Feynman CPX won't be exactly like what we saw with Rubin.
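To see why memory bandwidth matters so much for decode, here is a rough, back-of-the-envelope sketch of a bandwidth-bound ceiling on token rate: if each generated token requires reading a fixed number of bytes from memory, the rate cannot exceed bandwidth divided by bytes per token. The 150 TB/s value is the figure quoted above; the bytes-per-token value is a hypothetical placeholder, and the estimate ignores memory capacity, compute limits, batching, and interconnect overheads.

```python
TB = 1e12  # bytes in a terabyte (decimal)


def max_decode_tokens_per_s(bandwidth_tb_s: float,
                            bytes_read_per_token: float) -> float:
    """Upper bound on decode rate when memory bandwidth is the only limit."""
    if bandwidth_tb_s <= 0 or bytes_read_per_token <= 0:
        raise ValueError("bandwidth and bytes per token must be positive")
    return bandwidth_tb_s * TB / bytes_read_per_token


if __name__ == "__main__":
    # Hypothetical: 1 GB of weights and cache read per generated token.
    bytes_per_token = 1e9
    ceiling = max_decode_tokens_per_s(150.0, bytes_per_token)
    print(f"bandwidth-bound ceiling: {ceiling:,.0f} tokens/s")
```

Real systems land well below such a ceiling, so the number is only useful for comparing memory technologies under the same assumed workload.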

Jensen calls NVIDIA the "inference king", and the Groq solution is one way the company intends to maintain that lead. As for Rubin CPX dropping off the roadmap, it does free up GDDR7 capacity that would have gone to an AI chip, so there is some good news for gamers, too.

About the author: Muhammad Zuhair is a hardware and technology reporter for Wccftech, specializing in the semiconductor industry and the complex interplay between technology, manufacturing, and geopolitics. His coverage focuses on the corporate strategies and technological roadmaps of industry giants like TSMC, NVIDIA, Samsung, and Intel. Zuhair's expertise lies in deconstructing complex topics such as fabrication nodes (e.g., 2nm process), the economic impact of policies like the CHIPS Act, and the strategic development of AI infrastructure from NVIDIA, AMD and Intel.
