NVIDIA Unveils Vera Rubin With Groq’s LPX to Break Into Inference, a Market Where It Has Never Been First

Mar 16, 2026 at 03:48pm EDT

NVIDIA is formalizing its Groq partnership: CEO Jensen Huang has unveiled a hybrid compute tray that places Groq's third-generation LPUs in a Rubin rack.

NVIDIA's Idea With Groq Is to Target 'High-Speed' Workloads, Hoping to Crack the Inference Competition

What NVIDIA would do with Groq has been debated for some time, and we have followed the developments closely. At GTC 2026, NVIDIA unveiled a new hybrid compute tray for Vera Rubin, the Groq 3 LPX, which carries eight of the previously unannounced Groq3 LPUs, detailed below. According to NVIDIA, LPX and Rubin together enable a 35x increase in inference throughput per megawatt, which is why Groq's technology is key to NVIDIA's push into the inference market.


At the rack level, we are looking at 256 LPUs, bringing 128 GB of on-chip SRAM and 640 TB/s of scale-up bandwidth. This is NVIDIA's answer to what Cerebras and other competitors are doing in inference: by combining Rubin GPUs with LPUs, NVIDIA targets both the prefill and decode stages, allowing the company to compete in a market where 'they aren't the first ones'.

An individual Groq3 chip offers 500 MB of SRAM, 150 TB/s of SRAM bandwidth, and 1.2 PFLOPs (FP8). Combining Rubin with Groq's LPX tray, NVIDIA's CEO says, brings total AI inference compute to as much as 315 PFLOPs.
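As a rough sanity check, the per-chip numbers can be multiplied out to rack scale. The short Python sketch below does that using only figures quoted in this article (500 MB and 1.2 PFLOPs per Groq3 chip, eight chips per LPX tray, 256 LPUs per rack). The function name and the assumption that per-chip figures scale linearly are ours, not NVIDIA's.

```python
# Figures quoted in the article for a single Groq3 LPU.
SRAM_PER_LPU_MB = 500
FP8_PFLOPS_PER_LPU = 1.2

# Counts quoted in the article.
LPUS_PER_TRAY = 8
LPUS_PER_RACK = 256


def rack_totals(lpus: int = LPUS_PER_RACK) -> dict:
    """Scale per-chip figures to a rack, assuming simple linear scaling.

    Hypothetical helper for illustration only; it is not an NVIDIA or Groq API.
    """
    return {
        "trays": lpus // LPUS_PER_TRAY,            # implied LPX trays per rack
        "sram_gb": lpus * SRAM_PER_LPU_MB / 1000,  # decimal GB
        "fp8_pflops": lpus * FP8_PFLOPS_PER_LPU,   # LPU-only FP8 compute
    }


if __name__ == "__main__":
    for name, value in rack_totals().items():
        print(f"{name}: {value}")
```

Under these assumptions, the SRAM total works out to the 128 GB quoted for the rack, and 256 LPUs imply 32 trays of eight. The LPU-only FP8 figure comes to about 307 PFLOPs, slightly below the 315 PFLOPs quoted for Rubin and LPX combined. The article does not break down that total, so the gap should not be read as an official split between the two.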

Optimized for trillion-parameter models and million-token context, the codesigned LPX architecture pairs with Vera Rubin to maximize efficiency across power, memory and compute. The additional throughput per watt and token performance unlocks a new tier of ultra-premium, trillion-parameter, million-context inference, expanding revenue opportunity for all AI providers.

The idea is that Groq's LPUs will play a role for NVIDIA similar to the one Mellanox plays in networking, and that this hybrid architecture will give NVIDIA a head start on latency-sensitive workloads. With agentic AI becoming the industry's next 'inflection' point, NVIDIA has to keep up with compute demands, which is why the Groq partnership came at a vital time for Team Green.

About the author: Muhammad Zuhair is a hardware and technology reporter for Wccftech, specializing in the semiconductor industry and the complex interplay between technology, manufacturing, and geopolitics. His coverage focuses on the corporate strategies and technological roadmaps of industry giants like TSMC, NVIDIA, Samsung, and Intel. Zuhair's expertise lies in deconstructing complex topics such as fabrication nodes (e.g., 2nm process), the economic impact of policies like the CHIPS Act, and the strategic development of AI infrastructure from NVIDIA, AMD and Intel.
