No, NVIDIA Isn’t Acquiring Groq, But Jensen Just Executed a ‘Surgical’ Masterclass That No One Was Expecting

Dec 25, 2025 at 11:23am EST

NVIDIA's CEO, Jensen Huang, may have given his chip team a Christmas gift no one saw coming: Team Green has reportedly entered into an agreement with Groq, a company that builds specialized AI hardware. And these aren't ordinary chips; they could be NVIDIA's gateway to dominating inference-class workloads.

To understand why this is a 'Masterclass,' we need to examine two distinct battlefronts: the regulatory loopholes Jensen has just leveraged and the hardware dominance he has secured.

It Looks Like an Acquisition. It Smells Like an Acquisition. But On Paper, It's Just a 'Non-Exclusive' Arrangement

CNBC was the first to report the development, claiming that NVIDIA was "buying" Groq Inc. in a $20 billion deal, which would have been the largest acquisition of Jensen's tenure. The news spread like wildfire across the industry, with some suggesting that regulatory investigations would get in the way and others declaring it the end of Groq. Later, however, Groq released an official statement on its website saying it had entered into a "non-exclusive licensing agreement" with NVIDIA, granting the AI giant access to its inference technology.

We plan to integrate Groq’s low-latency processors into the NVIDIA AI factory architecture, extending the platform to serve an even broader range of AI inference and real-time workloads. While we are adding talented employees to our ranks and licensing Groq’s IP, we are not acquiring Groq as a company.

- NVIDIA CEO Jensen Huang, in an internal email

So, at least on paper, Groq's statement put the idea of a merger to rest. Still, the sequence of events is quite interesting to me, since the only thing separating this deal from a full-scale acquisition is that it is never described as one in official disclosures.

This is a classic "Reverse Acqui-hire" move from NVIDIA, and for those unfamiliar with the term, it comes straight from Microsoft's playbook: back in 2024, the tech giant announced a $653 million deal with Inflection that brought the likes of Mustafa Suleyman and Karén Simonyan, the executives who had spearheaded Inflection's AI strategy, over to Microsoft.

A Reverse Acqui-hire is when a company hires a startup's key talent and leaves behind a "bare-minimum" corporate structure, so the transaction never technically becomes a merger. It appears Jensen has executed something similar to stay clear of an FTC investigation: by framing the Groq deal as a "non-exclusive licensing agreement," NVIDIA likely falls outside the scope of the Hart-Scott-Rodino (HSR) Act's pre-merger review. Interestingly, Groq says GroqCloud will continue to operate, but what remains of the company looks very much like that 'bare structure'.

In effect, NVIDIA secured Groq's talent and licensed its IP for a reported $20 billion while sidestepping regulatory investigations, which let it execute the deal in a matter of days. But the hardware NVIDIA now has access to is the more interesting part of the NVIDIA-Groq deal.

Groq's LPU Architecture & Why It Could Be the Missing Piece for NVIDIA to Dominate Inference

This is the segment I am most excited to discuss, because Groq has a hardware ecosystem that could let NVIDIA replicate its training-era success in inference, and I'll explain why below. The AI industry's compute demands have shifted dramatically over the past few months. While companies like OpenAI, Meta, Google, and others are still training frontier models, they also want a robust inference stack, since inference is where most hyperscalers actually earn money.

When Google announced its Ironwood TPU, the industry hyped it as an inference-focused option, and the ASICs were touted as a replacement for NVIDIA, largely on claims that Jensen had yet to offer a solution that dominated inference throughput. (NVIDIA does have Rubin CPX, which I'll come back to.) With inference, compute demands change dramatically. Training prioritizes throughput over latency and relies on high arithmetic intensity, which is why modern accelerators are loaded with HBM and massive tensor cores.
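To put rough numbers on that contrast, here is a minimal Python sketch of arithmetic intensity (FLOPs per byte moved) for a single matrix multiply. It assumes a square fp16 weight matrix with a hypothetical `d_model` of 8192 and ignores caches, attention, and the KV cache, so treat it as a back-of-the-envelope model rather than a benchmark.

```python
def matmul_arithmetic_intensity(batch: int, d_model: int, bytes_per_elem: int = 2) -> float:
    """FLOPs per byte moved for Y = X @ W, where X is (batch, d_model)
    and W is a square (d_model, d_model) weight matrix."""
    flops = 2 * batch * d_model * d_model                 # one multiply + one add per MAC
    weight_bytes = bytes_per_elem * d_model * d_model     # W is read once
    act_bytes = bytes_per_elem * batch * d_model * 2      # read X, write Y
    return flops / (weight_bytes + act_bytes)


# Hypothetical sizes: a large training-style batch vs. a batch-1 decode step (fp16).
train = matmul_arithmetic_intensity(batch=4096, d_model=8192)
decode = matmul_arithmetic_intensity(batch=1, d_model=8192)
print(f"training batch:   {train:.0f} FLOPs/byte")
print(f"decode (batch=1): {decode:.2f} FLOPs/byte")
```

With a large batch, every weight byte fetched is reused thousands of times; in a batch-1 decode step it feeds roughly one FLOP, so the chip spends most of its time waiting on memory rather than computing.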

As hyperscalers pivot toward inference, they need a fast, predictable, feed-forward execution engine, because response latency is now the primary bottleneck. NVIDIA has targeted massive-context inference (the compute-heavy prefill phase) with Rubin CPX, while Google pitches its TPUs as the more power-efficient choice. When it comes to decode, however, there are few options available.

Decode is the token-generation phase of transformer inference, and it is increasingly treated as a distinct workload class in its own right. Because each new token depends on the one before it, decode runs one step at a time and must read the model's weights from memory at every step, which makes it bound by memory bandwidth and latency rather than raw compute. It therefore demands deterministic, low-latency behaviour, and given the latency and power constraints HBM brings to inference environments, Groq has something unique to offer: SRAM (Static RAM). Now that I've explained why inference compute needs a fresh look, it's time to talk about LPUs.
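A common back-of-the-envelope way to see why decode is memory-bound: if every generated token has to stream the full set of weights from memory once, bandwidth alone caps the per-sequence token rate. The sketch below assumes a hypothetical 70B-parameter fp16 model and 8 TB/s of HBM bandwidth; both are illustrative inputs, not figures from NVIDIA or Groq.

```python
def decode_tokens_per_sec_bound(n_params: float, bytes_per_param: float,
                                mem_bandwidth_bytes_per_s: float) -> float:
    """Upper bound on batch-1 decode speed when each generated token must
    stream all model weights from memory once. Ignores KV-cache reads,
    compute time, and any overlap between steps."""
    bytes_per_token = n_params * bytes_per_param
    return mem_bandwidth_bytes_per_s / bytes_per_token


# Hypothetical inputs: 70B parameters in fp16 (2 bytes each), 8 TB/s of HBM.
bound = decode_tokens_per_sec_bound(70e9, 2, 8e12)
print(f"~{bound:.0f} tokens/s per sequence, at best")
```

Batching many sequences together amortizes those weight reads, which is why throughput-oriented serving leans on large batches; latency-sensitive, per-user decode does not get that benefit.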

Groq's LPU: Low-Latency Decode That Beats Others on Per-Token Predictability

LPUs are the creation of Groq's founder and former CEO, Jonathan Ross, who, by the way, is joining NVIDIA as part of the arrangement. Ross is known for his work on Google's TPUs, so Team Green is clearly bringing a major asset in-house. LPUs (Language Processing Units) are Groq's answer to inference-class workloads, and the company sets itself apart with two core bets: deterministic execution, and on-die SRAM as the primary storage for model weights. This is Groq's route to speed: guaranteed predictability.

Groq has previously showcased two products: the GroqChip processor and the partner-built GroqCard accelerator based on it. According to official documentation, the chip features 230 MB of on-die SRAM with up to 80 TB/s of on-die memory bandwidth. SRAM is one of the LPU's key advantages because it offers far lower access latency than HBM: once you factor in DRAM access times and memory-controller queues, SRAM wins by a considerable margin. Those tens of terabytes per second of internal bandwidth are what allow Groq to deliver leading throughput.
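Using the per-chip figures quoted above, a quick calculation shows the trade-off of keeping weights entirely in SRAM. This is a rough sketch that assumes decimal megabytes and a hypothetical 70B-parameter model with 8-bit weights, and it ignores activations, the KV cache, and any SRAM reserved for other uses.

```python
SRAM_BYTES_PER_CHIP = 230_000_000   # 230 MB on-die SRAM per chip (decimal MB assumed)
SRAM_BW_PER_CHIP = 80e12            # up to 80 TB/s on-die memory bandwidth per chip


def chips_to_hold_weights(n_params: int, bytes_per_param: int) -> int:
    """Minimum number of chips needed to keep every weight resident in SRAM."""
    model_bytes = n_params * bytes_per_param
    return -(-model_bytes // SRAM_BYTES_PER_CHIP)   # ceiling division on integers


# Hypothetical example: a 70B-parameter model quantized to 8-bit (1 byte) weights.
chips = chips_to_hold_weights(70_000_000_000, 1)
aggregate_bw = chips * SRAM_BW_PER_CHIP
print(f"{chips} chips, ~{aggregate_bw / 1e15:.1f} PB/s aggregate SRAM bandwidth")
```

The catch is capacity: a single large model has to be spread across hundreds of chips, though the aggregate on-die bandwidth grows with every chip added. That scale-out requirement is likely part of why LPUs remain specialized hardware.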

SRAM also makes for a more power-efficient platform, since accessing on-die SRAM takes significantly less energy per bit and avoids the overhead of an off-chip memory PHY. Because decode is memory-bound, that saving translates directly into much lower energy per token. But this memory architecture is only one part of how LPUs perform. The other is deterministic execution, which relies on compile-time scheduling to eliminate timing variation across kernels.
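The energy argument reduces to simple arithmetic: memory-access energy per token is roughly the bits moved per token times the energy per bit. The picojoule-per-bit values below are placeholders chosen only to show how the ratio carries through, not measured figures for HBM, SRAM, or any specific chip.

```python
def energy_per_token_joules(bytes_per_token: float, picojoules_per_bit: float) -> float:
    """Memory-access energy for one decode step: bits moved x energy per bit."""
    return bytes_per_token * 8 * picojoules_per_bit * 1e-12


# Placeholder inputs, NOT measured data: the same 70 GB of weights read per
# token, priced at two illustrative energy-per-bit costs.
weights_bytes = 70e9
off_chip = energy_per_token_joules(weights_bytes, picojoules_per_bit=4.0)
on_die = energy_per_token_joules(weights_bytes, picojoules_per_bit=0.5)
print(f"off-chip: {off_chip:.2f} J/token, on-die: {on_die:.2f} J/token, "
      f"ratio {off_chip / on_die:.0f}x")
```

Whatever the real per-bit costs are, the per-token saving scales one-to-one with the per-bit saving, because decode moves the same bytes either way.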

Compile-time scheduling means the compiler knows exactly when every operation runs and when every piece of data arrives, so the decode pipeline avoids the unpredictable stalls that dynamic scheduling introduces. That enables near-full pipeline utilization and much higher, more consistent throughput than modern accelerators deliver on this workload. To sum it up, LPUs are built entirely around what hyperscalers need for inference, but there's one caveat the industry currently overlooks: LPUs are real and effective inference hardware, but they're highly specialized and haven't yet become a mainstream default platform. That's where NVIDIA comes in.
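To illustrate what "compile-time scheduling" means, here is a toy ASAP (as-soon-as-possible) scheduler in Python. The op names, latencies, and dependencies are hypothetical, it ignores resource limits, and Groq's actual compiler is far more sophisticated; the point is only that when every latency is known in advance, each op's start cycle and the total cycle count are fixed before anything runs.

```python
from typing import Dict, List, Tuple

# Hypothetical decode-step ops: name -> (latency in cycles, dependencies).
OPS: Dict[str, Tuple[int, List[str]]] = {
    "load_weights":  (4, []),
    "kv_cache_read": (6, []),
    "matmul_qkv":    (10, ["load_weights"]),
    "attention":     (12, ["matmul_qkv", "kv_cache_read"]),
    "matmul_out":    (10, ["attention"]),
    "mlp":           (20, ["matmul_out"]),
}


def static_schedule(ops: Dict[str, Tuple[int, List[str]]]) -> Tuple[Dict[str, int], int]:
    """Assign every op a fixed start cycle ahead of time (ASAP scheduling).

    An op starts as soon as all of its dependencies have finished; because
    all latencies are known, the result is fully determined before execution.
    """
    start: Dict[str, int] = {}

    def resolve(name: str) -> int:
        if name not in start:
            _, deps = ops[name]
            start[name] = max((resolve(d) + ops[d][0] for d in deps), default=0)
        return start[name]

    for name in ops:
        resolve(name)
    total_cycles = max(start[n] + ops[n][0] for n in ops)
    return start, total_cycles


schedule, total = static_schedule(OPS)
for name, cycle in sorted(schedule.items(), key=lambda kv: kv[1]):
    print(f"cycle {cycle:>3}: {name}")
print(f"decode step always takes {total} cycles")
```

Because nothing is decided at runtime, every decode step under this schedule takes exactly the same number of cycles. A dynamically scheduled processor instead resolves cache misses and memory-controller contention on the fly, which introduces run-to-run jitter.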

We don't yet know how NVIDIA will integrate LPUs into its offerings, but one option is to ship them as part of rack-scale inference systems (similar to Rubin CPX), paired with NVIDIA's networking infrastructure. GPUs could then handle prefill and long-context processing while LPUs focus on decode, giving NVIDIA an answer for every phase of inference. This could transform LPUs from an experimental option into a standard inference platform and drive widespread adoption among hyperscalers.
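As a purely hypothetical sketch of that split (NVIDIA has not described any such system), the Python below plans one request under disaggregated serving: the prompt is prefilled in a single parallel pass on a GPU pool, the resulting KV cache is handed off over the interconnect, and tokens are then decoded one at a time on an LPU pool. The pool names and the three-stage plan are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical pool names; nothing here reflects an announced NVIDIA product.
PREFILL_POOL = "gpu_pool"
DECODE_POOL = "lpu_pool"


@dataclass
class Request:
    prompt_tokens: int
    max_new_tokens: int


def plan(req: Request) -> List[Tuple[str, str, int]]:
    """Return (pool, stage, tokens) steps for one request under a
    disaggregated prefill/decode split."""
    # Prefill: the whole prompt is processed in one parallel, compute-heavy pass.
    steps = [(PREFILL_POOL, "prefill", req.prompt_tokens)]
    # Hand the prompt's KV cache from the prefill pool to the decode pool.
    steps.append(("interconnect", "kv_cache_handoff", req.prompt_tokens))
    # Decode: strictly sequential, one new token per step.
    steps += [(DECODE_POOL, "decode", 1) for _ in range(req.max_new_tokens)]
    return steps


steps = plan(Request(prompt_tokens=32_000, max_new_tokens=3))
for pool, stage, tokens in steps:
    print(f"{pool:>12}  {stage:<17} tokens={tokens}")
```

The design choice is simply to route each phase to the hardware whose strengths match it: parallel, compute-heavy prefill to GPUs and sequential, latency-sensitive decode to SRAM-based LPUs. The KV-cache handoff is the cost of the split, which is why the networking layer matters.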

There's no doubt this deal marks one of NVIDIA's biggest achievements in advancing its portfolio, since every indicator suggests inference will be the company's next big focus, and LPUs are likely to be a core part of its strategy for that class of AI workloads.

About the author: Muhammad Zuhair is a hardware and technology reporter for Wccftech, specializing in the semiconductor industry and the complex interplay between technology, manufacturing, and geopolitics. His coverage focuses on the corporate strategies and technological roadmaps of industry giants like TSMC, NVIDIA, Samsung, and Intel. Zuhair's expertise lies in deconstructing complex topics such as fabrication nodes (e.g., 2nm process), the economic impact of policies like the CHIPS Act, and the strategic development of AI infrastructure from NVIDIA, AMD and Intel.