Groq’s Inference Chips Are Beating NVIDIA’s Blackwell by 5x on Cost – And Doing It Twice as Fast

Apr 23, 2026 at 03:52pm EDT

As AI computing capacity continues to grow, an expert from computing infrastructure provider Nebius sat down with AlphaSense to describe the state of the industry. While NVIDIA's leading-edge AI GPUs remain the industry's top performers, the expert believes alternatives are growing in popularity, particularly as the industry shifts its cost metrics. Demand for AI computing capacity also remains high, as providers can easily run at 100% utilization to drive down costs and earn the most from their investment.

Alternatives To NVIDIA Chips Grow In Popularity As Industry Shifts Towards Cost Per Million Tokens From GPU Per Hour, Says Expert

According to the expert, current pricing in the AI infrastructure industry depends on the kind of GPU being used and whether the capacity is reserved in advance or rented on demand. For on-demand capacity, NVIDIA's H100 GPUs cost $2.95 per hour, the H200 costs $3.50 per hour, and the latest Blackwell B200s cost between $4.90 and $6.50 per hour.

However, prices drop if the capacity is reserved. For reserved capacity under contracts of one to two years that cover at least 10,000 GPUs, the H100 costs $1.50 per hour, the H200 costs $2.20 per hour, and the B200 costs at least $3.50 per hour.
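To put the gap between on-demand and reserved pricing in perspective, the short Python sketch below works out the implied discount for each GPU from the figures the expert quoted. The dictionary and function names are illustrative only, and because the reserved B200 price is a floor ("at least $3.50"), the B200 results should be read as upper bounds on the discount.

```python
# Hourly prices in USD per GPU, as quoted by the expert in this article.
# The B200 on-demand price is a range; single-price GPUs repeat the same value.
ON_DEMAND = {"H100": (2.95, 2.95), "H200": (3.50, 3.50), "B200": (4.90, 6.50)}

# Reserved prices. The B200 figure is a minimum, so its discount is an upper bound.
RESERVED = {"H100": 1.50, "H200": 2.20, "B200": 3.50}


def reserved_discount(on_demand: float, reserved: float) -> float:
    """Return the reserved-capacity discount as a percentage of the on-demand price."""
    return (on_demand - reserved) / on_demand * 100


for gpu, (low, high) in ON_DEMAND.items():
    low_pct = reserved_discount(low, RESERVED[gpu])
    high_pct = reserved_discount(high, RESERVED[gpu])
    if low == high:
        print(f"{gpu}: about {low_pct:.0f}% cheaper when reserved")
    else:
        print(f"{gpu}: about {low_pct:.0f}% to {high_pct:.0f}% cheaper when reserved")
```

On these numbers, reserving an H100 roughly halves its hourly price, while the H200 discount is closer to a third.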

The Enterprise Shift: Why Inference is Driving Token-Based Pricing

At the close of 2025, NVIDIA announced that it had entered into a non-exclusive licensing agreement with chip startup Groq. The deal, which covered the startup's AI inference technology, was NVIDIA's largest to date. According to the Nebius expert, inference is now responsible for between 90% and 95% of total demand from enterprise workloads, because firms now rely on pretrained models or APIs instead of developing their own software.

The shift from training to inference isn't the only change occurring in the AI infrastructure market, says the expert. Another is the move towards a different cost structure, accompanied by growing demand for alternatives to NVIDIA's GPUs.

Cost-Per-Million Tokens: NVIDIA Blackwell vs. Groq Breakdown

Under the alternative cost structure, firms charge their users by the token, with prices typically quoted per million tokens. According to the expert, Groq's chips are significantly cheaper on this basis, costing between five and 10 cents per million tokens. NVIDIA's GPUs cost up to five times as much, with the B100, B200, or B300 costing 25 cents per million tokens. Groq's chips are also faster: the Nebius expert says they can deliver up to 800 tokens per second, roughly 78% more than the NVIDIA chips' 450 tokens per second.
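The two pricing models are linked by simple arithmetic: an hourly GPU price divided by the tokens served in that hour gives a cost per million tokens. The Python sketch below shows the conversion in both directions. Pairing the $4.90 on-demand B200 rate with the 25-cent token price is an assumption made here for illustration, since the expert did not say which rental price sits behind the per-token figure. The article also does not say how the 450 tokens per second figure is measured, so the sketch does not use it.

```python
SECONDS_PER_HOUR = 3600


def cost_per_million_tokens(price_per_gpu_hour: float, tokens_per_second: float) -> float:
    """Convert an hourly GPU price into USD per million tokens at a sustained token rate."""
    tokens_per_hour = tokens_per_second * SECONDS_PER_HOUR
    return price_per_gpu_hour / tokens_per_hour * 1_000_000


def break_even_tokens_per_second(price_per_gpu_hour: float, price_per_million: float) -> float:
    """Sustained tokens per second needed for token revenue to cover the hourly GPU price."""
    tokens_per_hour = price_per_gpu_hour / price_per_million * 1_000_000
    return tokens_per_hour / SECONDS_PER_HOUR


# Assumed pairing (not stated in the article): $4.90/hour B200 and $0.25 per million tokens.
rate = break_even_tokens_per_second(4.90, 0.25)
print(f"Break-even rate: {rate:,.0f} tokens per second")
```

Under that assumed pairing, a single GPU would need to sustain several thousand tokens per second in aggregate before per-token billing covers its hourly rental price, which is one reason utilization matters so much to providers.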

Metric                     NVIDIA (Blackwell B200)        Groq LPU
Cost (Per 1M Tokens)       $0.25                          $0.10 (60% Cheaper)
Throughput (Tokens/Sec)    450                            800 (77% Faster)
Primary Workload           Heavy Training / Enterprise    High-Speed Inference
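The headline multiple and the table percentages describe the same quotes from different ends of Groq's price range. As a quick check, the Python sketch below recomputes the ratios from the figures cited by the expert; the function and constant names are illustrative only.

```python
def times_cheaper(baseline: float, alternative: float) -> float:
    """How many times more the baseline costs than the alternative."""
    return baseline / alternative


def percent_cheaper(baseline: float, alternative: float) -> float:
    """Saving of the alternative relative to the baseline, in percent."""
    return (baseline - alternative) / baseline * 100


def percent_faster(baseline: float, alternative: float) -> float:
    """Throughput gain of the alternative over the baseline, in percent."""
    return (alternative - baseline) / baseline * 100


# Figures quoted in the article (USD per million tokens, tokens per second).
NVIDIA_COST, GROQ_COST_LOW, GROQ_COST_HIGH = 0.25, 0.05, 0.10
NVIDIA_TPS, GROQ_TPS = 450, 800

print(f"Groq at $0.05: {times_cheaper(NVIDIA_COST, GROQ_COST_LOW):.1f}x cheaper")
print(f"Groq at $0.10: {times_cheaper(NVIDIA_COST, GROQ_COST_HIGH):.1f}x cheaper "
      f"({percent_cheaper(NVIDIA_COST, GROQ_COST_HIGH):.0f}% saving)")
print(f"Throughput gain: {percent_faster(NVIDIA_TPS, GROQ_TPS):.1f}%")
```

The fivefold cost advantage holds only at the five-cent low end of Groq's range. At 10 cents, the figure used in the table, the advantage is 2.5x, or 60%, and the throughput gain works out to a little under 78% rather than a full doubling.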

About the author: Ramish is a seasoned technology writer and editor with more than a decade of experience. He specializes in semiconductor fabrication and market analysis. With a background in finance and supply chain management - via his bachelor's in finance and a MicroMasters in supply chain management from MIT - Ramish combines financial rigor with deep industry insight to deliver accurate and authoritative coverage.
