OpenAI’s Latest Codex Model Runs on Cerebras Infrastructure, Hinting at a ‘Serious’ Second Option for AI Inference Beyond NVIDIA

Muhammad Zuhair • Feb 13, 2026 at 12:35pm EST

Cerebras' AI chips have landed their first mainstream adoption with none other than OpenAI: the AI lab has revealed that its latest Codex model runs on a second compute provider alongside NVIDIA.

OpenAI Has Managed to Achieve a Shocking 1,000 TPS Output, With Cerebras' Blazing-Fast Bandwidth

NVIDIA and OpenAI have had their own saga on the financing front, but in the race for compute, OpenAI has taken an interesting route through its earlier partnership with Cerebras. In its recent Codex release, the company disclosed that GPT‑5.3‑Codex‑Spark is powered by Cerebras' AI chips and, more specifically, that the hardware's advantage over alternatives is 'low latency' in inference workloads, which we'll discuss below. The more interesting aspect of this choice of compute is that OpenAI has indirectly acknowledged a 'formidable' rival to NVIDIA in inference.

The difference between the mainstream Codex models and the 'Spark' variant is that, according to OpenAI, Spark is designed to get "work done in the moment". With GPT‑5.3‑Codex‑Spark, the major latency improvements come from optimized pipelines and, more importantly, from Cerebras' hardware. OpenAI claims it has reduced time-to-first-token by 50% with this release, which is a notable figure. Codex-Spark runs on Cerebras' Wafer Scale Engine 3 (WSE-3), and here is the technical breakdown:

Specification	WSE-3
Process Node	TSMC 5nm
Transistors	~4 trillion
Compute Cores	900,000 AI-optimized cores
On-Chip SRAM	44 GB
Memory Bandwidth (On-Chip)	21 PB/s
Wafer Size	Full 300mm wafer-scale chip
Core Architecture	AI-optimized programmable processing cores

As for why OpenAI chose Cerebras for compute here, there are several reasons. One of the most important is that WSE-3 gives OpenAI 21 PB/s of on-chip memory bandwidth, which is crucial for memory-bound workloads like code generation. This is how Codex-Spark reaches 1,000 tokens per second (TPS), which is claimed to be as responsive as a "human pair programmer". Spark would be economically impractical for OpenAI to serve on NVIDIA's infrastructure, given that Blackwell focuses more on batch processing than on latency, which is why Cerebras makes sense here.
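To illustrate why memory bandwidth matters so much for this kind of workload, here is a rough back-of-envelope sketch in Python. It assumes that generating each token requires streaming all model weights from memory once, so the ceiling on single-stream speed is simply bandwidth divided by model size. The WSE-3 figures come from the table above; the model size and the HBM-class bandwidth figure are assumptions chosen for illustration, since OpenAI has not disclosed Codex-Spark's parameter count.

```python
# Back-of-envelope ceiling on single-stream decode speed for a memory-bound model.
# Assumes each generated token requires streaming all weights from memory once,
# and ignores KV-cache traffic, compute limits, and any chip-to-chip links.

WSE3_ONCHIP_BANDWIDTH = 21e15      # bytes/s: 21 PB/s, from the WSE-3 table
WSE3_ONCHIP_SRAM = 44e9            # bytes: 44 GB, from the WSE-3 table
ASSUMED_GPU_HBM_BANDWIDTH = 8e12   # bytes/s: ASSUMPTION, illustrative HBM-class figure


def weight_bytes(params_billion: float, bytes_per_param: float) -> float:
    """Total size of the model weights in bytes."""
    return params_billion * 1e9 * bytes_per_param


def decode_ceiling_tps(bandwidth: float, model_bytes: float) -> float:
    """Upper bound on tokens/s for one stream: bandwidth / bytes read per token."""
    return bandwidth / model_bytes


if __name__ == "__main__":
    # HYPOTHETICAL model: 20B parameters at 2 bytes each (not a disclosed figure).
    model = weight_bytes(params_billion=20, bytes_per_param=2)
    fits = model <= WSE3_ONCHIP_SRAM
    print(f"weights: {model / 1e9:.0f} GB, fits in on-chip SRAM: {fits}")
    hbm = decode_ceiling_tps(ASSUMED_GPU_HBM_BANDWIDTH, model)
    wse = decode_ceiling_tps(WSE3_ONCHIP_BANDWIDTH, model)
    print(f"HBM-class ceiling: {hbm:,.0f} tokens/s")
    print(f"WSE-3 ceiling:     {wse:,.0f} tokens/s")
```

These are theoretical ceilings, not measurements, and real systems land well below them. The point of the sketch is only that, for a single low-latency stream, the bandwidth feeding the weights sets the upper limit, which is where on-chip SRAM has the edge over off-chip memory.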

But when we talk about inference at scale, NVIDIA dominates the tokenomics, as seen in the company's recent claim that it has lowered token costs by up to 10x with Blackwell. OpenAI's Sachin Katti says that with Cerebras, the company adds "complementary capabilities", but the AI lab's loyalty in the compute race still lies with NVIDIA. However, Codex-Spark makes it clear that the bottleneck today is latency, and at the hardware level, NVIDIA's tech stack isn't well-positioned to dominate this area.
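To make the latency point concrete, a user's wait for a full response is roughly time-to-first-token plus the time spent streaming the output. The short Python sketch below applies that formula. The 50% time-to-first-token reduction and the 1,000 TPS figure are OpenAI's claims as reported above; the baseline numbers and the response length are hypothetical values picked purely for illustration.

```python
def response_seconds(ttft_s: float, output_tokens: int, tokens_per_second: float) -> float:
    """Wall-clock time until the last token arrives: TTFT plus streaming time."""
    return ttft_s + output_tokens / tokens_per_second


if __name__ == "__main__":
    tokens = 500  # HYPOTHETICAL size of one code edit

    # HYPOTHETICAL baseline: 0.6 s to first token, 100 tokens/s.
    baseline = response_seconds(ttft_s=0.6, output_tokens=tokens, tokens_per_second=100)

    # Spark-like case: TTFT halved (claimed 50% reduction) and 1,000 tokens/s (claimed).
    spark = response_seconds(ttft_s=0.3, output_tokens=tokens, tokens_per_second=1000)

    print(f"baseline:   {baseline:.1f} s")
    print(f"spark-like: {spark:.1f} s")
```

With any plausible baseline, the streaming term dominates for longer outputs, so per-stream token rate affects how interactive the tool feels more than the time-to-first-token improvement alone.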

It will be interesting to see where the inference market leaves NVIDIA moving forward, given that Cerebras is just one of several formidable rivals in this segment, alongside emerging solutions from ASIC manufacturers and competitors like AMD.

About the author: Muhammad Zuhair is a hardware and technology reporter for Wccftech, specializing in the semiconductor industry and the complex interplay between technology, manufacturing, and geopolitics. His coverage focuses on the corporate strategies and technological roadmaps of industry giants like TSMC, NVIDIA, Samsung, and Intel. Zuhair's expertise lies in deconstructing complex topics such as fabrication nodes (e.g., 2nm process), the economic impact of policies like the CHIPS Act, and the strategic development of AI infrastructure from NVIDIA, AMD and Intel.
