OpenAI’s Latest Codex Model Runs on Cerebras Infrastructure, Hinting at a ‘Serious’ Second Option for AI Inference Beyond NVIDIA

Muhammad Zuhair

Cerebras' AI chips have seen their first mainstream adoption by none other than OpenAI, as the AI lab reveals that its latest Codex model has another compute provider alongside NVIDIA.

OpenAI Achieves a Shocking 1,000 TPS Output With Cerebras' Blazing-Fast Bandwidth

There has been an ongoing NVIDIA-OpenAI saga on the financing front, but in the race for compute, OpenAI appears to have taken an interesting route through its earlier partnership with Cerebras. In its recent Codex release, the company disclosed that GPT‑5.3‑Codex‑Spark is powered by Cerebras' AI chips, and that the main benefit of this hardware over the alternatives is low latency in inference workloads, which we'll discuss below. The more interesting aspect of this choice of compute is that OpenAI has, indirectly, signaled that NVIDIA has a 'formidable' rival in inference.


The difference between the mainstream Codex models and the 'Spark' variant is that, according to OpenAI, Spark is designed to get "work done in the moment". With GPT‑5.3‑Codex‑Spark, OpenAI says it has achieved major latency improvements by optimizing its serving pipelines and, more importantly, by leveraging Cerebras' hardware. The company claims a 50% reduction in time-to-first-token (the delay before the model emits its first output token) with this release, which is a striking figure. Codex-Spark runs on Cerebras' Wafer Scale Engine 3 (WSE-3), and here is the technical breakdown:

WSE-3 specifications
Process node: TSMC 5nm
Transistors: ~4 trillion
Compute cores: 900,000 AI-optimized cores
On-chip SRAM: 44 GB
On-chip memory bandwidth: 21 PB/s
Wafer size: full 300 mm wafer-scale chip
Core architecture: AI-optimized programmable processing cores
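OpenAI's latency claims rest on two standard metrics: time-to-first-token (TTFT) and output tokens per second (TPS). For readers who want to check such figures on their own workloads, both can be computed from the arrival times of a streamed response's tokens. The sketch below is a minimal, hypothetical helper: the `summarize_stream` and `StreamTiming` names and the timestamp-list input are our own assumptions, not part of any OpenAI or Cerebras API. TTFT is the gap between sending the request and receiving the first token; decode throughput is the number of subsequent tokens divided by the time they took.

```python
import math
from dataclasses import dataclass


@dataclass
class StreamTiming:
    """Latency summary for one streamed response (times in seconds)."""
    ttft: float        # request sent -> first token received
    decode_tps: float  # tokens/s after the first token (NaN if undefined)


def summarize_stream(request_sent: float, token_times: list[float]) -> StreamTiming:
    """Compute TTFT and decode throughput from per-token arrival timestamps.

    Hypothetical helper: `request_sent` and `token_times` are client-side
    clock readings (e.g. from time.perf_counter()), with token_times in
    arrival order.
    """
    if not token_times:
        raise ValueError("need at least one token timestamp")
    ttft = token_times[0] - request_sent
    decode_window = token_times[-1] - token_times[0]
    # Count the tokens produced after the first one over the time they took;
    # throughput is undefined when there is no measurable decode window.
    decode_tps = (len(token_times) - 1) / decode_window if decode_window > 0 else math.nan
    return StreamTiming(ttft=ttft, decode_tps=decode_tps)


# Example: first token after 0.2 s, then 1,000 more tokens over one second.
timing = summarize_stream(0.0, [0.2 + i * 0.001 for i in range(1001)])
print(f"TTFT = {timing.ttft:.3f} s, decode = {timing.decode_tps:.0f} tokens/s")
```

In the example, a response whose first token arrives 0.2 s after the request and which then streams 1,000 more tokens over one second yields a TTFT of 0.2 s and a decode rate of about 1,000 TPS. Some benchmarks instead divide the total token count by the total wall time, which folds TTFT into the throughput figure, so it is worth checking which definition a quoted number uses.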

As for why OpenAI chose Cerebras here, there are several reasons. One of the most important is that WSE-3 gives OpenAI enormous on-chip memory bandwidth, which is crucial for memory-bandwidth-bound workloads such as token-by-token generation in interactive coding. This is why Codex-Spark reaches 1,000 tokens per second (TPS), which is claimed to make it as responsive as a "human pair programmer". Running Spark at these speeds would be economically impractical for OpenAI on NVIDIA's infrastructure, given that Blackwell is optimized more for batched throughput than for single-request latency, which is why Cerebras makes sense here.
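The bandwidth argument can be made concrete with a common back-of-envelope bound: when a model generates one token at a time for a single user, every weight has to be read from memory for each token, so tokens per second can be no higher than memory bandwidth divided by the bytes of weights read per token. The sketch below applies that bound under loudly stated assumptions. Only the 21 PB/s figure comes from the WSE-3 specifications; the 20B-parameter model size, 8-bit weights, and ~8 TB/s for a single HBM-based GPU are illustrative assumptions, since OpenAI has not disclosed Spark's size or serving setup.

```python
def max_decode_tps(bandwidth_bytes_per_s: float, params: float, bytes_per_param: float) -> float:
    """Naive upper bound on single-stream decode speed (tokens/s).

    Assumes every generated token reads all weights from memory exactly once,
    and ignores compute, KV-cache reads, interconnect and batching.
    """
    bytes_per_token = params * bytes_per_param
    return bandwidth_bytes_per_s / bytes_per_token


WSE3_ONCHIP_BW = 21e15  # 21 PB/s on-chip bandwidth, per the WSE-3 specifications
HBM_GPU_BW = 8e12       # ASSUMED ~8 TB/s for a single HBM-based GPU
PARAMS = 20e9           # HYPOTHETICAL 20B-parameter model; Spark's size is not public
BYTES_PER_PARAM = 1.0   # ASSUMED 8-bit weights (1 byte each)

for name, bw in [("WSE-3 on-chip SRAM", WSE3_ONCHIP_BW), ("single HBM GPU (assumed)", HBM_GPU_BW)]:
    bound = max_decode_tps(bw, PARAMS, BYTES_PER_PARAM)
    print(f"{name}: at most {bound:,.0f} tokens/s")
```

Under these assumptions the single-GPU bound works out to 400 tokens/s, while the WSE-3 bound is above a million tokens/s. Neither number is a prediction: real GPU deployments shard a model across several GPUs, which multiplies the aggregate bandwidth, and at wafer-scale bandwidth other limits (compute, pipelining across the wafer, KV-cache traffic) take over long before this ceiling is reached. The sketch only illustrates why memory bandwidth, rather than raw FLOPs, sets the ceiling for single-stream decoding.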

But when it comes to inference at scale, NVIDIA dominates on tokenomics, as the company underlined in a recent talk claiming that Blackwell lowers token costs by up to 10x. OpenAI's Sachin Katti says that Cerebras adds "complementary capabilities" to the company's compute, but in the broader compute race, the AI lab's loyalty remains firmly with NVIDIA. Still, Codex-Spark makes it clear that the bottleneck today is latency, and at the hardware level, NVIDIA's tech stack isn't well-positioned to dominate this area.
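The tension between cheap tokens and fast tokens can be illustrated with a toy bandwidth-bound model of batched decoding: in each decode step, the weights are read once and shared by every sequence in the batch, while each sequence also reads its own KV cache. Larger batches therefore amortize the weight reads (more total tokens per second, hence cheaper tokens) but lengthen each step (fewer tokens per second for each user). This is a simplified sketch, not how any specific NVIDIA or Cerebras system is modeled: all inputs are hypothetical, and compute limits, which become the bottleneck at large batch sizes, are ignored.

```python
def decode_rates(batch: int, weight_bytes: float, kv_bytes_per_seq: float,
                 bandwidth: float) -> tuple[float, float]:
    """Toy bandwidth-bound model of one batched decode step.

    Each step reads the weights once (shared by the whole batch) plus every
    sequence's own KV cache; compute time is ignored.
    Returns (tokens/s seen by each user, aggregate tokens/s).
    """
    step_seconds = (weight_bytes + batch * kv_bytes_per_seq) / bandwidth
    per_user = 1.0 / step_seconds  # each sequence gets one token per step
    return per_user, batch * per_user


# HYPOTHETICAL inputs, chosen only to show the shape of the trade-off.
WEIGHTS = 20e9      # 20 GB of weights
KV_PER_SEQ = 0.5e9  # 0.5 GB of KV cache per active sequence
BANDWIDTH = 8e12    # assumed 8 TB/s memory bandwidth

for b in (1, 8, 64):
    per_user, total = decode_rates(b, WEIGHTS, KV_PER_SEQ, BANDWIDTH)
    print(f"batch={b:>2}: {per_user:6.1f} tok/s per user, {total:8.1f} tok/s total")
```

With these inputs, moving from a batch of 1 to a batch of 64 raises aggregate throughput roughly 25x while cutting each user's speed by more than half. That is the kind of trade-off a throughput-optimized deployment accepts, and the kind a latency-focused product like Codex-Spark tries to avoid.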

It will be interesting to see how NVIDIA positions itself in the inference market moving forward, given that Cerebras is just one of its formidable rivals in this segment, alongside emerging solutions from ASIC makers and competitors like AMD.


About the author: Muhammad Zuhair is a hardware and technology reporter for Wccftech, specializing in the semiconductor industry and the complex interplay between technology, manufacturing, and geopolitics. His coverage focuses on the corporate strategies and technological roadmaps of industry giants like TSMC, NVIDIA, Samsung, and Intel. Zuhair's expertise lies in deconstructing complex topics such as fabrication nodes (e.g., 2nm process), the economic impact of policies like the CHIPS Act, and the strategic development of AI infrastructure from NVIDIA, AMD and Intel.
