Cerebras' AI chips have seen their first mainstream adoption by none other than OpenAI, as the AI lab reveals that its latest Codex model has another compute provider alongside NVIDIA.
OpenAI Has Managed to Achieve a Shocking 1,000 TPS Output, With Cerebras' Blazing-Fast Bandwidth
There has been an NVIDIA-OpenAI saga on the financing front, but in the race for compute, OpenAI has taken an interesting route through its earlier partnership with Cerebras. In its recent Codex release, the company disclosed that GPT‑5.3‑Codex‑Spark is powered by Cerebras' AI chips and, more specifically, that the benefit of this hardware over the alternatives is 'low latency' in inference workloads, which we'll discuss ahead. The more interesting aspect of the compute choice is that OpenAI has, indirectly, declared a 'formidable' rival to NVIDIA in inference.
The difference between the mainstream Codex models and the 'Spark' variant is that, according to OpenAI, Spark is designed to get "work done in the moment". With GPT‑5.3‑Codex‑Spark, major improvements in model latency have been achieved by optimizing pipelines and, more importantly, by leveraging Cerebras' hardware. OpenAI claims it has reduced time-to-first-token by 50% with this release, which is certainly a fascinating figure to talk about. Codex-Spark runs on Cerebras' Wafer Scale Engine 3, and here is the technical breakdown:
| Specification | WSE-3 |
|---|---|
| Process Node | TSMC 5nm |
| Transistors | ~4 trillion |
| Compute Cores | 900,000 AI-optimized cores |
| On-Chip SRAM | 44 GB |
| Memory Bandwidth (On-Chip) | 21 PB/s |
| Wafer Size | Full 300mm wafer-scale chip |
| Core Architecture | AI-optimized programmable processing cores |
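To put the table's headline numbers in perspective, here is a minimal back-of-the-envelope sketch in Python. It only divides the figures quoted above; the even split across cores and the decimal units (1 GB = 10^9 bytes) are assumptions made for illustration, not Cerebras specifications.

```python
# Figures copied from the WSE-3 table above (decimal units assumed).
SRAM_BYTES = 44e9               # 44 GB of on-chip SRAM
BANDWIDTH_BYTES_PER_S = 21e15   # 21 PB/s of on-chip memory bandwidth
CORES = 900_000                 # 900,000 AI-optimized cores


def per_core(total: float, cores: int = CORES) -> float:
    """Average share of a chip-wide total per core, assuming an even split."""
    return total / cores


def full_sweep_seconds(capacity_bytes: float, bandwidth_bytes_per_s: float) -> float:
    """Time to read the entire SRAM once at the quoted aggregate bandwidth."""
    return capacity_bytes / bandwidth_bytes_per_s


print(f"SRAM per core:      {per_core(SRAM_BYTES) / 1e3:.1f} KB")
print(f"Bandwidth per core: {per_core(BANDWIDTH_BYTES_PER_S) / 1e9:.1f} GB/s")
print(f"Full SRAM sweep:    {full_sweep_seconds(SRAM_BYTES, BANDWIDTH_BYTES_PER_S) * 1e6:.2f} microseconds")
```

Under those assumptions, each core ends up with roughly 49 KB of SRAM, and the whole 44 GB could in principle be read in about two microseconds. These are idealized averages derived from the table, not measured values.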
As to why OpenAI chose Cerebras for compute here, there are several reasons. One of the most important is that WSE-3 gives OpenAI enormous on-chip memory bandwidth (21 PB/s), which is crucial for memory-bound workloads such as generating code token by token. This is why Codex-Spark achieves 1,000 tokens per second (TPS), which OpenAI claims is as responsive as a "human pair programmer". Spark is economically impractical for OpenAI to serve on NVIDIA's infrastructure, given that Blackwell focuses more on batch processing than on latency, which is why Cerebras makes sense here.
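The bandwidth argument above can be sketched with a simple roofline-style estimate: if generating each token requires streaming the model's weights from memory once, the single-stream decode rate cannot exceed bandwidth divided by bytes read per token. The snippet below is a rough illustration only. The 20 GB weight footprint and the 8 TB/s "HBM-class" bandwidth are hypothetical values chosen for comparison, not Codex-Spark's size or a quoted NVIDIA specification; only the 21 PB/s and 1,000 TPS figures come from the article.

```python
PB = 1e15
TB = 1e12
GB = 1e9

WSE3_BANDWIDTH = 21 * PB            # from the spec table in this article
HYPOTHETICAL_HBM_BANDWIDTH = 8 * TB  # assumption: illustrative HBM-class accelerator
HYPOTHETICAL_WEIGHT_BYTES = 20 * GB  # assumption: NOT Codex-Spark's actual size


def decode_ceiling_tps(bandwidth_bytes_per_s: float, bytes_read_per_token: float) -> float:
    """Idealized upper bound on single-stream tokens/s for a memory-bound decoder.

    Assumes every generated token streams `bytes_read_per_token` from memory
    exactly once and that nothing else (compute, networking) limits speed.
    """
    if bytes_read_per_token <= 0:
        raise ValueError("bytes_read_per_token must be positive")
    return bandwidth_bytes_per_s / bytes_read_per_token


def per_token_budget_ms(tokens_per_second: float) -> float:
    """Average time available per token at a given output rate."""
    return 1000.0 / tokens_per_second


print(f"Per-token budget at 1,000 TPS: {per_token_budget_ms(1000):.1f} ms")
for name, bw in [("WSE-3 on-chip SRAM", WSE3_BANDWIDTH),
                 ("hypothetical HBM-class part", HYPOTHETICAL_HBM_BANDWIDTH)]:
    ceiling = decode_ceiling_tps(bw, HYPOTHETICAL_WEIGHT_BYTES)
    print(f"{name}: ceiling of about {ceiling:,.0f} tokens/s per stream")
```

The ceilings printed here are theoretical limits, not predictions of real throughput. The point of the sketch is the ratio: for a fixed model size, the per-stream limit scales directly with memory bandwidth, which is the property a latency-focused model like Spark leans on, whereas batching many requests together raises total throughput without making any single stream faster.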

But when we talk about inference at scale, NVIDIA dominates the tokenomics, and we saw this in the company's recent talk about how it has lowered token costs by up to 10x with Blackwell. OpenAI's Sachin Katti says that Cerebras adds "complementary capabilities", while the AI lab's loyalty in the compute race remains firmly with NVIDIA. However, with Codex-Spark, we can clearly see that the bottleneck today is latency, and at the hardware level, NVIDIA's tech stack isn't well-positioned to dominate this area.
It will be interesting to see how the inference market positions NVIDIA moving forward, given that Cerebras is just one of the formidable rivals in this segment, alongside emerging solutions from ASIC manufacturers and competitors like AMD.





