NVIDIA’s RTX GPUs Deliver Fastest AI Performance On OpenAI’s Latest “gpt-oss” Models

Hassan Mujtaba

NVIDIA & OpenAI have brought the latest gpt-oss family of AI open models to consumers, offering the highest performance on RTX GPUs.

NVIDIA's RTX 5090 Delivers 250 Tokens/s Performance on OpenAI's gpt-oss 20b AI Model, PRO GPUs Also Ready For gpt-oss 120b

Press Release: Today, NVIDIA announced its collaboration with OpenAI to bring the new gpt-oss family of open models to consumers, allowing state-of-the-art AI that was once exclusive to cloud data centers to run with incredible speed on RTX-powered PCs and workstations.


NVIDIA founder and CEO Jensen Huang underscored the importance of the launch:

“OpenAI showed the world what could be built on NVIDIA AI — and now they’re advancing innovation in open-source software. The gpt-oss models let developers everywhere build on that state-of-the-art open-source foundation, strengthening U.S. technology leadership in AI — all on the world’s largest AI compute infrastructure.”

The launch ushers in a new generation of faster, smarter on-device AI supercharged by the horsepower of GeForce RTX GPUs and PRO GPUs. Two new variants are available, designed to serve the entire ecosystem:

  • The gpt-oss-20b model is optimized to run at peak performance on NVIDIA RTX AI PCs with at least 16GB of VRAM, delivering up to 250 tokens per second on an RTX 5090 GPU.  
  • The larger gpt-oss-120b model is supported on professional workstations accelerated by NVIDIA RTX PRO GPUs.

Trained on NVIDIA H100 GPUs, these are the first models to support MXFP4 precision on NVIDIA RTX, a technique that improves model quality and accuracy at no incremental performance cost compared with older methods. Both models support context lengths of up to 131,072 tokens, among the longest available for local inference. They are built on a flexible mixture-of-experts (MoE) architecture and feature chain-of-thought reasoning, instruction following, and tool use.

This week’s RTX AI Garage highlights how AI enthusiasts and developers can get started with the new OpenAI models on NVIDIA RTX GPUs:

  • Ollama App: The easiest way to test these models is with the new Ollama app. Its user interface includes out-of-the-box support for the gpt-oss models and is fully optimized for RTX GPUs.
  • Llama.cpp: NVIDIA is collaborating with the open-source community to optimize performance on RTX GPUs, with recent contributions including CUDA Graphs to reduce overhead. Developers can get started at the Llama.cpp GitHub repository.
  • Microsoft AI Foundry: Windows developers can access the models via Microsoft AI Foundry Local (in public preview). Getting started is as simple as running the command "foundry model run gpt-oss-20b" in a terminal.
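For readers who want to try the Ollama route from the command line rather than the app, the workflow above can be sketched as follows. This is a minimal sketch, not part of the announcement: the model tag (gpt-oss:20b) and the local API port (11434) are assumptions based on Ollama's usual conventions, so check the Ollama model library for the exact tag.

```shell
# Download the 20B open model (tag assumed: gpt-oss:20b)
ollama pull gpt-oss:20b

# Chat interactively in the terminal
ollama run gpt-oss:20b

# Or call Ollama's local REST API (served on port 11434 by default)
curl http://localhost:11434/api/generate \
  -d '{"model": "gpt-oss:20b", "prompt": "Explain MXFP4 in one sentence.", "stream": false}'
```

The REST endpoint is useful for scripting benchmarks, for example measuring tokens per second on your own RTX GPU from the `eval_count` and `eval_duration` fields Ollama returns.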

