NVIDIA’s RTX GPUs Deliver Fastest AI Performance On OpenAI’s Latest “gpt-oss” Models

Hassan Mujtaba

NVIDIA & OpenAI have brought the latest gpt-oss family of AI open models to consumers, offering the highest performance on RTX GPUs.

NVIDIA's RTX 5090 Delivers 250 Tokens/s Performance on OpenAI's gpt-oss 20b AI Model, PRO GPUs Also Ready For gpt-oss 120b

Press Release: Today, NVIDIA announced its collaboration with OpenAI to bring the new gpt-oss family of open models to consumers, allowing state-of-the-art AI that was once exclusive to cloud data centers to run with incredible speed on RTX-powered PCs and workstations.


NVIDIA founder and CEO Jensen Huang underscored the importance of the launch:

“OpenAI showed the world what could be built on NVIDIA AI — and now they’re advancing innovation in open-source software. The gpt-oss models let developers everywhere build on that state-of-the-art open-source foundation, strengthening U.S. technology leadership in AI — all on the world’s largest AI compute infrastructure.”

The launch ushers in a new generation of faster, smarter on-device AI supercharged by the horsepower of GeForce RTX GPUs and PRO GPUs. Two new variants are available, designed to serve the entire ecosystem:

  • The gpt-oss-20b model is optimized to run at peak performance on NVIDIA RTX AI PCs with at least 16GB of VRAM, delivering up to 250 tokens per second on an RTX 5090 GPU.  
  • The larger gpt-oss-120b model is supported on professional workstations accelerated by NVIDIA RTX PRO GPUs.

Trained on NVIDIA H100 GPUs, these are the first models to support MXFP4 precision on NVIDIA RTX, a technique that improves model quality and accuracy at no incremental performance cost compared with older methods. Both models support context lengths of up to 131,072 tokens, among the longest available for local inference. They are built on a flexible mixture-of-experts (MoE) architecture and feature chain-of-thought reasoning, instruction following, and tool use.

This week’s RTX AI Garage highlights how AI enthusiasts and developers can get started with the new OpenAI models on NVIDIA RTX GPUs:

  • Ollama App: The easiest way to test these models is with the new Ollama app. Its user interface includes out-of-the-box support for the gpt-oss models and is fully optimized for RTX GPUs.
  • Llama.cpp: NVIDIA is collaborating with the open-source community to optimize performance on RTX GPUs, with recent contributions including CUDA Graphs to reduce overhead. Developers can get started at the Llama.cpp GitHub repository.
  • Microsoft AI Foundry: Windows developers can access the models via Microsoft AI Foundry Local (in public preview). Getting started is as simple as running the command "foundry model run gpt-oss-20b" in a terminal.
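For readers who want to try the Ollama route from the command line rather than the app, the workflow above can be sketched as follows. This is a minimal sketch, not part of the announcement: the model tag (gpt-oss:20b) and the local API port (11434) are assumptions based on Ollama's usual conventions, so check the Ollama model library for the exact tag.

```shell
# Download the 20B open model (tag assumed: gpt-oss:20b)
ollama pull gpt-oss:20b

# Chat interactively in the terminal
ollama run gpt-oss:20b

# Or call Ollama's local REST API (served on port 11434 by default)
curl http://localhost:11434/api/generate \
  -d '{"model": "gpt-oss:20b", "prompt": "Explain MXFP4 in one sentence.", "stream": false}'
```

The REST endpoint is useful for scripting benchmarks, for example measuring tokens per second on your own RTX GPU from the `eval_count` and `eval_duration` fields Ollama returns.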

