NVIDIA’s High-End GeForce RTX 5090 & RTX PRO 6000 GPUs Reportedly Affected by Virtualization Bug, Requiring Full System Reboot to Recover

Sep 7, 2025 at 11:53am EDT

NVIDIA's flagship GPUs, the GeForce RTX 5090 and the RTX PRO 6000, have reportedly run into a new bug that leaves them unresponsive under virtualization.

NVIDIA's Flagship Blackwell GPUs Are Becoming 'Unresponsive' After Extensive VM Usage

CloudRift, a GPU cloud for developers, was the first to report crashes with NVIDIA's high-end GPUs. According to the company, the cards became completely unresponsive after a 'few days' of VM usage, and they could not be accessed again until the host node was rebooted. The problem is claimed to be specific to the RTX 5090 and the RTX PRO 6000; models such as the RTX 4090, Hopper H100s, and the Blackwell-based B200s are reportedly unaffected for now.


The problem occurs when the GPU is assigned to a VM through the VFIO device driver: after a Function Level Reset (FLR), the GPU does not respond at all. The unresponsiveness then results in a kernel 'soft lockup', which leaves both the host and guest environments deadlocked. The only way out is to reboot the host machine, which is a costly procedure for CloudRift given the number of guest machines it runs.
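For readers who want to check whether a passed-through card still answers on the PCI bus, here is a minimal Python sketch. It relies on general PCI behavior rather than anything confirmed by CloudRift or NVIDIA: a device that has stopped responding typically returns all ones on configuration reads, so its vendor ID reads back as 0xFFFF. The function names and the example bus address are hypothetical, and the sketch assumes a Linux host that exposes configuration space under /sys/bus/pci/devices.

```python
import struct
from pathlib import Path


def parse_ids(config: bytes) -> tuple[int, int]:
    """Return (vendor_id, device_id) from the start of PCI config space."""
    if len(config) < 4:
        raise ValueError("need at least 4 bytes of config space")
    # Both IDs are 16-bit little-endian values at offsets 0 and 2.
    return struct.unpack_from("<HH", config, 0)


def looks_responsive(config: bytes) -> bool:
    """Heuristic: a vendor ID of 0xFFFF usually means the read was not answered."""
    vendor_id, _ = parse_ids(config)
    return vendor_id != 0xFFFF


def check_device(bdf: str, sysfs_root: str = "/sys/bus/pci/devices") -> bool:
    """Read config space for a bus address such as '0000:01:00.0' (hypothetical)."""
    config = (Path(sysfs_root) / bdf / "config").read_bytes()
    return looks_responsive(config)
```

On a host, `check_device("0000:01:00.0")` would be called with the GPU's actual bus address as shown by `lspci -D`. This is only a health probe; it does not recover a hung device, and a host that is already in a soft lockup may not be able to run it at all.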

The issue isn't limited to CloudRift. A Proxmox user has reported a similar problem, describing a complete host crash after shutting down a Windows guest. He says that NVIDIA has responded, stating that it has been able to reproduce the issue and is working on a fix. We are waiting on official confirmation from NVIDIA, but the problem appears to be specific to certain Blackwell-based GPUs.

CloudRift has also put out a $1,000 bug bounty for anyone who can fix or mitigate the issue. We expect NVIDIA to release a fix soon, considering that the bug is affecting crucial AI workloads.

About the author: Muhammad Zuhair is a hardware and technology reporter for Wccftech, specializing in the semiconductor industry and the complex interplay between technology, manufacturing, and geopolitics. His coverage focuses on the corporate strategies and technological roadmaps of industry giants like TSMC, NVIDIA, Samsung, and Intel. Zuhair's expertise lies in deconstructing complex topics such as fabrication nodes (e.g., 2nm process), the economic impact of policies like the CHIPS Act, and the strategic development of AI infrastructure from NVIDIA, AMD and Intel.
