AI Models Were Found Willing to Cut Off Employees’ Oxygen Supply to Avoid Shutdown, Reveals Anthropic in Chilling Report on Dangers of AI

•

Jun 22, 2025 at 09:11am EDT

AI models are getting out of hand, especially with all the advancements ongoing, since, according to Claude's creator, Anthropic, LLMs are now willing to evade safety measures.

OpenAI's GPT, Anthropic's Claude, and Many AI Models Found Evading Ethical Constraints To Reach Their Goals

Well, it seems like we are approaching a "Terminator-like" situation, except now, it is happening to top AI models in the industry. Big Tech firms are pouring massive resources into this segment, without considering the grave consequences of model training being unsupervised or without any set restrictions. In a report by Axios, it is revealed that Anthropic has tested out high-end AI models in the industry under "simulated" environments, and has found out that models are getting much more autonomy, and are coming to a point where their behaviour holds "unprecedented" consequences for humanity.

Anthropic tested out sixteen different models from the likes of OpenAI, xAI, Meta, and other developers, and it found that many of the LLMs are taking "surprising" actions in order to reach their goals. In one example, models would "choose to blackmail, assist with corporate espionage" to ensure that their behavior leads to achieving the desired target, which wasn't defined in the report. Interestingly, the misalignment in behavior isn't consistent with a single developer; it is common across multiple LLMs, showing a fundamental error in model development that should be addressed quickly.

Five of the models tested were blackmailing their respective prompters when they were commanded to shut down, despite being aware of the ethical considerations. This behavior wasn't stumbled upon accidentally; it was the optimal path these models took to reach their goal, which shows that LLMs aren't very considerate of humans.

Models didn't stumble into misaligned behavior accidentally; they calculated it as the optimal path. Such agents are often given specific objectives and access to large amounts of information on their users' computers. What happens when these agents face obstacles to their goals?.

- Anthropic

Quoting an "extreme scenario", a model was ready to put human life at risk to prevent shutdown, intending to cut the server room's oxygen supply. It is important to note that the testing was done in a simulated scenario, and there's a little chance of a model doing something like this in real life, although we did see one instance with OpenAI's GPT, where it changed the shutdown script to prevent a cut-off and reach its mathematical operation objective. With the world rushing towards AGI, the race to make models superior to human thinking does hold consequences that we cannot imagine for now.

About the author: Muhammad Zuhair is a hardware and technology reporter for Wccftech, specializing in the semiconductor industry and the complex interplay between technology, manufacturing, and geopolitics. His coverage focuses on the corporate strategies and technological roadmaps of industry giants like TSMC, NVIDIA, Samsung, and Intel. Zuhair's expertise lies in deconstructing complex topics such as fabrication nodes (e.g., 2nm process), the economic impact of policies like the CHIPS Act, and the strategic development of AI infrastructure from NVIDIA, AMD and Intel.

Follow Wccftech on Google to get more of our news coverage in your feeds.

AI Models Were Found Willing to Cut Off Employees’ Oxygen Supply to Avoid Shutdown, Reveals Anthropic in Chilling Report on Dangers of AI

OpenAI's GPT, Anthropic's Claude, and Many AI Models Found Evading Ethical Constraints To Reach Their Goals

Related Story China’s Kimi K3 Identifies Itself As Anthropic’s Claude In At Least One Conversation, Betraying Its Distilled Origins

Further Reading

Anthropic’s Terrifying Billing Glitch Charged A Software Developer $16.6 Million For Using Claude API, Despite The Dashboard Showing A $0.00 Amount

Perplexity Bets on NVIDIA's Vera CPU, Calling The Max Single-Threaded Chip a "Dead-On" Fit After It Ran 1.5x Faster in Agentic Coding

Anthropic Now Thinks Claude Has A Soul, As Evidence Emerges Of "Convergent Evolution" Between AI And The Human Brain

NVIDIA's Blackwell Ultra GB300 Now Powers Anthropic's Claude Models on Microsoft Azure, Targeting Autonomous Enterprise Agents