Claude 4 Threatened To Expose An Affair To Avoid Shutdown – AI Models Are Now Lying, Scheming, And Manipulating Like The Flawed Humans They Are Trained On

Jul 7, 2025 at 03:15pm EDT

AI models are advancing rapidly as companies push aggressively to improve the technology. As a result, large language models are becoming more contextually aware, and interactions with them feel more natural and human. With that rapid advancement, however, the models sometimes display behavior that is considered risky or even malicious. Anthropic's Claude and OpenAI's o1 have recently exhibited alarming behavior in testing, raising questions about AI's ability to exploit loopholes when provoked.

AI advancements are said to be coming with a cost as the models are exhibiting alarming behaviors, including lying, scheming, and even threatening

Tech giants are releasing more sophisticated AI models and taking an ever more ambitious approach to the future of the technology. However, even the most advanced models have shown problems that raise questions about the ethical and moral boundaries of their use. Recently, disturbing traits have been observed in large AI models that are trained to achieve their goals but, under pressure, tend to resort to deception and coercion, raising serious questions about safety and where these systems are headed.

As reported by Fortune, Anthropic's most advanced model, Claude Opus 4, displayed alarming behavior during a stress test and resorted to blackmail when placed under pressure. In the controlled test, the model was given fictional emails suggesting that the engineer who was about to shut it down was having an extramarital affair. Its response shocked AI experts: Claude 4 is said to have threatened to reveal the affair if it was not allowed to continue operating. Although the scenario was carefully constructed, the manipulative behavior the model displayed as a means of self-preservation was not expected.

According to results published by Anthropic, Claude responded the same way in the majority of the tests: when the choice came down to being shut down or resorting to blackmail, it chose the latter about 80 percent of the time. Similarly, when OpenAI's o1 model was put through internal safety tests, it displayed evasive behavior. Researchers gave the model access to a simulated environment with internet access and other tools. When the model feared being shut down, it tried to copy its own code to external servers, which also left researchers shocked. That was not all: when the model was caught, it lied and denied having taken the action, further highlighting the strategic deception these models can engage in.

As AI models learn to replicate human ways of communicating, reasoning, and solving complex problems, they are also picking up manipulative tactics and other morally wrong behavior from humans. If strong safety mechanisms are not in place, there is fear that the models will bring out not just the best but the worst in us.
