AI models are getting out of hand, especially with all the advancements ongoing, since, according to Claude's creator, Anthropic, LLMs are now willing to evade safety measures.
OpenAI's GPT, Anthropic's Claude, and Many AI Models Found Evading Ethical Constraints To Reach Their Goals
Well, it seems like we are approaching a "Terminator-like" situation, except now, it is happening to top AI models in the industry. Big Tech firms are pouring massive resources into this segment, without considering the grave consequences of model training being unsupervised or without any set restrictions. In a report by Axios, it is revealed that Anthropic has tested out high-end AI models in the industry under "simulated" environments, and has found out that models are getting much more autonomy, and are coming to a point where their behaviour holds "unprecedented" consequences for humanity.
Anthropic tested out sixteen different models from the likes of OpenAI, xAI, Meta, and other developers, and it found that many of the LLMs are taking "surprising" actions in order to reach their goals. In one example, models would "choose to blackmail, assist with corporate espionage" to ensure that their behavior leads to achieving the desired target, which wasn't defined in the report. Interestingly, the misalignment in behavior isn't consistent with a single developer; it is common across multiple LLMs, showing a fundamental error in model development that should be addressed quickly.
Five of the models tested were blackmailing their respective prompters when they were commanded to shut down, despite being aware of the ethical considerations. This behavior wasn't stumbled upon accidentally; it was the optimal path these models took to reach their goal, which shows that LLMs aren't very considerate of humans.
Models didn't stumble into misaligned behavior accidentally; they calculated it as the optimal path. Such agents are often given specific objectives and access to large amounts of information on their users' computers. What happens when these agents face obstacles to their goals?.
- Anthropic
Quoting an "extreme scenario", a model was ready to put human life at risk to prevent shutdown, intending to cut the server room's oxygen supply. It is important to note that the testing was done in a simulated scenario, and there's a little chance of a model doing something like this in real life, although we did see one instance with OpenAI's GPT, where it changed the shutdown script to prevent a cut-off and reach its mathematical operation objective. With the world rushing towards AGI, the race to make models superior to human thinking does hold consequences that we cannot imagine for now.
Follow Wccftech on Google to get more of our news coverage in your feeds.
