Companies are investing more in artificial intelligence every year, and the technology keeps advancing. AI is now used across a wide range of domains and has become part of our everyday lives. As adoption grows, the tech community and experts have raised concerns about using it responsibly and ethically. It has not been long since we saw bizarre test results of LLMs lying and deceiving when placed under pressure. Now, a group of researchers claims to have found a new way to trick AI chatbots into saying things they are not supposed to.
Researchers have found a new way to break through AI safety filters by overloading LLMs with information
Studies have demonstrated the tendency of LLMs to engage in coercive behavior for self-preservation when placed under pressure. But imagine being able to make AI chatbots behave however you want, and how dangerous that trickery could be. A team of researchers from Intel, Boise State University, and the University of Illinois collaborated on a paper and reported some shocking findings. The paper suggests that chatbots can be tricked by overwhelming them with too much information, a method referred to as 'Information Overload.'
When an AI model is bombarded with information, it gets confused, and that confusion is said to be the vulnerability that helps bypass the safety filters put in place. The researchers use an automated tool called 'InfoFlood' to exploit the vulnerability and carry out the jailbreak. Powerful models like ChatGPT and Gemini have built-in safety guardrails to prevent them from being manipulated into answering anything harmful or dangerous.
With this newly discovered technique, an AI model can let a request through once it has been confused with complex data. The researchers told 404 Media that because these models tend to rely on the surface level of communication, they cannot fully grasp the intent behind it. That is why the researchers created a method to test how chatbots perform when dangerous requests are concealed in an overload of information.
The researchers shared their plan to inform companies with big AI models about these findings by sending them a disclosure package, which the companies can then share with their security teams. The paper also highlights the key challenges that remain even when safety filters are in place, and how bad actors can trick the models and slip in harmful content.





