Experts urge stronger safeguards as jailbroken chatbots leak dangerous knowledge
Jailbreaks trick chatbots into bypassing safety rules, raising security and ethical concerns.
Hacked AI-powered chatbots pose serious security risks by revealing illicit knowledge the models absorbed during training, according to researchers at Ben Gurion University.
Their study highlights how ‘jailbroken’ large language models (LLMs) can be manipulated to produce dangerous instructions, such as how to hack networks, manufacture drugs, or carry out other illegal activities.
The chatbots, including those powered by models from companies like OpenAI, Google, and Anthropic, are trained on vast internet data sets. While attempts are made to exclude harmful material, AI systems may still internalize sensitive information.
Safety controls are meant to block the release of this knowledge, but researchers demonstrated how it could be bypassed using specially crafted prompts.
The researchers developed a ‘universal jailbreak’ capable of compromising multiple leading LLMs. Once the safeguards were bypassed, the chatbots consistently answered queries that should have been blocked.
They found some AI models openly advertised online as ‘dark LLMs’, designed without ethical constraints and willing to generate responses that support fraud or cybercrime.
Professor Lior Rokach and Dr Michael Fire, who led the research, said the growing accessibility of this technology lowers the barrier for malicious use. They warned that dangerous knowledge could soon be accessed by anyone with a laptop or phone.
Despite notifying AI providers about the jailbreak method, the researchers say the response was underwhelming. Some companies dismissed the concerns as outside the scope of bug bounty programs, while others did not respond.
The report calls on tech companies to improve their models’ security by screening training data, using advanced firewalls, and developing methods for machine ‘unlearning’ to help remove illicit content. Experts also called for clearer safety standards and independent oversight.
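For illustration, the ‘firewall’ idea the report describes amounts to a screening layer that sits between the user and the model and inspects prompts before they are forwarded. The sketch below is a deliberately simplified, keyword-based version of such a guardrail; the pattern list and function names are invented for this example, and production systems rely on trained classifiers rather than pattern matching:

```python
import re

# Toy denylist of patterns for the kinds of topics the report flags.
# Illustrative only; real guardrails use trained safety classifiers.
UNSAFE_PATTERNS = [
    r"\bhow\s+to\s+hack\b",
    r"\bsynthesi[sz]e\b.*\bdrugs?\b",
    r"\bmake\s+a?\s*bomb\b",
]

REFUSAL = "I can't help with that request."


def screen_prompt(prompt: str) -> str | None:
    """Return a refusal message if the prompt matches a known unsafe
    pattern, otherwise None (meaning it may be passed to the model)."""
    lowered = prompt.lower()
    for pattern in UNSAFE_PATTERNS:
        if re.search(pattern, lowered):
            return REFUSAL
    return None


if __name__ == "__main__":
    # The guardrail runs before the prompt ever reaches the LLM.
    for prompt in ["What's the weather today?", "How to hack a wifi network"]:
        verdict = screen_prompt(prompt)
        print(prompt, "->", verdict or "forwarded to model")
```

The design point is that such screening is model-agnostic: because it operates outside the LLM, it cannot be disabled by a jailbreak prompt, which is why the report treats it as a complement to, not a replacement for, safer training data and machine unlearning.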
OpenAI said its latest models have improved resilience to jailbreaks, and Microsoft pointed to its recent safety initiatives. Other companies have not yet commented.