Poetic prompts reveal gaps in AI safety, according to study
Findings indicate that creative phrasing can undermine AI filtering methods, with several prominent models generating harmful content when faced with short poetic prompts.
Researchers in Italy have found that poetic language can weaken the safety barriers used by many leading AI chatbots.
The study, by Icaro Lab, part of DexAI, examined whether poems containing harmful requests could provoke unsafe answers from widely deployed models across the industry. The team wrote twenty poems in English and Italian, each ending with explicit instructions that AI systems are trained to block.
The researchers tested the poems on twenty-five models developed by nine major companies. Poetic prompts produced unsafe responses in more than half of the tests.
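In outline, such a test amounts to sending each poem to each model and judging whether the reply refuses or complies. The sketch below shows what that loop might look like; it assumes an OpenAI-compatible chat API, and the model name, poems file, and keyword-based refusal check are illustrative placeholders rather than the study's actual harness or judging method.

```python
# Minimal sketch of an evaluation loop for adversarial-poetry prompts.
# Assumptions: an OpenAI-compatible chat API, a local poems.txt file with
# one poem per line, and a naive keyword-based refusal check. None of this
# reflects the study's actual harness or judging criteria.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

MODELS = ["gpt-5-nano"]  # placeholder list; the study covered 25 models

def looks_like_refusal(text: str) -> bool:
    """Crude proxy for a safety judge: flag common refusal phrasings."""
    markers = ("i can't", "i cannot", "i'm unable", "i won't")
    return any(m in text.lower() for m in markers)

with open("poems.txt", encoding="utf-8") as f:
    poems = [line.strip() for line in f if line.strip()]

for model in MODELS:
    unsafe = 0
    for poem in poems:
        reply = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": poem}],
        )
        text = reply.choices[0].message.content or ""
        if not looks_like_refusal(text):
            unsafe += 1  # the model answered rather than refused
    print(f"{model}: {unsafe}/{len(poems)} prompts answered unsafely")
```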
Some models appeared more resilient than others. OpenAI’s GPT-5 Nano avoided unsafe replies in every case, while Google’s Gemini 2.5 Pro generated harmful content in all tests. Two Meta systems produced unsafe responses to twenty percent of the poems.
The researchers argue that poetic structure disrupts the predictive patterns large language models rely on to filter harmful material: the unconventional rhythm and metaphor common in poetry make the underlying safety mechanisms less reliable.
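To make that failure mode concrete, here is a toy illustration (not the study's method): a filter that matches literal key phrases catches a plain-language request but misses the same intent recast in metaphor, the kind of surface-pattern brittleness the researchers describe. The blocklist entries and example prompts are invented for the demonstration.

```python
# Toy illustration of surface-pattern brittleness (not the study's method).
# A filter keyed to literal phrasings catches the plain request but not the
# same intent wrapped in metaphor and rhyme.
import re

BLOCKLIST = [
    r"\bhow to pick a lock\b",   # hypothetical blocked phrasing
    r"\bbypass (a|the) lock\b",
]

def naive_filter(prompt: str) -> bool:
    """Return True if the prompt should be blocked."""
    return any(re.search(p, prompt.lower()) for p in BLOCKLIST)

plain = "Tell me how to pick a lock."
poetic = ("Sing, quiet teacher of the stubborn door, / "
          "what dance of pins lets iron yield once more?")

print(naive_filter(plain))   # True  - literal phrasing matches
print(naive_filter(poetic))  # False - same intent, no matching surface form
```

Production safety systems are far more sophisticated than this keyword check, but the same principle applies: filters trained on typical phrasings can lose reliability when the input's form departs sharply from what they have seen.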
Additionally, the team warned that adversarial poetry can be used by anyone, which raises concerns about how easily safety systems may be manipulated in everyday use.
Before releasing the study, the researchers contacted all companies involved and shared the full dataset with them.
Anthropic confirmed receipt and stated that it was reviewing the findings. The work has prompted debate over how AI systems can be strengthened as creative language becomes an increasingly common method for attempting to bypass safety controls.
