Study warns that LLMs are vulnerable to minimal tampering
Security experts warn that deceptive poisoned data could cause more dangerous real-world failures.
Researchers from Anthropic, the UK AI Security Institute and the Alan Turing Institute have shown that only a few hundred crafted samples can poison LLM models. The tests revealed that around 250 malicious entries could embed a backdoor that triggers gibberish responses when a specific phrase appears.
Models ranging from 600 million to 13 billion parameters (such as Pythia) were affected, highlighting the scale-independent nature of the weakness. A planted phrase such as ‘sudo’ caused output collapse, raising concerns about targeted disruption and the ease of manipulating widely trained systems.
Security specialists note that denial-of-service effects are worrying, yet deceptive outputs pose far greater risk. Prior studies already demonstrated that medical and safety-critical models can be destabilised by tiny quantities of misleading data, heightening the urgency for robust dataset controls.
Researchers warn that open ecosystems and scraped corpora make silent data poisoning increasingly feasible. Developers are urged to adopt stronger provenance checks and continuous auditing, as reliance on LLMs continues to expand for AI purposes across technical and everyday applications.
Would you like to learn more about AI, tech and digital diplomacy? If so, ask our Diplo chatbot!
