New NIST study reveals inherent weaknesses in AI defences
Findings indicate that organisations deploying AI should prepare for continuous security management instead of relying on one-time safeguards.
A new study by a researcher at the US National Institute of Standards and Technology suggests that fixed AI guardrails cannot provide complete protection against adaptive adversarial prompts.
The paper, written by NIST senior scientist Apostol Vassilev and published in IEEE Security & Privacy, draws on reasoning related to Kurt Gödel’s incompleteness theorems to argue that no finite set of AI safety rules can be robust against every possible prompt-based attack.
According to NIST, the finding does not mean AI systems cannot be hardened. Instead, it supports moving away from a ‘one and done’ security model towards continuous monitoring, testing and updating.
The recommended approach includes ongoing red-team work to identify adversarial prompts before attackers exploit them, continuous updates to strengthen guardrails, and operational resilience measures that limit the impact of successful attacks and enable quick recovery.
NIST said the goal is not to eliminate all vulnerabilities, but to make exploitation more difficult and costly. As AI systems are deployed more widely, organisations should treat AI security as a permanent operational process rather than a problem that can be solved through a fixed set of controls.
Why does it matter?
The study reinforces a central challenge in AI governance: security controls for AI systems cannot be treated as static compliance measures. As AI tools are integrated into business operations, public services and security-sensitive environments, organisations may need continuous red-teaming, guardrail updates, monitoring and incident response. The policy relevance lies in shifting AI risk management from one-time assurance towards ongoing operational resilience.
