UK NCSC publishes framework on adversarial attacks against AI systems

UK NCSC graphic for adversarial machine learning attacks

The UK’s National Cyber Security Centre has published a paper on adversarial attacks against machine learning and AI, setting out a framework for understanding attacks that target the operation of ML models. The paper introduces a common language intended to support awareness, threat modelling, and collaboration on AI security.

The NCSC says ML systems present a larger attack surface than traditional software because of rapid development cycles, unique architectures, large model sizes, and the widespread use of open-source components. It distinguishes adversarial machine learning attacks from broader cyberattacks by focusing on those that exploit vulnerabilities specific to the architecture, training, or operation of ML models.

The paper defines seven attack classes:

  • model characterisation
  • model inversion
  • training data poisoning
  • malicious model training
  • model input manipulation
  • model artifact manipulation
  • model hardware attacks

It says these attacks can occur across development, training, and deployment, and may target both hardware and software components.
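One of these classes, model input manipulation, covers adversarial examples: inputs perturbed so that a model misbehaves at inference time. The paper does not include code, but the idea can be sketched with a toy fast-gradient-sign perturbation against a hypothetical logistic-regression classifier (all names, weights, and the epsilon budget below are illustrative, not from the NCSC paper):

```python
import numpy as np

# Illustrative sketch of "model input manipulation" (an adversarial example)
# against a toy logistic-regression model. Nothing here is from the NCSC
# paper; the model, weights, and epsilon are invented for demonstration.

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy "model": fixed random weights stand in for a trained classifier.
w = rng.normal(size=8)
b = 0.1

def predict(x):
    return sigmoid(x @ w + b)

def loss(x, y):
    # Binary cross-entropy for a single example.
    p = predict(x)
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

def input_grad(x, y):
    # Gradient of the loss with respect to the INPUT (not the weights):
    # for logistic regression, dL/dx = (p - y) * w.
    return (predict(x) - y) * w

x = rng.normal(size=8)   # a benign input
y = 1.0                  # its true label

# Fast-gradient-sign step: within an L-infinity budget epsilon, move each
# input feature in the direction that most increases the loss.
epsilon = 0.5
x_adv = x + epsilon * np.sign(input_grad(x, y))

print(f"loss before: {float(loss(x, y)):.4f}, after: {float(loss(x_adv, y)):.4f}")
```

The perturbation is small per feature but aligned with the loss gradient, so the model's confidence in the correct label drops; real attacks apply the same principle to image, audio, or text classifiers with far larger models.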

The NCSC also maps those attack classes against eight potential goals of a malicious actor, including reconnaissance, degrading performance, wasting resources, embedding hidden behaviours, evading detection, extracting data, and gaining wider system access. The table on pages 11-12 links each class to one or more of those goals.

The paper argues that standard cybersecurity controls remain foundational, but says ML-specific weaknesses often require dedicated mitigations that are not yet mature or widely deployed.

It calls for more research into underdeveloped areas, such as model-hardware attacks and malicious model training, and recommends greater use of frameworks and guidance from the NCSC, ETSI, and the UK government’s AI cybersecurity code of practice.

Would you like to learn more about AI, tech, and digital diplomacy? If so, ask our Diplo chatbot!