OpenAI outlines safeguards as AI cyber capabilities advance

Defensive tooling is a priority as OpenAI prepares for models that reach high cybersecurity capability.

Cyber capabilities in advanced AI models are improving rapidly, delivering clear benefits for cyberdefence while introducing new dual-use risks that require careful management, according to OpenAI’s latest assessment.

The company points to sharp gains in capture-the-flag performance, a class of benchmark exercises in which systems find and exploit planted security vulnerabilities, with success rates rising from 27 percent in August to 76 percent by November 2025. OpenAI says future models could reach high cyber capability, including the ability to assist with sophisticated intrusion techniques.

To address this, OpenAI says it is prioritising defensive use cases, investing in tools that help security teams audit code, patch vulnerabilities, and respond more effectively to threats. The goal is to give defenders an advantage in an often under-resourced environment.
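
OpenAI has not described these defensive tools in detail. As a rough illustration of the general idea, the sketch below uses the publicly available OpenAI Python SDK for a first-pass security review of a code snippet; the model name, prompt, and vulnerable example are placeholders chosen for this sketch, not the company's actual tooling.

```python
# A minimal sketch of AI-assisted code auditing: send a snippet to a
# general-purpose model and ask for a security review. The model name
# and prompt are illustrative placeholders, not OpenAI's defensive tools.

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

SNIPPET = """
def login(db, user, password):
    query = "SELECT * FROM users WHERE name = '%s' AND pw = '%s'" % (user, password)
    return db.execute(query)
"""

response = client.chat.completions.create(
    model="gpt-4o",  # placeholder; any capable model would serve
    messages=[
        {
            "role": "system",
            "content": "You are a security reviewer. List likely "
                       "vulnerabilities in the code and suggest fixes.",
        },
        {"role": "user", "content": SNIPPET},
    ],
)

# A competent review should flag the SQL injection in the query string.
print(response.choices[0].message.content)
```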

OpenAI argues that cybersecurity cannot be governed through a single safeguard, as defensive and offensive techniques overlap. Instead, it applies a defence-in-depth approach that combines access controls, monitoring, detection systems, and extensive red teaming to limit misuse.
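
OpenAI has not published the internals of these safeguards, but the layered pattern itself is simple to sketch: a request must clear several independent checks, and a failure at any single layer blocks it. Everything below, from the layer names to the anomaly threshold, is a hypothetical illustration of defence in depth, not a description of OpenAI's systems.

```python
# Hypothetical illustration of defence in depth for an AI service.
# Layer names, rules, and thresholds are invented for this sketch.

from dataclasses import dataclass


@dataclass
class Request:
    user_id: str
    is_verified: bool     # e.g. vetted through a trusted-access programme
    prompt: str
    anomaly_score: float  # produced by a separate detection system


def access_control(req: Request) -> bool:
    # Layer 1: only verified users may invoke high-capability features.
    return req.is_verified


def content_monitor(req: Request) -> bool:
    # Layer 2: block prompts matching known misuse patterns.
    blocked_phrases = ("exploit this target", "write ransomware")
    return not any(phrase in req.prompt.lower() for phrase in blocked_phrases)


def anomaly_detection(req: Request) -> bool:
    # Layer 3: reject behaviour that looks abnormal for this account.
    return req.anomaly_score < 0.8


LAYERS = (access_control, content_monitor, anomaly_detection)


def allow(req: Request) -> bool:
    # Defence in depth: every independent layer must pass, so weakening
    # any single safeguard still leaves the others in place.
    return all(layer(req) for layer in LAYERS)


if __name__ == "__main__":
    req = Request("alice", True, "Help me audit this code for bugs", 0.1)
    print("allowed" if allow(req) else "blocked")  # prints: allowed
```

The design point is that no single check is load-bearing: an attacker who defeats one layer still faces the others, which is why the approach pairs access controls with monitoring, detection, and red teaming rather than relying on any one of them.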

Alongside these measures, the company plans new initiatives, including trusted access programmes for defenders, agent-based security tools in private testing, and the creation of a Frontier Risk Council. OpenAI says these efforts reflect a long-term commitment to cyber resilience.
