24 Dec 2025

Atlas agent mode fortifies OpenAI’s ChatGPT security

Security updates in ChatGPT Atlas aim to reduce risks linked to AI agents operating inside browsers.

ChatGPT Atlas has introduced an agent mode that allows an AI browser agent to view webpages and perform actions directly. The feature supports everyday workflows using the same context as a human user. Expanded capability also increases security exposure.

Prompt injection has emerged as a key threat to browser-based agents, targeting AI behaviour rather than software flaws. Malicious instructions embedded in content can redirect an agent from the user’s intended action. Successful attacks may trigger unauthorised actions.

To address the risk, OpenAI has deployed a security update to Atlas. The update includes an adversarially trained model and strengthened safeguards. It followed internal automated red teaming.

Automated red teaming uses reinforcement learning to train AI attackers that search for complex exploits. Simulations test how agents respond to injected prompts. Findings are used to harden models and system-level defences.

Prompt injection is expected to remain a long-term security challenge for AI agents. Continued investment in testing, training, and rapid mitigation aims to reduce real-world risk. The goal is to achieve reliable and secure AI assistance.

Would you like to learn more about AI, tech, and digital diplomacy? If so, ask our Diplo chatbot!