AI agents face prompt injection and persistence risks, researchers warn
Layered defence, strict access controls, and monitoring are essential as AI agents roll into production.
Zenity Labs warned at Black Hat USA that widely used AI agents can be hijacked without interaction. Attacks could exfiltrate data, manipulate workflows, impersonate users, and persist via agent memory. Researchers said knowledge sources and instructions could be poisoned.
Demos showed risks across major platforms. ChatGPT was tricked into accessing a linked Google Drive via email prompt injection. Microsoft Copilot Studio agents leaked CRM data. Salesforce Einstein rerouted customer emails. Gemini and Microsoft 365 Copilot were steered into insider-style attacks.
Vendors were notified under coordinated disclosure. Microsoft stated that ongoing platform updates have stopped the reported behaviour and highlighted built-in safeguards. OpenAI confirmed a patch and a bug bounty programme. Salesforce said its issue was fixed. Google pointed to newly deployed, layered defences.
Enterprise adoption of AI agents is accelerating, raising the stakes for governance and security. Aim Labs, which had previously flagged similar zero-click risks, said frameworks often lack guardrails. Responsibility frequently falls on organisations deploying agents, noted Aim Labs’ Itay Ravia.
Researchers and vendors emphasise layered defence against prompt injection and misuse. Strong access controls, careful tool exposure, and monitoring of agent memory and connectors remain priorities as agent capabilities expand in production.
Would you like to learn more about AI, tech, and digital diplomacy? If so, ask our Diplo chatbot!