23 Feb 2026

EVMbench from OpenAI, Paradigm and OtterSec measures AI smart contract risks

EVMbench reveals how frontier AI agents detect and exploit real smart contract vulnerabilities, raising new blockchain security concerns.

OpenAI, with Paradigm and OtterSec, introduced EVMbench to test how AI agents detect, patch, and exploit smart contract flaws. The benchmark draws on 120 real vulnerabilities from 40 blockchain projects to better reflect live conditions.

Researchers report that leading agents can now discover and exploit vulnerabilities end-to-end against live blockchain instances. Over six months, exploit success rates rose sharply, prompting both praise for improved auditing capabilities and concern over the rapid scaling of offensive skills.

EVMbench evaluates agents across three modes: detect, patch, and exploit. Each successive mode reflects increasing technical complexity and mirrors the responsibilities faced in production blockchain environments, where contracts are often immutable and errors can lead to irreversible losses.

Recent incidents underline the stakes. A vulnerability in AI-generated Solidity code reportedly mispriced an asset, triggering liquidations and losses. Such cases highlight the risks of deploying AI-written financial logic without rigorous human review and governance safeguards.

While EVMbench advances the measurement of AI capabilities, it remains limited to curated vulnerabilities and sandboxed conditions. As blockchain adoption expands and criminal misuse evolves, researchers stress the need for responsible AI development alongside stronger smart contract security practices.
