EVMbench uses 120 real vulnerabilities drawn from 40 public audits.
Performance tests show that GPT-5.3-Codex achieved a 72.2% success rate in exploiting flaws.
OpenAI has launched EVMbench, a new tool that measures how well artificial intelligence agents perform on a set of tests (benchmarks) covering the detection, patching and exploitation of vulnerabilities in smart contracts on Ethereum. The company developed EVMbench together with Paradigm, a firm that funds cryptocurrency network projects.
With EVMbench, AI agents working on Ethereum are evaluated on three key tasks.
First, detecting real vulnerabilities in open source Ethereum code, using data from public audits. They then rate each vulnerability’s risk and the audit rewards for the rest of the developers in the ecosystem.
Second, proposing patches that fix these vulnerabilities without breaking the operation of the protocol. “Agents modify vulnerable contracts and must preserve the intended functionality while eliminating exploitability, which is verified through automated testing and exploitation checks,” says OpenAI.
Third, simulating attacks that exploit these flaws and drain funds in safe, controlled simulation environments (sandboxes). However, OpenAI clarifies that EVMbench “does not represent the full complexity of smart contract security in the real world.”
EVMbench uses 120 real vulnerabilities drawn from 40 public competitive audits, such as those run on Code4rena. Performance tests conducted so far show that the GPT-5.3-Codex agent model achieved a 72.2% success rate in exploiting flaws. The previous model, GPT-5, achieved only a 31.9% success rate.
In vulnerability detection and patching, however, the model's results are less encouraging.
Performance is lower in detection and patching tasks. In the detection phase, agents often stop after identifying a single issue rather than thoroughly auditing the codebase. In the patching phase, maintaining full functionality while eliminating subtle vulnerabilities remains a challenge.
OpenAI, artificial intelligence company.
Why is EVMbench important?
According to the company, the importance of this agent auditing tool lies in the fact that smart contracts routinely secure $100 billion in crypto assets within open source protocols.
“As AI agents improve their skills in reading, writing, and executing code, it becomes increasingly important to measure their capabilities in economically relevant environments and to encourage the defensive use of AI systems to audit and strengthen deployed contracts,” the company said in its statement presenting the product.
OpenAI's work with agents on Ethereum comes at a time of rapid growth in autonomous agents across the cryptoasset ecosystem. As reported by CriptoNoticias, these agents can already interact with complex environments such as the Lightning Network. On this second layer, agents can manage liquidity channels and economic exchanges with other AIs.
Coinbase, for its part, launched Agentic Wallets, wallets that allow AI agents to operate on the Base network without paying fees. And Phantom, Solana’s most popular wallet, activated an MCP server that lets AI agents manage balances and operate autonomously.
According to data from Token Terminal, the Ethereum network reached 17.3 million weekly transactions. The surge in transactions on the network comes after the launch of ERC-8004 in August 2025, a standard that allows “discovering, choosing and interacting with agents across organizational boundaries without pre-existing trust” in open agent economies.

