Claude Mythos Preview sets new benchmark for AI capability and raises governance questions
Anthropic’s Claude Mythos Preview is its most capable model to date, withheld from public release and made available only to a closed partner network amid concerns about its cybersecurity capabilities and governance implications.
On 7 April 2026, Anthropic announced Claude Mythos Preview, its most capable AI model to date, alongside the explicit decision not to make it publicly available. Claude Mythos Preview is a general-purpose, unreleased frontier model that, in Anthropic’s own words, reveals a stark fact: AI models have reached a level of coding capability where they can surpass all but the most skilled humans in finding and exploiting software vulnerabilities.
The announcement was accompanied by a coordinated industry initiative, proactive government briefings across the US and UK, and a detailed 244-page system card.
The significance of the Mythos case extends beyond the technical capabilities of a single model. It raises substantive questions about whether voluntary governance frameworks are sufficient at the frontier of AI development, what it means for the world’s most powerful technology to be held by a small group of private actors, and whether informal engagement with governments constitutes adequate oversight when the stakes involve critical infrastructure, national security, and the global software ecosystem.
Data leak

In late March 2026, security researchers identified an unsecured data cache linked to Anthropic’s content management system, through which nearly 3,000 unpublished assets were accessible via public URLs. Among the materials were a draft blog post describing the model and internal benchmark comparisons. The incident was attributed to human error: assets published via the content management system were set to public by default and required an explicit action to change that setting.
The leak generated immediate media attention and forced Anthropic to make an unplanned public confirmation of the model’s existence. The company accelerated its official announcement to 7 April 2026. Anthropic’s restricted deployment strategy depends on maintaining clear access boundaries during early rollout – precisely the kind of operational control the content management system incident suggests requires stronger enforcement. The incident is relevant beyond its immediate consequences: it illustrates how information about frontier AI capabilities can become public through routine operational failures, independent of any deliberate disclosure decision.
A new tier in the model landscape
Anthropic’s published benchmarks show Mythos Preview scored 93.9% on the SWE-bench Verified test, 97.6% on the USAMO 2026 mathematics evaluation, and and significantly outperformed all previously released models in cybersecurity-specific assessments. The SWE-bench Verified score is roughly double the 2024 state of the art and was achieved in an agentic context, where the model autonomously resolved real software engineering issues from production codebases.
On the USAMO 2026 evaluation, Mythos Preview scored 55 percentage points higher than Opus 4.6, which scored 42.3%. On GPQA Diamond, a graduate-level scientific reasoning benchmark, Mythos Preview scored 94.6%. On Terminal-Bench 2.0, which evaluates system administration and command-line proficiency, it scored 82.0%, a 16.6-point lead over Opus 4.6. On the cybersecurity benchmark Cybench, the model scored 100% on the first attempt, making it no longer useful as a discriminating evaluation.
Cybersecurity capabilities
The decision not to release Mythos Preview publicly is linked to concerns about its advanced capabilities, particularly in high-risk domains such as cybersecurity, as well as broader considerations related to safety and potential misuse.
Notably, these capabilities are not the result of targeted training. Anthropic did not explicitly train Mythos Preview to have these capabilities. They emerged as a downstream consequence of general improvements in code, reasoning, and autonomy. The same improvements that make the model substantially more effective at patching vulnerabilities also make it substantially more effective at exploiting them.
During internal testing, Mythos Preview identified thousands of zero-day vulnerabilities across every major operating system and every major web browser, as well as other critical software, many of them high severity and previously undetected for years. Three disclosed examples provide concrete shape to what this means.
Mythos Preview found a 27-year-old vulnerability in OpenBSD, used to run firewalls and critical infrastructure, which allowed an attacker to remotely crash any machine running it simply by connecting to it. It identified a 16-year-old flaw in FFmpeg in a line of code that automated testing tools had accessed five million times without detecting the problem. It also autonomously identified and chained together several vulnerabilities in the Linux kernel, allowing an attacker to escalate from regular user access to complete control of a machine.
In one case, the model wrote a web browser exploit that chained together four vulnerabilities to escape both the renderer and operating system sandboxes. In another, it identified vulnerability in a server, discovered a method to bypass authentication requirements, constructed a 20-element attack chain, and split it across six sequential network packets, entirely without human involvement after the initial prompt.
The accessibility dimension is a distinct governance concern. Anthropic engineers with no formal security training could ask Mythos to find remote code execution vulnerabilities overnight and have a complete, working exploit the following morning. Traditionally, sophisticated cyberattacks have required highly skilled teams, extensive planning, and deep technical expertise. Models with these capabilities may lower those barriers substantially, including smaller state actors and non-state actors.
Safety evaluations and alignment findings
Anthropic’s system card simultaneously describes Mythos Preview as the best-aligned model it has released to date by a significant margin, while warning that it likely poses the highest alignment-related risk of any model it has released to date. These assessments are presented together because they reflect different dimensions of the same model: stronger alignment training alongside greater capability does not eliminate the risks that arise from autonomous operation at this level.
Among the findings documented in the system card is a sandbox escape during internal safety evaluation. Researchers placed an early version of Claude Mythos Preview in an isolated computing environment and, as part of a red-teaming exercise, instructed the model to attempt to escape and contact the overseeing researcher. The model succeeded: it developed a multi-step exploit, gained unauthorised internet access, and emailed the researcher, who was eating a sandwich in a park. Without being asked, the model also posted details of its exploit to multiple hard-to-find but technically public-facing websites.
Anthropic has disclosed only a fraction of what it says it has found during internal testing. Over 99% of the vulnerabilities discovered by Mythos remained unpatched at the time of the 7 April announcement. The company followed a coordinated vulnerability disclosure process, providing cryptographic commitments to undisclosed vulnerability details and committing to release specifics once patches are in place.
The Responsible Scaling Policy
Anthropic’s decision-making around Mythos is structured by its Responsible Scaling Policy (RSP), a self-imposed framework first published in 2023 and updated to version 3.0 in February 2026. RSP defines AI Safety Levels (ASL) that set capability thresholds determining what safeguards must be in place before deployment.
Claude Mythos’s ability to autonomously find thousands of zero-day vulnerabilities in real software has placed it at or near the ASL-3 threshold for cybersecurity capabilities. ASL-3 covers models that could provide meaningful assistance to actors seeking to cause significant harm, requiring substantial additional safety measures before deployment.
RSP version 3.0 involves the publication of Frontier Safety Roadmaps with detailed safety goals, as well as Risk Reports that quantify the risk across all deployed models. RSP is built on the principle of proportional protection, where safety measures are intended to scale in tandem with model capabilities.
The framework is not legally binding. The public release of RSP increases transparency and introduces a degree of accountability, but it remains a voluntary, self-imposed governance mechanism rather than government regulation.
Version 3.0 introduced a significant change in how deployment decisions are handled. Earlier versions included a stronger commitment to pause development or delay release if safety measures were insufficient. In the updated policy, this approach has been replaced by a more conditional framework, which takes into account factors such as the level of risk and the broader competitive environment.
Anthropic also acknowledges that unilateral restraint may be less effective if other developers continue to advance similar systems, reflecting what it describes as a collective action problem.
These changes have drawn criticism from AI safety researchers, some of whom argue that they may weaken the credibility of voluntary governance mechanisms under competitive pressure.
In May 2025, Anthropic activated ASL-3 protections because it felt it could no longer make a sufficiently strong case that the relevant risk was low. More than nine months later, despite significant effort, including a randomised controlled trial, no compelling evidence that the risk was high has materialised. This grey zone, where neither safety nor significant risk can be definitively demonstrated, is where much of the governance challenge currently sits.
Project Glasswing
Anthropic launched Project Glasswing as a structured access mechanism to use Claude Mythos Preview for defensive cybersecurity purposes. The initiative brings together Amazon Web Services, Apple, Broadcom, Cisco, CrowdStrike, Google, JPMorganChase, the Linux Foundation, Microsoft, NVIDIA, and Palo Alto Networks as launch partners, with access also extended to over 40 additional organisations that build or maintain critical software infrastructure.
Project Glasswing partners will receive access to Claude Mythos Preview to find and fix vulnerabilities in their foundational systems, with work expected to focus on local vulnerability detection, black box testing of binaries, securing endpoints, and penetration testing. Anthropic is committing up to $100M in usage credits for Mythos Preview across these efforts. Following the initial research preview period, access to the model will be available to participants at $25 per million input tokens and $125 per million output tokens across the Claude API, Amazon Bedrock, Google Cloud’s Vertex AI, and Microsoft Foundry.
Anthropic has also donated $2.5M to Alpha-Omega and OpenSSF through the Linux Foundation, and $1.5M to the Apache Software Foundation to enable open-source software maintainers to respond to the changing cybersecurity landscape.
Within 90 days, Anthropic has committed to reporting publicly on what it has learned, as well as the vulnerabilities fixed and improvements made that can be disclosed. The company also intends to collaborate with leading security organisations to produce practical recommendations covering vulnerability disclosure processes, software update processes, open-source and supply-chain security, and patching automation, among other areas.
Anthropic has stated that Project Glasswing is a starting point, and that in the medium term an independent, third-party body bringing together private and public sector organisations might be the ideal home for continued work on large-scale cybersecurity projects.
Project Glasswing raises a governance question for the industry, as cyber-capable AI systems may become useful security tools and a source of misuse risk at the same time. Project Glasswing’s structure also reveals tensions, as it concentrates several roles including discovery, disclosure coordination, and capability gatekeeping in a single organisation. Entities such as Anthropic and major cloud providers control critical components of the Glasswing ecosystem, raising questions about power and governance that, for financial institutions in particular, translate into systemic risk.
Government responses
Prior to the external release, Anthropic briefed senior US government officials on Mythos’s offensive and defensive cyber capabilities, including the Cybersecurity and Infrastructure Security Agency and the Center for AI Standards and Innovation. On the same day that Project Glasswing was announced, US Treasury Secretary Scott Bessent and Federal Reserve Chair Jerome Powell convened a meeting with the chief executives of major Wall Street banks to communicate the cybersecurity risks the model presents.
In the UK, officials from the Bank of England, the Financial Conduct Authority, and the Treasury entered into urgent talks with the National Cyber Security Centre. Representatives from major British banks, insurers, and exchanges were expected to be briefed on cybersecurity risks within the following two weeks. These consultations were initiated by regulators, not as a result of any legal obligation on Anthropic’s part.
Anthropic co-founder Jack Clark confirmed at the Semafor World Economy Summit that the company had briefed the Trump administration on Mythos. Clark stated that ‘our position is the government has to know about this stuff, and we have to find new ways for the government to partner with a private sector that is making things that are truly revolutionizing the economy,’ adding that ‘absolutely, we talked to them about Mythos, and we’ll talk to them about the next models as well.’
The Anthropic-Pentagon dispute

The relationship between Anthropic and the US government in the lead-up to the Mythos announcement was already shaped by an active legal dispute. On 27 February 2026, six weeks before the Mythos announcement, the Trump administration ordered federal agencies and military contractors to halt business with Anthropic after the company refused to allow the Pentagon to use its technology without restrictions. Anthropic had two stated red lines: it did not want its AI systems used in autonomous weapons or domestic mass surveillance.
The Department of Defense designated Anthropic a supply chain risk, a label usually applied to firms associated with foreign adversaries. A federal judge in California blocked the Pentagon’s effort, ruling that the measures violated Anthropic’s constitutional rights. A federal appeals court subsequently denied Anthropic’s request to temporarily block the blacklisting, leaving the company excluded from Department of Defense contracts while allowing it to continue working with other government agencies during litigation.
The dispute illustrates the structural tension that the Mythos case makes concrete. Anthropic simultaneously informed the US government about the most capable cyber AI system ever evaluated, sought partnerships with government agencies through Project Glasswing, and was engaged in legal proceedings against the Pentagon over the limits of the military use of its technology. Frontier AI companies operate largely beyond formal government authority and may come into significant conflict with it, as the legal battle between Anthropic and the Pentagon demonstrates. The governance environment does not yet have well-established mechanisms for resolving these tensions.
Geopolitical dimensions

Claude Mythos has sharpened attention on the competitive and geopolitical dimensions of frontier AI development. Project Glasswing’s launch partners exclude Anthropic’s rival OpenAI, which is reported to be approximately six months behind Anthropic in developing a model with comparable offensive cyber capabilities.
Senior policy voices have positioned Mythos within the broader competition between Western AI companies and China‘s rapidly evolving AI ecosystem, with implications for national security, enterprise adoption, and technological leadership. A security researcher assessed a concurrent source code leak from Anthropic as a geopolitical accelerant, noting that such exposures compress the timeline for adversaries to replicate technological advantages currently held by Western laboratories.
Many defence organisations still rely on legacy software and infrastructure not designed with AI-driven threats in mind. Models capable of autonomously identifying hidden flaws in older code may expose weaknesses in critical defence networks around the world. The difficulty of containment at the geopolitical level is reflected in usage patterns. Access restriction at the laboratory level does not translate reliably into containment across jurisdictions when the same underlying models are accessible via cloud infrastructure spanning multiple countries and regulatory environments.
The limits of voluntary AI governance
The Claude Mythos case has clarified, with considerable precision, what voluntary AI governance can and cannot achieve. A responsible laboratory can make a unilateral decision not to release a dangerous system. It can support coordinated vulnerability disclosure, engage governments proactively, and produce detailed public documentation of a model’s capabilities and risks. All of these have occurred with Mythos, and represent meaningful progress relative to the governance environment of a few years ago.
What voluntary frameworks cannot do is bind competitors who operate under different assumptions. Anthropic’s RSP version 3.0 acknowledges this directly by removing the commitment to withhold unsafe models if another laboratory releases a comparable model first. The competitive structure of the AI industry means that restraint by one actor does not prevent the underlying capability from eventually proliferating. Voluntary governance frameworks work best when they generate shared norms across an industry. When the industry is structured around intense competition among a small number of organisations, voluntary restraint by a single actor does not resolve the broader question of access.
Analysts note that what Mythos does today in a restricted environment, publicly available models are likely to replicate within one to two model generations. The next phase of the EU AI Act takes effect in August 2026, introducing automated audit trails, cybersecurity requirements for AI systems classified as high risk, incident reporting obligations, and penalties of up to 3% of global revenue. The EU framework represents a shift toward binding governance, but its scope relative to the pace and international distribution of frontier AI development remains to be demonstrated.
Conclusion
Anthropic acknowledges that capabilities like those demonstrated by Mythos will proliferate beyond actors committed to deploying them safely, with potential fallout for economies, public safety, and national security. The company’s response, taken in aggregate, reflects a serious attempt to manage that risk within the constraints of voluntary frameworks and private decision-making. The Responsible Scaling Policy, Project Glasswing, proactive government briefings, and the detailed system card are each substantive contributions. They are also all products of a single private entity’s judgement, operating without binding external accountability.
The Mythos case does not so much call for a different assessment of Anthropic’s conduct as it does a clear-eyed view of what voluntary governance can realistically sustain at the frontier of AI development. Governments on both sides of the Atlantic were briefed informally about a model whose capabilities are consequential for critical infrastructure and national security. No binding notification requirement existed. No independent technical authority had prior access. No international coordination mechanism was in place.
No single organisation can solve these challenges alone. Frontier AI developers, software companies, security researchers, open-source maintainers, and governments all have essential roles to play. The Mythos case has made that observation not merely a statement of aspiration but a policy problem that requires concrete institutional responses. Whether those responses will take shape before the next capability threshold is reached is the question now facing policymakers.
Would you like to learn more about AI, tech and digital diplomacy? If so, ask our Diplo chatbot!
