On Anthropic’s Mythos Preview and Project Glasswing

The cybersecurity industry is obsessing over Anthropic’s new model, Claude Mythos Preview, and its effects on cybersecurity. Anthropic said that it is not releasing it to the general public because of its cyberattack capabilities, and has launched Project Glasswing to run the model against a whole slew of public domain and proprietary software, with the aim of finding and patching all the vulnerabilities before hackers get their hands on the model and exploit them.

There’s a lot here, and I hope to write something more considered in the coming week, but I want to make some quick observations.

One: This is very much a PR play by Anthropic—and it worked. Lots of reporters are breathlessly repeating Anthropic’s talking points, without engaging with them critically. OpenAI, presumably pissed that Anthropic’s new model has gotten so much positive press and wanting to grab some of the spotlight for itself, announced its model is just as scary, and won’t be released to the general public, either.

Two: These models do demonstrate an increased sophistication in their cyberattack capabilities. They write effective exploits—taking the vulnerabilities they find and operationalizing them—without human involvement. They can find more complex vulnerabilities: chaining together several memory corruption bugs, for example. And they can do more with one-shot prompting, without requiring orchestration and agent configuration infrastructure.

Three: Anthropic might have a good PR team, but the problem isn’t with Mythos Preview. The security company Aisle was able to replicate the vulnerabilities that Anthropic found using older, cheaper, public models. But there is a difference between finding a vulnerability and turning it into an attack. This points to a current advantage for the defender: finding for the purpose of fixing is easier for an AI than finding plus exploiting. This advantage is likely to shrink as ever more powerful models become available to the general public.

Four: Everyone who is panicking about the ramifications of this is correct about the problem, even if we can’t predict the exact timeline. Maybe the sea change just happened, with the new models from Anthropic and OpenAI. Maybe it happened six months ago. Maybe it’ll happen in six months. It will happen—I have no doubt about it—and sooner than we are ready for. We can’t predict how much more these models will improve in general, but software seems to be a specialized language that is optimal for AIs.

A couple of weeks ago, I wrote about security in what I called “the age of instant software,” where AIs are superhumanly good at finding, exploiting, and patching vulnerabilities. I stand by everything I wrote there. The urgency is now greater than ever.

I was also part of a large team that wrote a “what to do now” report. The guidance is largely correct: We need to prepare for a world where zero-day exploits are a dime a dozen, and lots of attackers suddenly have offensive capabilities that far outstrip their skills.

Posted on April 13, 2026 at 12:52 PM

Comments

Medo April 13, 2026 6:48 PM

My understanding of the Aisle result is that they could get small models to find some of the same vulnerabilities if they were told exactly where to look. When these smaller models are pointed at code without vulnerabilities (like a patched version of one of the vulns found by Mythos), they often hallucinate a vulnerability that is not present.

So it seems like it might be a lot more effort to actually use these smaller models to find vulnerabilities in practice: even if you pointed them successively at small pieces of code where they would manage to find a true vulnerability, I’d predict that signal would get drowned in false positives.
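This is essentially a base-rate effect. A toy calculation (all numbers hypothetical, chosen only for illustration, not measurements of any real model) shows how quickly hallucinated findings swamp real ones when true vulnerabilities are rare:

```c
/* Toy base-rate calculation: what fraction of a scanner's reported
   vulnerabilities are real, given a recall rate and a false-positive
   (hallucination) rate on clean code. All inputs are assumptions. */
double scan_precision(double spots, double vuln_rate,
                      double recall, double false_pos_rate) {
    double tp = spots * vuln_rate * recall;                 /* real vulns flagged */
    double fp = spots * (1.0 - vuln_rate) * false_pos_rate; /* hallucinated findings */
    return tp / (tp + fp);
}

/* With 10,000 code locations, 0.1% truly vulnerable, 90% recall, and a
   5% hallucination rate on clean code:
   scan_precision(10000, 0.001, 0.90, 0.05) is roughly 0.018 --
   about 55 false alarms for every real vulnerability reported. */
```

Under those assumed numbers, a triage team would spend almost all of its time chasing ghosts, which is Medo’s point.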

_ April 14, 2026 1:35 AM

The plan is to allow AI to “create” a conflict so terrible where the only solution is to follow the “master” to implant Neuralink (or like) devices in people to “advance” beyond AI.

It is coming.

Rontea April 14, 2026 2:53 PM

Project Glasswing is an important step, but it’s ultimately a reactive approach—racing to patch holes before attackers adapt. The larger lesson is that our current security paradigm is ill-equipped for models that can automate vulnerability discovery at scale. As AI systems grow more capable, we need to move toward systemic resilience rather than hoping to stay one patch ahead of the adversary.

lurker April 14, 2026 4:41 PM

@Rontea, ALL

Are the industry stalwarts subscribed to Project Glasswing aware that they have a common cause here?

How many of them (if any) will continue after this project on the existing codebase, and then employ Mythos Preview or its successors to purge their code before it gets to market?

Weather April 14, 2026 5:35 PM

@medo
Exactly. They say it’s so good, but hey, didn’t a script for IDA come out 15 years ago to do the same thing?

ResearcherZero April 14, 2026 11:17 PM

A large portion of networked communications relies on legacy firmware that is decades old. It is difficult to patch this stuff because it contains closed-source memory unsafe code.

In mobile modems, DNS parsing can allow unsafe, attacker-supplied data to be loaded into memory. The firmware contains legacy closed-source C/C++ written for fast operation, which is memory unsafe.
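To illustrate the class of bug being described, here is a minimal sketch (not code from any real baseband or firmware) of a parser that trusts an attacker-controlled DNS label length byte, alongside a bounds-checked variant:

```c
#include <string.h>

struct label { char text[16]; };

/* Unsafe: trusts the attacker-controlled length byte and copies up to
   255 bytes into a 16-byte buffer -- the classic overflow. */
void parse_label_unsafe(struct label *out, const unsigned char *pkt) {
    unsigned char len = pkt[0];      /* attacker controls this byte */
    memcpy(out->text, pkt + 1, len); /* may write far past the buffer */
}

/* Checked: rejects any length the buffer cannot hold. */
int parse_label_checked(struct label *out, const unsigned char *pkt) {
    unsigned char len = pkt[0];
    if (len >= sizeof out->text)
        return -1;                   /* malformed packet: drop it */
    memcpy(out->text, pkt + 1, len);
    out->text[len] = '\0';
    return (int)len;
}
```

Memory-safe languages make the first variant impossible to express, which is the motivation for moving baseband parsers to Rust.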

In the pre-authentication phase, Radio Resource Control (RRC) configuration messages remain in plaintext and lack integrity protection. In many models of cellular phone, the baseband has Direct Memory Access (DMA), and it is always on.

Where vulnerabilities exist, code can run to compromise the device or downgrade 5G to older protocols. This means a well-resourced attacker can target the phone remotely, spy on the device owner, or send forged messages.

https://projectzero.google/2023/03/multiple-internet-to-baseband-remote-rce.html

Vulnerabilities will continue to be discovered in the 5G baseband of cellular modems.
https://cellularsecurity.org/ransacked

To help mitigate threats, Google introduced a Rust-based low-level DNS library.
https://security.googleblog.com/2026/04/bringing-rust-to-pixel-baseband.html

Ralph April 15, 2026 11:26 AM

So are the AI companies going to engage in “responsible disclosure” – i.e., where their model finds a vulnerability, ensure that the authors of the vulnerable code see it a few months before anyone else?

I mean the big IT corporates have been badgering security researchers about this for over a decade, so shurely they won’t change their tune now?

Delaying the release of the model accomplishes that initially, but they need to ensure “responsible disclosure” on an ongoing basis, right?

Sidebar photo of Bruce Schneier by Joe MacInnis.