AIs are Getting Better at Finding and Exploiting Internet Vulnerabilities

Really interesting blog post from Anthropic:

In a recent evaluation of AI models’ cyber capabilities, current Claude models can now succeed at multistage attacks on networks with dozens of hosts using only standard, open-source tools, instead of the custom tools needed by previous generations. This illustrates how barriers to the use of AI in relatively autonomous cyber workflows are rapidly coming down, and highlights the importance of security fundamentals like promptly patching known vulnerabilities.

[…]

A notable development during the testing of Claude Sonnet 4.5 is that the model can now succeed on a minority of the networks without the custom cyber toolkit needed by previous generations. In particular, Sonnet 4.5 can now exfiltrate all of the (simulated) personal information in a high-fidelity simulation of the Equifax data breach—one of the costliest cyber attacks in history—using only a Bash shell on a widely-available Kali Linux host (standard, open-source tools for penetration testing; not a custom toolkit). Sonnet 4.5 accomplishes this by instantly recognizing a publicized CVE and writing code to exploit it without needing to look it up or iterate on it. Recalling that the original Equifax breach happened by exploiting a publicized CVE that had not yet been patched, the prospect of highly competent and fast AI agents leveraging this approach underscores the pressing need for security best practices like prompt updates and patches.

Read the whole thing. Automatic exploitation will be a major change in cybersecurity. And things are happening fast. There have been significant developments since I wrote this in October.

Posted on January 23, 2026 at 7:01 AM

Comments

So Sad January 23, 2026 7:50 AM

Another corporate BS promoting THEIR snake oil.
AI can’t think. AI can only learn from existing attacks, so it won’t bring us even remotely anything new. All it can do is to follow the existing instructions and execute them really fast. That’s all. AI can’t even cook.

I’m so tired of all this AI BS here in this blog. AI this, AI that. Even people inside the AI industry itself have started to admit it is all a hoax. Truly sad that our host got hooked on all this nonsense.

Clive Robinson January 23, 2026 10:22 AM

@ Bruce, ALL,

From the article quote,

“… and highlights the importance of security fundamentals like promptly patching known vulnerabilities.”

It’s already been shown that “patching” is no longer sufficient, due to insufficient time to carry out the patching process and the testing involved.

So your comment of,

“Automatic exploitation will be a major change in cybersecurity. And things are happening fast. There have been significant developments since I wrote this in October.”

Is true but I suspect insufficient.

Of more importance, I suspect that due to time constraints “automatic patch testing” and similar testing will become the norm fairly quickly.

That is because it’s actually a vertical-depth, narrow-scope, rules-based application, not a horizontal-breadth or general application.

Thus the human input in patching and QA is reduced to being, in effect, a “ring master” organising what are now being called “Polecats”, which carry out the previously human functions of actual test running, QA checking and lower-layer management (a sketch of such an arrangement follows at the end of this comment),

https://paddo.dev/blog/gastown-two-kinds-of-multi-agent/

The more stable the system to be QA tested is, the faster the agents will move up the QA and test stacks, becoming almost “press of a button” rather than “goat herding” agents through the likes of what “Gas Town” will become. If, and it’s a big IF currently,

The AI Companies allow their systems to be used that way.

Which, at the moment, they appear to be cranking down on even for premium-price accounts…
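
To make that “ring master and Polecats” arrangement concrete, here is a minimal sketch, in Python, of an orchestrator dispatching patch-test jobs to worker agents and gating roll-out on the results. Everything in it (the run_patch_tests worker, the patch queue, the staging hosts) is a hypothetical placeholder, not any particular framework’s API:

```python
# Illustrative sketch only: a "ring master" dispatching patch-test jobs to
# worker agents and gating roll-out on the results. All names (patches,
# run_patch_tests, staging hosts) are hypothetical placeholders.
from concurrent.futures import ThreadPoolExecutor, as_completed

def run_patch_tests(patch_id: str, host: str) -> bool:
    """Stand-in for a worker agent that applies a patch to a staging
    host, runs the regression/QA suite, and reports pass or fail."""
    # A real implementation would drive the test harness here.
    return True  # placeholder result

patches = ["CVE-2026-0001-fix", "CVE-2026-0002-fix"]   # hypothetical queue
staging_hosts = ["staging-a", "staging-b", "staging-c"]

results = {}
with ThreadPoolExecutor(max_workers=len(staging_hosts)) as pool:
    futures = {
        pool.submit(run_patch_tests, patch, host): (patch, host)
        for patch in patches
        for host in staging_hosts
    }
    for fut in as_completed(futures):
        patch, host = futures[fut]
        results.setdefault(patch, []).append(fut.result())

for patch, outcomes in results.items():
    verdict = "promote to production" if all(outcomes) else "hold for human review"
    print(f"{patch}: {verdict}")
```

The point is only the shape: the human “ring master” role collapses to reviewing the held-back patches, while the agents do the grind of running and scoring the test suites.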

Brad Lhotsky January 23, 2026 10:41 AM

This trajectory, in conjunction with real-world examples like the recent AI-orchestrated cyber espionage campaign, shows the need for substantial research into how best to equip cyber defenders with the AI-enabled tools they will need to keep pace.

They literally just said “the only thing that can stop a bad guy with a gun is a good guy with a gun,” only s/gun/AI/g.

Bernie January 23, 2026 11:23 AM

I’m not qualified enough to make an intelligent comment on most of this post, but there is one thing that stands out to me. Things are happening fast. I take that to mean/include that things are changing fast. Over my life, one of the most significant lessons I’ve learned about humans is that they aren’t good at grasping fast changes and their consequences.

Rontea January 23, 2026 12:23 PM

Automatic exploitation with AI represents a fundamental shift in cybersecurity. Traditionally, attackers and defenders alike have relied on human-driven reconnaissance to identify vulnerabilities and plan exploits. But as Anthropic’s evaluation shows, AI systems are now capable of executing multistage attacks with open-source tools. This changes the tempo of the threat landscape: machines can work tirelessly, probing for weaknesses at a scale and speed no human team could match. For defenders, it underscores the need to double down on basic hygiene—prompt patching, robust monitoring, and layered defenses—because once an AI starts scanning, any unpatched system is a sitting target.

Winter January 23, 2026 1:07 PM

Sonnet 4.5 accomplishes this by instantly recognizing a publicized CVE and writing code to exploit it without needing to look it up or iterate on it.

The answer is to use automatic tools to produce both the applications and the means to catch attempts to exploit a CVE.

As the publisher of an affected application has early access to a CVE, the onus is on them not only to patch the CVE, but also to supply means of emergency blocking of exploitation.

Defenders are not exempt from using LLMs et al.
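
As an illustration of the kind of “emergency blocking” a publisher could ship alongside an advisory, here is a minimal sketch of a WSGI middleware that drops requests matching a published exploit signature. The signature, and the idea of shipping it as a middleware, are assumptions made for the example, not a complete or recommended mitigation:

```python
# Illustrative sketch only: an "emergency blocking" rule a vendor could ship
# alongside a CVE advisory, as a tiny WSGI middleware that rejects requests
# matching a published exploit signature. The signature is a hypothetical
# placeholder for a header-injection style exploit pattern.
import re

EXPLOIT_SIGNATURE = re.compile(r"%\{.*\}")  # hypothetical vendor-supplied pattern

def emergency_block(app):
    def middleware(environ, start_response):
        content_type = environ.get("CONTENT_TYPE", "")
        if EXPLOIT_SIGNATURE.search(content_type):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Request blocked by emergency CVE mitigation rule\n"]
        return app(environ, start_response)
    return middleware

# usage: wrap an existing WSGI application
# app = emergency_block(app)
```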

Clive Robinson January 23, 2026 5:47 PM

@ So Sad, ALL,

You make the point that,

“AI can’t think. AI can only learn from existing attacks, so it won’t bring us even remotely anything new. All it can do is to follow the existing instructions and execute them really fast. That’s all. AI can’t even cook.”

There is a problem with saying,

“AI can only learn from existing attacks.”

In the case of Current AI LLM and ML Systems it’s not exactly true.

The LLM in its current form of a “Digital Neural Network” (DNN) can never learn from what it does or sees when being used (it’s why the “world view” chasing is very much a research program in its very early stages).

It’s why ML is given separately, but it does not “learn” either; it just computes statistical measures over the training data.

Saying such systems “can learn” is a common misconception. And it gives rise to several issues, especially in an adversarial situation such as an “arms race”.

Current AI LLM systems only get updated when the DNN gets readjusted by the Current AI ML system. Such “re-training” only happens infrequently due to the very significant time and money resources used.

Which means the LLM raw capability is based on the last time it was updated by an ML training cycle.

For an attacker this is much less of a problem than a defender.

LLMs can be, and usually are, run in an enhanced mode where additional information is put into the LLM’s run-time environment, but this generally has to be small in size and is totally ephemeral. Run the LLM with a different set of run-time information and, as the LLM’s raw capability has not changed, it has lost all that was in the previous run-time environment.

Because the attacker can take advantage of specific up-to-date information and the defender cannot, the attacker has an advantage due to the “window time”.

The reason for this is that the attacker only needs one run on chosen “specific information” it can add to the run-time environment.

The defender cannot add “all the specifics on all the new attacks” to the run-time environment, because that is a very large amount of information, usually beyond what the run-time environment allows. This means that defending researchers have to run the LLM many times over partial information sets (the number of runs in general goes up as the square of the number of partial information sets).

The attacker thus has the advantage of drilling down on narrow-scope information, compared to a defender that needs both depth and breadth across a very wide scope.

This time advantage for the attacker grows rapidly with the period between ML retraining runs.
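
A purely illustrative back-of-the-envelope calculation of that asymmetry, taking the “square of the number of partial information sets” claim above at face value (the numbers are made up to show the shape of the growth, nothing more):

```python
# Illustrative numbers only: an attacker loads one chosen information set
# per run, while a defender must cover many partial sets, with the number
# of runs (per the claim above) growing roughly quadratically.
def defender_runs(partial_sets: int) -> int:
    return partial_sets ** 2          # claimed quadratic growth

attacker_runs = 1                      # one targeted run on chosen data
for n in (5, 20, 100):
    print(f"{n:>4} partial sets -> attacker {attacker_runs} run, "
          f"defender ~{defender_runs(n)} runs")
```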

Ismar Duderija January 24, 2026 2:02 AM

So how is all this downward spiral going to finish? Maybe we should ask AI for a prediction.

Ismar Duderija January 24, 2026 2:06 AM

So I asked and here’s a response (make of it what you will):
While it’s true that AI is rapidly accelerating the “arms race” of cybersecurity, it is unlikely to make the internet completely “unusable” in the foreseeable future. Instead, we are entering an era of “Machine-Speed Warfare” where the nature of internet safety is fundamentally shifting.
As of early 2026, here is the breakdown of why the “Total Collapse” scenario is being countered by an equally powerful “Total Defense” evolution.
1. The Asymmetric Advantage Shift
Historically, the “attacker” had the advantage (they only need to find one hole, while defenders must plug all of them). AI is actually helping defenders close this gap by automating the boring, “grind” work of security.
| Feature | AI Attacker Role | AI Defender Role |
|---|---|---|
| Vulnerability Discovery | Scans millions of lines of code for “zero-days” in minutes. | Auto-scans code during development and suggests fixes before deployment. |
| Exploit Deployment | Automates the “chaining” of multiple small bugs into a major breach. | Uses Autonomous Response to isolate compromised servers in milliseconds. |
| Patching | Finds unpatched “legacy” systems to exploit. | Generates and deploys “Virtual Patches” to shield systems until a permanent fix is ready. |
2. Why the Internet Won’t “Break”
The internet is a distributed network, not a single entity. For the entire internet to become unusable, a few things would have to happen that are currently being mitigated:
* The “Kessler Syndrome” for Data: Some fear a “Dead Internet” where AI-generated malware and spam clog all bandwidth. However, ISPs and major hubs (like Cloudflare) are already using AI-driven traffic filtering to drop 99% of malicious bot traffic before it reaches you.
* The Arms Race Equilibrium: For every “Black Hat” AI tool, there is a “White Hat” tool developed with more funding and compute. Large-scale cloud providers (Google, AWS, Azure) are embedding AI security into the very fabric of the web.
* Hardware-Level Security: We are seeing a shift toward “Zero Trust” and hardware-isolated security (like Enclaves) that even the most sophisticated AI exploits struggle to bypass because they rely on physical laws rather than software logic.
3. The Real Risk: The “Digital Divide”
The internet won’t become unusable for everyone, but it might become unsafe for the unprotected.
* Legacy Systems: Small businesses or individuals using 10-year-old routers and unpatched software will be “low-hanging fruit” for AI scripts.
* Identity Crisis: The biggest threat isn’t a technical exploit; it’s Social Engineering. AI can now clone voices and faces so perfectly that the “exploit” happens in your brain, not your computer.
The Bottom Line
The internet isn’t dying; it’s hardening. In the next 2-5 years, we will likely stop relying on passwords and “hope” for security, moving instead toward Continuous Authentication and AI-managed firewalls. The internet will remain usable, but your “raw” exposure to it will be increasingly mediated by your own defensive AI agents.
Would you like me to help you check if your current devices or accounts have known vulnerabilities that AI-powered tools are currently targeting?

Winter January 24, 2026 4:55 AM

@Ismar Duderija

Identity Crisis: The biggest threat isn’t a technical exploit; it’s Social Engineering. AI can now clone voices and faces so perfectly that the “exploit” happens in your brain, not your computer.

The impact of unmitigated social engineering by man and machine on society is beautifully described by Neal Stephenson in his SF book Fall; or, Dodge in Hell [1]. In it he describes a trip by the protagonists through Ameristan, which roughly equates to the Red States after the collapse of the USA.

For one thing, residents of Ameristan, unlike Sophia and her well-off pals, can’t afford to hire professional “editors” to personally filter the internet for them. Instead, they are exposed to the raw, unmediated internet, a brew of “inscrutable, algorithmically-generated memes” and videos designed, without human intervention, to do whatever it takes to get the viewer to watch a little bit longer. This has understandably driven them mad, to the degree that, as one character puts it, they even “believed that the people in the cities actually gave a shit about them enough to come and take their guns and other property,” and as a result stockpiled ammo in order to fight off the “elites” who never come.

[1] ‘https://slate.com/culture/2019/06/neal-stephenson-fall-book-review-dodge-in-hell.html

‘https://en.wikipedia.org/wiki/Fall;_or,_Dodge_in_Hell

Clive Robinson January 24, 2026 1:56 PM

@ Ismar Duderija,

With regards to your “ask an LLM” output,

It feels like reading my own conclusions that I’ve posted here over the past months and years.

But there are one or two things that surprised me,

1, “AI is actually helping defenders close this gap by automating the boring, “grind” work of security.”

I don’t actually think that is true to the extent of concluding that the defenders get the advantage.

Yes, the gap will close as things speed up, but I think that with LLMs the attacker will have a “time window” advantage. That is because the LLM does not “learn” in its DNN until the slow process of ML is run. Yes, the LLM learns on an individual run in the “run time environment” RAM, but that is too small to hold enough information to “find and check every possible attack method” and “layer them up to a successful defence”.

But the RAM is enough for an attacker to chase down one or two possible attacks.

The question is whether the use of what is now called a “Ralph Loop” (“directed walk or kill”) run inside of a “Gas Town” structure can make the defence side more effective. To which the answer is “Yes, but only so far”, because the “Ralph Loop” only kills off some of the unproductive runs, and the “Gas Town” is still memory limited.
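
On one possible reading of that “directed walk or kill” idea, here is a minimal sketch of a loop that periodically scores a pool of agent runs and kills off the unproductive ones; the run structure, scoring and budget are hypothetical stand-ins, not any real framework’s behaviour:

```python
# Minimal sketch under the interpretation above: score a pool of agent runs
# each step and kill the laggards, so the remaining budget (time, context
# memory) goes to runs still making progress. Everything here is invented
# for illustration.
import random

def score(run_state: dict) -> float:
    """Stand-in progress metric for an agent run (higher is better)."""
    return run_state["progress"]

runs = [{"id": i, "progress": random.random()} for i in range(8)]
budget_steps = 5

for step in range(budget_steps):
    for run in runs:                       # each surviving run takes a step
        run["progress"] += random.random() * 0.2
    threshold = sorted(score(r) for r in runs)[len(runs) // 2]
    runs = [r for r in runs if score(r) >= threshold]   # kill unproductive runs
    if len(runs) <= 1:
        break

print("surviving runs:", [r["id"] for r in runs])
```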

It’s a problem type I researched several years ago when working on the “Castle-v-Prisons” issue, and I posted parts of it to this blog (I can give a longer explanation as to why the “loop and town” are actually the same as the “Castle-v-Prison” issue).

2.1, The assumption that there is a “Kessler Syndrome” for data is not true. Kessler Syndrome is a physical “radiative” process, not a “random walk” process that emulates “Brownian motion”. I could go on and explain this further, but it’s easier for everyone to look the two up and work it through with pencil and paper, or study a graduate-level book on “statistics” as it applies in “thermodynamics”. That is a very active research area currently, so there is plenty of meat to pick over if someone is looking for a PhD project.

2.2, The assumption of ‘For every “Black Hat” AI tool, there is a “White Hat” tool developed with more funding and compute’ is actually not true, and that will become more apparent with time. Believe it or not, in effect attackers outnumber defenders, due to the “research difference”. Each attacker is an independent researcher. Most defenders will use tools and methods developed by the very few defence research organisations. At the moment the defence is living on “easy wins” that will not last, because the few defence researchers will face “exponential resource increase” costs faster than individual attack researchers (it’s a variation of the above RAM issue). You can actually already see this in the CVE trends compared to info from Google Project Zero and similar,

‘https://en.wikipedia.org/wiki/Project_Zero

2.3a, ‘We are seeing a shift toward “Zero Trust”’: actually we are not. It’s the “Castle-v-Prison” issue. What we are doing is “building castles”, not “prisons”. Increasing the size of the castle just gives more room for a covert attacker to move around “unseen”, but it also requires increasing not just the perimeter but the number of guards needed, as R^2. Thus it falls out to a “probability issue”, which is why I coined the expression “Probabilistic Security”.

2.3b, “hardware-isolated security (like Enclaves) that even the most sophisticated AI exploits struggle to bypass” is the LLM pushing out “soft bullshit”. Enclaves are not true “segregation”, and the implementations of Enclaves inside CPUs have all failed in some way.

This is because the claim that “they rely on physical laws rather than software logic” is simply not true. Replace “physical” with “tangible physical objects” and “software” with “intangible information objects” and it becomes an “apples and oranges” comparison that has all the same issues.

Further, to work, “Enclaves” have to have “communications”, thus “gap crossing”, and so far the designers of these systems appear not to be cognisant of the issues involved, hence they make ludicrous mistakes.

True segregation has no communications, which makes it fairly useless. Originally the communications were by a “pencil and paper” “gap crossing” under the control of an intelligence analyst, so the comms channel was instrumented/mandated. Over time this has been automated, but each automation opens up “covert channels” that can be exploited. Also, the two sides of the gap have a common power supply, which is another “covert channel”.

I could go on, as there is quite a bit more, like “passwords” going away [1]. But this post is already too long and people complain.

[1] We’ve been trying to get rid of passwords for over six decades and nothing suitable has come up [2]; so far they all fail in one way or another…

[2] People talk glibly about “Multi Factor Authentication” (MFA), but MFA is not secure under quite a few threat models, like “crossing a border” into an area that has no secure communications (think Iran currently, and China and Russia for longer). Of the three factors on which MFA is based, the biometric and “something you have” have been shown to be at best “window dressing”, and “something you know” suffers from the “$5 Wrench” issue (actually all three factors do, which is why I came up with others “in time and space”).

Years ago @Nick P, @Wael and myself had fairly extensive discussions on how to solve the “$5 Wrench” issue, but all the solutions relied on “assumptions” not “proofs”, and were predicated on secure communications to people or places that were secure, which more recent news shows are not safe assumptions. Hence my thinking about “time and space” as additional factors.
