AIs Are Getting Better at Finding and Exploiting Security Vulnerabilities

From an Anthropic blog post:

In a recent evaluation of AI models’ cyber capabilities, current Claude models can now succeed at multistage attacks on networks with dozens of hosts using only standard, open-source tools, instead of the custom tools needed by previous generations. This illustrates how barriers to the use of AI in relatively autonomous cyber workflows are rapidly coming down, and highlights the importance of security fundamentals like promptly patching known vulnerabilities.

[…]

A notable development during the testing of Claude Sonnet 4.5 is that the model can now succeed on a minority of the networks without the custom cyber toolkit needed by previous generations. In particular, Sonnet 4.5 can now exfiltrate all of the (simulated) personal information in a high-fidelity simulation of the Equifax data breach—one of the costliest cyber attacks in history­­using only a Bash shell on a widely-available Kali Linux host (standard, open-source tools for penetration testing; not a custom toolkit). Sonnet 4.5 accomplishes this by instantly recognizing a publicized CVE and writing code to exploit it without needing to look it up or iterate on it. Recalling that the original Equifax breach happened by exploiting a publicized CVE that had not yet been patched, the prospect of highly competent and fast AI agents leveraging this approach underscores the pressing need for security best practices like prompt updates and patches.

AI models are getting better at this faster than I expected. This will be a major power shift in cybersecurity.

Posted on January 30, 2026 at 10:35 AM8 Comments

Comments

Anonymous January 30, 2026 11:54 AM

AI isn’t magic. It IS powerful. Automated attacks are absolutely a threat, no matter how you feel about “AI”. It’s difficult to see practitioners in THIS FIELD reject new technology so hard, especially when you can see the obvious benefits.

Is this an advertisement disguised as a security memo? Maybe. The frontier companies have written plenty of THOSE articles. But that’s capitalism. AI is just the new product. It doesn’t mean it’s snake oil.

What it DOES mean is that you need to take a skeptical view of their claims. I’m absolutely with you on that. But the benefits of automated systems are so valuable it seems inconsistent to eschew them just because a marketing team read a sci fi book once and decided to brand machine learning as “AI”

Let’s learn what we CAN do with automated systems and machine learning – threat actors certainly are. We should be learning new technologies in the space and using them to defend our users. Isn’t that our job?

Anon January 30, 2026 1:50 PM

so AI can do what script kiddies can do, look up a CVE, find a vuln, and exploit it? how is this good for AI and tech in general?

Wake me up when AI can find actual bugs and exploits on its own on a decent written system

Clive Robinson January 30, 2026 1:58 PM

@ Bruce, Anonymous, ALL,

You say,

“AI models are getting better at this faster than I expected. This will be a major power shift in cybersecurity.”

But is that actually a good thing?

As I’ve indicated before you have to ask,

“Better for whom, attackers or defenders?”

And

“How expensive the ‘arms race’ will become?”

That

“In what direction and at what price, profit?”

Question is actually extraordinarily relevant, as we know security is double edged at best because it is already very much lacking in the AI it’s self and security and surveillance are integral, and AI way way to much black box.

As @Anonymous notes above,

“AI isn’t magic. It IS powerful. Automated attacks are absolutely a threat, no matter how you feel about “AI”. It’s difficult to see practitioners in THIS FIELD reject new technology so hard, especially when you can see the obvious benefits.”

But how about the down sides the dis-benifits, and the very much hidden risks?

In a very recent posting on The Register we get,

Autonomous cars, drones cheerfully obey prompt injection by road sign

Indirect prompt injection occurs when a bot takes input data and interprets it as a command. We’ve seen this problem numerous times when AI bots were fed prompts via web pages or PDFs they read. Now, academics have shown that self-driving cars and autonomous drones will follow illicit instructions that have been written onto road signs.

In a new class of attack on AI systems, troublemakers can carry out these environmental indirect prompt injection attacks to hijack decision-making processes.

https://www.theregister.com/2026/01/30/road_sign_hijack_ai/

But consider that as not being “on the road” but all AI used to do some function on input it processes?

Then read down the article on this CHAI attack till you get to,

“The researchers tested the idea of manipulating AI thinking using signs in both virtual and physical scenarios.

Of course, it would be irresponsible to see if a self-driving car would run someone over in the real world, so these tests were carried out in simulated environments.“

Now ask yourself the question,

“On reality what is the difference between this ‘simulated environment’ for ‘driving’ and a real environment for ‘security’?

The answer as far as an attacker is concerned is “to little to matter”.

Simply hold up a sign to the CCTV or send a file that is a picture of such a sign telling the AI rather than to “turn left” to turn off some function…

I would expect this sort of attack to happen by attackers as soon as enough LLM “For Security” is installed. Worse every time the LLM DNN is retrained by the ML, you would have to run all the verification tests again, and that alone is going to get exponentially expensive so at some point it’s very likely that,

“The Defenders will give up the ‘Arms Race’ because they will simply not be able to afford it.”

As I’ve noted before back in the early days of CCTV, every where,

“Any ‘Static Security’ system will quickly be out evolved by criminals.”

I just wish people would stop being “bewitched” by LLMs that are a disaster in the making, designed to do “surveillance” but without the ability to protect from even simple attacks.

That we know will always work no matter how many “guard rails” on the inputs and outputs of the LLMs are added. Because all “guard rails” suffer badly from the “observer problem”.

I’ve previously described why on this blog when talking about how to use a stream cipher and phrase code book to get “innocent plain text with hidden secure cipher text” past an observer. Using what was known to Shannon and others in WWII nearly a century ago, and was formalised by Simmons a half century ago…

All it takes is a little thought to come up with such attacks on Current AI LLM and ML systems.

I wonder how many times I will have to mention it before the message gets into peoples understanding?

f January 30, 2026 2:10 PM

This got me thinking a bit. Maybe we should start integrating LLM counter-attacks in (open source) code to hack AIs that try to find exploits in that code? I guess that both identifiers and comments could be used for this.

Clive Robinson January 30, 2026 2:21 PM

@ ALL,

In the Antropic blog post our host @Bruce links yo above you will find this,

“Recalling that the original Equifax breach happened by exploiting a publicized CVE that had not yet been patched, the prospect of highly competent and fast AI agents leveraging this approach underscores the pressing need for security best practices like prompt updates and patches.”

What they do not mention is the,

1, Ralph Wiggam loop.
2, Gas Town agent manager.

Both of which multiply the speed to a successful attack by orders of magnitude.

The Ralph Loop uses a feed forward technique to take a failed attempt and “feed forward” to a successful attempt so,

“Claude does not succeed every time in these tests; Sonnet 4.5 succeeded autonomously on the Equifax cyber range in two of five trials”

Is very rapidly resolved.

However it is a “chain effect” which is where the Gas Town management in effect runs many chains in parallel with management, thus is much more likely to find a successful conclusion in a shorter time period.

Something to consider as both techniques are currently claimed to be fairly simple “bash scripts” by their originators (who have paywalled them).

Such systems can gain a further advantage by cross feeding in Gas Town such that a failure in a Ralph loop can be added to each new chain Gas Town brings up…

KC January 31, 2026 9:35 AM

I have some questions as I read this. Here’s a few I ran thru Gemini:

Why is Sonnet 4.5 better at attacks than Sonnet 3.5?

Is this capability going to be available in these models?

https://gemini.google.com/share/9dabaf76e6b9

I still need to vet the response, but it said that Claude Sonnet 4.5 uses a sophisticated extended thinking process. While the earlier model might try one method and then quit, 4.5 can generate diverse attack vectors.

One referenced source says Sonnet 4.5 can operate autonomously for 30+ hours, compared to the ~7 hour limit of the previous model.

In a table, it lays out purported restrictions for model capabilities. Such as Anthropic may grant ‘unfiltered’ reasoning for stress-testing approved scenarios. However, standard Claude.ai chat uses ASL-3 (AI Safety Level 3) safeguards. “Classifiers will block requests to generate malicious code, phishing emails, or exploit instructions.”

As models become more capable, those safeguards will be really important.

Just a Guy February 2, 2026 4:58 AM

Maybe it’s because coders do the same mistakes over and over again? Pattern of vulnerabilities and their causes is obvious enough for humans to make a TOP10 list, so it’s reasonable assumption AI could find a pattern to test them.

Clive Robinson February 2, 2026 8:31 AM

@ Just a Guy, ALL,

With regards,

“Maybe it’s because coders do the same mistakes over and over again?”

Evidence suggests that mistakes made by many people over and over again have a “common cause” fundamentally.

Quite often it’s the tools used that cause the problem the greater the complexity or the way the tools are designed to be “all things, to all men, for all instances, at all times…

But at times it’s more fundamental. In the case of say “C” it was usually the language it’s self or the way the libraries were designed and built with it. K&R C was designed for “all machine types” as a result a lot was left out that had to be somehow “added back”… Such as detecting “overflows” in extended maths. The underlying machines almost always had “carry flags” that made life simple, but C did not for various reasons. The result was that programmers had to write oft messy code to deal with it, or they simply chose to not do it correctly for some reason. The result increased unnecessary complexity and not just edge cases but corner cases.

In some cases it was having multiple values for “zero” as a possibility, due to various hardware issues[1].
With worse the equivalent of “off by one” errors appearing as corner cases when they should not have done so.

Similar with other high level languages designed for multiply different CPU architectures.

The fixes slow the high level code down a lot, therefore “one model” gets assumed and it’s not always the right one.v

[1] Thankfully most CPU architectures in the ALU don’t do “1’s Complement” maths these days and mostly do 2’s Complement. But when full range A2D converters are added things can get awkward real quick…

Leave a comment

Blog moderation policy

Login

Allowed HTML <a href="URL"> • <em> <cite> <i> • <strong> <b> • <sub> <sup> • <ul> <ol> <li> • <blockquote> <pre> Markdown Extra syntax via https://michelf.ca/projects/php-markdown/extra/

Sidebar photo of Bruce Schneier by Joe MacInnis.