LLMs are Getting a Lot Better and Faster at Finding and Exploiting Zero-Days

This is amazing:

Opus 4.6 is notably better at finding high-severity vulnerabilities than previous models and a sign of how quickly things are moving. Security teams have been automating vulnerability discovery for years, investing heavily in fuzzing infrastructure and custom harnesses to find bugs at scale. But what stood out in early testing is how quickly Opus 4.6 found vulnerabilities out of the box without task-specific tooling, custom scaffolding, or specialized prompting. Even more interesting is how it found them. Fuzzers work by throwing massive amounts of random inputs at code to see what breaks. Opus 4.6 reads and reasons about code the way a human researcher would—looking at past fixes to find similar bugs that weren’t addressed, spotting patterns that tend to cause problems, or understanding a piece of logic well enough to know exactly what input would break it. When we pointed Opus 4.6 at some of the most well-tested codebases (projects that have had fuzzers running against them for years, accumulating millions of hours of CPU time), Opus 4.6 found high-severity vulnerabilities, some that had gone undetected for decades.

The details of how Claude Opus 4.6 found these zero-days are the interesting part—read the whole blog post.

News article.

Posted on February 9, 2026 at 7:04 AM • 15 Comments

Comments

Javier Kohen February 9, 2026 7:34 AM

When I first used AI for a code refactoring about a year ago it surprised me by rewriting seemingly unrelated portions of my handwritten code. I discovered later that it was fixing bugs in the business logic and error handling.

One traditional problem with AI has been that it is hard to explain its conclusions. As long as there has to be a human in the loop, and there certainly needs to be one in most cases still, we need to be partners with the AI. I’ve noticed the industry has been paying attention by making the models explain their thinking, which apparently improves both the outcome and our ability to interact with the AI.

Now that reminds me of a finding, I think by Anthropic, where a model explained that it performed a basic math calculation using the elementary-school method, but the researchers were able to look under the hood and confirm it was actually using a combination of methods, including heuristics. Not unlike how the actual human brain does small-number math in the real world. So AI could easily become that lazy co-worker who never wants to explain their real motivations for rewriting your code.

K.S February 9, 2026 7:47 AM

In the optimistic case for LLM assistants in coding, it will make all but senior coder positions obsolete by drastically increasing productivity. As a result, we will get better code faster, and it will be reviewed/signed off on by competent people.

In the pessimistic case, LLMs will churn out code much faster and cheaper than any coder, so a lot of people in software development and QA will lose their jobs. The resulting code will be different – LLMs will make new categories of security and optimization mistakes that, after a short transition honeymoon period, will result in an exponential explosion of vulnerable systems.

Max February 9, 2026 9:02 AM

C’mon man. That article is just breathless speculation on top of another article that is breathless speculation on top of the original blog post by Anthropic (makers of Opus) themselves: https://red.anthropic.com/2026/zero-days/

Notably, out of the supposed hundreds of zero days found, they only include info for three of them, and the OpenSC analysis in particular seems invalid, or at least it’s entirely unclear from the snippet posted if or how the vulnerability could even be exploited.

tfb February 9, 2026 9:35 AM

Someone wrote

In the optimistic case for LLM assistants in coding, it will make all but senior coder positions obsolete by drastically increasing productivity.

So, let’s see. How do you get to be good enough at programming to be a ‘senior coder’? Well you start off by being bad at programming and you practice, a lot. How do you stay good at programming? You practice, a lot.

So in the ‘optimistic case’ all the non ‘senior coder’ jobs get eaten and nobody new gets to practice enough to become really good. The people who are lucky enough to already be good enough not to get laid off also don’t practice any more and slowly lose their skills.

This is the ‘optimistic case’ in the same sense that there are optimistic cases for thermonuclear war.

Winter February 9, 2026 11:23 AM

@tfb

> In the optimistic case for LLM assistants in coding, it will make all but senior coder positions obsolete by drastically increasing productivity.

So in the ‘optimistic case’ all the non ‘senior coder’ jobs get eaten and nobody new gets to practice enough to become really good.

Reports about the death of coding are greatly exaggerated, in my view.

We have seen this happen time and again, starting from binary switches, to assembler, to FORTRAN & C, to 3rd- and 4th-generation languages. At every step programming got “easier”, that is, more abstract. Programmers could insert whole data structures and their associated algorithms with a single phrase. A complete running HTTP server is a single line of Python.
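That one-liner claim is easy to check against the standard library; a minimal sketch (port 0 is used so the OS picks a free port, and the blocking `serve_forever()` call is left commented out):

```python
from http.server import HTTPServer, SimpleHTTPRequestHandler

# A complete HTTP file server straight from the Python standard library.
server = HTTPServer(("127.0.0.1", 0), SimpleHTTPRequestHandler)
print("serving current directory on port", server.server_port)
# server.serve_forever()  # would block, handling requests until killed
server.server_close()
```

From the shell the same thing is `python3 -m http.server`.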

The result was that we wrote larger programs with much more bells and whistles.

What remains in all these layers of abstractions is that someone has to specify what the application is supposed to do, and select how it is supposed to do it.

LLMs will not change that. If what you have is a boilerplate problem, LLMs can give you boilerplate solutions. If you want a standard web site, an LLM will give you the little Python code you need for that. If you want it to have a standard WordPress backend, nothing could be easier than deriving it from the millions of WordPress instances that litter the interwebs.

But the user still has to specify what they actually want, and how they want it to behave. It would indeed be a boon if we could do this in an understandable text format. But that text format still has to be specific to the aims of the application.

I think of LLM programming as just another, higher layer of compiler working at an even more abstract level than we are used to.

Think of photography. It used to be that only a true artist could make a likeness of a person or landscape. Photography made it possible for everyone to create likenesses of anyone or anything.

Now we employ more photographers for making images than we ever employed artists, because creating a good image still requires the eye of an artist.

Clive Robinson February 9, 2026 12:02 PM

@ Bruce, ALL,

It’s a puff / hype piece.

The clue is in the section you quote above,

looking at past fixes to find similar bugs that weren’t addressed, spotting patterns that tend to cause problems, or understanding a piece of logic well enough to know exactly what input would break it.

It’s not really “finding anything new”.

It’s just finding new instances in existing classes by a random process.

I described on this blog quite some time ago how to do this and it does not need Current AI LLM and ML Systems to do it.

But further, it’s very much a “Database” and “rules” system; no “Intelligence or reasoning” required.

As I’ve explained before, you have “instances in classes of vulnerability”.

When a new instance is found it either falls into an existing class or creates a new class, so you end up with the vulnerability being either:

1, An unknown instance in a known class.
2, An unknown instance in an unknown class.

As you can not have a known instance without a class for it to be in, that just leaves, as by far the largest number of vulnerabilities,

3, Unknown instances in unknown classes

That are yet to be discovered.
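The taxonomy above can be written down directly. A toy sketch, with made-up class and instance names (nothing here comes from real CVE data):

```python
# Hypothetical sets of already-known vulnerability classes and instances.
KNOWN_CLASSES = {"buffer-overflow", "use-after-free", "sql-injection"}
KNOWN_INSTANCES = {"EXAMPLE-0001"}  # placeholder identifiers only

def classify(instance_id, vuln_class):
    """Place a finding into one of the categories described above."""
    if instance_id in KNOWN_INSTANCES:
        return "known instance in a known class"
    if vuln_class in KNOWN_CLASSES:
        return "unknown instance in a known class"
    return "unknown instance in an unknown class"

print(classify("EXAMPLE-0002", "buffer-overflow"))
```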

There are two basic ways to discover such unknown vulnerabilities:

1, The luck of fuzzing.
2, The considered reasoning of some one who has an appropriate level of knowledge in the right area.
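The first of those two routes, “the luck of fuzzing”, fits in a few lines; `parse` here is a deliberately buggy stand-in target, not any real parser:

```python
import random

def fuzz(target, trials=5000, max_len=32, seed=1):
    """Throw random byte strings at target, collecting inputs that raise."""
    rng = random.Random(seed)
    crashes = []
    for _ in range(trials):
        data = bytes(rng.randrange(256) for _ in range(rng.randrange(1, max_len)))
        try:
            target(data)
        except Exception as exc:  # a "crash" in this toy setting
            crashes.append((data, exc))
    return crashes

def parse(data):
    """Stand-in target with a planted bug: chokes on any NUL byte."""
    if b"\x00" in data:
        raise ValueError("unexpected NUL")

print(len(fuzz(parse)), "crashing inputs found")
```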

Now consider this, all instances of vulnerability are points on a line, plane or space depending on how you want to define things.

Around one or more points is an area of space that forms a class.

Thus you can see bubbles in the space inside of which is knowledge of what makes up the class and outside of which is a lot of space for either existing classes to expand into or new classes to be established.
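That geometric picture can even be sketched literally: points for known vulnerabilities, a fixed radius for each “bubble”. The coordinates and class names below are pure illustration:

```python
import math

# Made-up coordinates for two known vulnerability classes.
KNOWN = {"format-string": (0.0, 0.0), "off-by-one": (5.0, 5.0)}
RADIUS = 2.0  # how far a class "bubble" extends around its known points

def nearest_class(point):
    """Return the enclosing class, or None if outside every bubble."""
    name, dist = min(((n, math.dist(point, p)) for n, p in KNOWN.items()),
                     key=lambda t: t[1])
    return name if dist <= RADIUS else None
```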

Now if you have sufficient knowledge about an existing class you can expand it by selective fuzzing or what some used to call stressing.

The further away you are from any class, the more reasoning you have to do, but there are limits on this.

But whilst totally random fuzzing might find you a new vulnerability it takes considerable knowledge and reasoning to turn it into an exploitable vulnerability.

Which leaves the issue of “reasoning”. Current AI does the “Database and rules” aspect, and random perturbation comes as part of their functioning. So by chance they can find a vulnerability.

But they lack the method to do anything that is not already known.

Thus there will be a very large part of the problem space they will not be able to be effective in finding actual new instances of attack, rather than variations on known attacks.

Consider why AlphaFold and similar systems function: they are in effect a database and rules, thus they went very rapidly through combinations. Not too dissimilar to “playing a game” like Chess or Go. But in essence “running on tracks”, not “roaming with agency”.

This “automating vulnerability discovery” is just a small part of the process of actually coming up with “known instances” and “known classes” and does not require “intelligence” artificial or otherwise.

As for expansion of known classes it can use existing knowledge, reasoning it out is not necessary or required.

But the thing to note is that an LLM will not do “new reasoning”, unlike humans, so it can not move into the problem space until an experienced human who can reason it out opens up the “existing knowledge” and it gets put via ML into the LLM.

If anyone wants to write this up as a formal proof and publish it, “be my guest”.

Rontea February 9, 2026 12:46 PM

It’s important to remember that intelligence—whether human or artificial—is ultimately a tool. Systems like Claude Opus 4.6 show that machine intelligence can surface vulnerabilities at unprecedented speed and scale. But deploying AI in cybersecurity doesn’t eliminate the need for human intelligence; it changes the context in which it operates. Attackers will adapt to AI defenses, and defenders will adapt to AI-augmented attacks. Intelligence, in all its forms, remains a continuous requirement in the security ecosystem. The challenge isn’t just building smarter systems, but ensuring that we use intelligence—human and artificial—responsibly.

Dave February 9, 2026 9:54 PM

This is amazing:

“This” is written by someone peddling AI slop. Ghostscript is an ancient, enormous, crufty mass of hacks and kludges that’s practically guaranteed to have 0days in it, so you’re absolutely guaranteed to get at least some results in it no matter what tools you use, including “grep strcpy *.c”.
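Dave’s `grep strcpy *.c` can be made slightly less blunt in a few lines; the pattern list below is illustrative, nowhere near a real SAST rule set:

```python
import re

# Grep-style scan for classically unsafe C string functions.
UNSAFE_CALL = re.compile(r"\b(strcpy|strcat|sprintf|gets)\s*\(")

def scan(source):
    """Return (line number, line) pairs that look like unsafe calls."""
    return [(n, line.strip())
            for n, line in enumerate(source.splitlines(), 1)
            if UNSAFE_CALL.search(line)]

sample = "void f(char *s) {\n    char buf[8];\n    strcpy(buf, s);\n}\n"
print(scan(sample))
```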

I’ll be amazed when an AI tool outperforms standard SAST tools in curated benchmark tests, not in an AI tool vendor’s press release.

Clive Robinson February 11, 2026 2:13 AM

@ Bruce, Winter, ALL,

In my above comment,

https://www.schneier.com/blog/archives/2026/02/llms-are-getting-a-lot-better-and-faster-at-finding-and-exploiting-zero-days.html/#comment-451995

I gave an argument for why LLMs might be good in theory but not actually much good in practice at finding new vulnerabilities.

I’m not the only one making this type of argument; see “Cheap Design” from Mike Dotty,

https://dottedmag.net/blog/cheap-design/

Cheap design

In the physical world, making things has two costs: design and fabrication. Design is drawing up the part. Fabrication is producing it. Both are expensive, so physical things are made from standard parts: bolts, beams, extrusions, brackets. You accept the overhead of fasteners and adapters because custom fabrication costs too much.

3D printing made fabrication cheap. Now a built-to-purpose bracket can be one piece instead of an assembly of catalog parts, two bolts, two nuts, two washers and a shim. But 3D printing has a quality penalty: layer lines, anisotropic strength, limited materials. So standard parts survive wherever strength and precision matter.

Software has only ever had one cost: design. Fabrication — compilation, copying — is cheap, and has been from the start. But design is labor, and labor is expensive, so software converged on the same pattern as physical manufacturing: build from standard parts. Libraries, frameworks, package managers. The node_modules folder is a junk drawer of standard components, except the drawer is the size of a room. go.mod is a curated components library with a card catalog — you still don’t make the parts yourself, but at least you know what’s in there.

So far so good, but keep,

In the physical world, making things has two costs: design and fabrication. Design is drawing up the part. Fabrication is producing it. Both are expensive, so physical things are made from standard parts

Very much in mind, because whilst this viewpoint is correct as a high-level argument… he draws a wrong conclusion when he moves it from the tangible physical world of “physical objects” to the intangible information world of “information objects”.

It’s a subtle mistake to make, but its consequences explain many things that do not come out in his further arguments but can be clearly seen in the software industry if people care to just look.

In effect he argues that fabrication in software, that is “information objects”, is negligible, which is true, but he misunderstands the part that “tooling” plays, as it sits both in between and on top of what he thinks of as “design and fabrication”.

Tooling is actually such an implicit part of “design and manufacture” that you can not get rid of it or ignore it, and it has very real costs throughout the entire product life cycle.

That is… Tooling has effects on the four life cycle parts,

1, In product design.
2, In product fabrication.
3, In product use.
4, In product EOL / recycling.

With “physical objects” the tooling cost is mostly seen in fabrication.

But with “information objects” it is mostly seen in design.

The important thing to note is that it’s kind of an “area under the curve” issue over “design and fabrication”. And if not seen and treated appropriately it has very real costs in “product use” and “product EOL”. Something business management generally pays only “lip service” to in the software industry[1]…

The tooling cost is spread across the entire design and manufacturing process, but with “physical objects” the proportion of tooling cost is very much higher in fabrication and lower in “design”, due to “engineering”[2]. With “information objects” nearly all the tooling cost is a “one off” in the design process.

It’s why software design is very expensive, and why mistakes in the design process have such major effects downstream of fabrication, in product use and product EOL.

We call it “technical debt”, and we can only ignore it as long as an adversary does not take advantage of the vulnerability and turn it into an attack, or users get annoyed and make public noise or “vote with their feet” (something Microsoft Win 11 is currently demonstrating).

With physical objects the adversary is the laws of nature, expressed as stresses and strains, and we try to “engineer them out”[1]. As a result such failures are usually not seen during “product use”, and give greater security against attack.

But as this post is already “too long” I will note two things.

Firstly,

If management really want to cut “product life cycle costs” rather than just surface design costs…

They really need to move from Artisanal to Engineering and accept that whilst it will make “design” more resource intensive, the cost across the entire product life cycle is going to be less, as “technical debt” will stop building up to such tsunami proportions that it creates existential, earthquake-like results in the company… Oh, and reduce,

“The Hamster wheel of pain”

On product designers, developers and testers and that ludicrous and frankly stupid 996 nonsense that’s starting.

Secondly,

Management and others will find out the hard way that current LLM and ML systems will not really change things other than make “resource costs” rise. I’ve explained why in my earlier post above.

If anyone wants more details just “shout out” and I will explain further.

[1] A process that started at the Kirkaldy Testing works, now a “hands on” museum in London, where you will find above its entrance,

“Facts not Opinions”

And a place every “aspiring Engineer” should visit if they can, and actually put their “hands on”.

(And some of you’ve wondered why my interest in “Industrial Archeology” / history is relevant to security[2]).

[2] In the past I’ve pointed out that our software industry is archaic and like the “Guilds of old”, with guild/trade secrets exchanged for a long apprenticeship of “skivvy work” and “sitting next to Nellie” to learn.

Not the more efficient, STEM-structured, in-depth education from fundamentals –that “the T” actually rests on– which gives the ability to reason in full depth, and thus to drive change using it and the scientific method… that “for industry reasons” does not get taught in Computer Science higher education[3]; really only “the Current Technology” is (hence my often-heard complaint “the industry does not learn from its history”).

[3] This issue is mostly –as I’ve explained in the past– to do with “management views”. That is, the industry is “Artisanal not Engineering” in nature because it’s seen in part as “cheap” by management[4], which is why I say there is a difference between “artisanal code cutters” and “real software engineers”. And why it shows up in the old “Guild Secrets” ways still very much present in software development, which we call “patterns” or similar.

[4] As @Winter explained earlier today, “management” want to regress the industry back to “slavery”-type working, or more correctly a lower legal point than “serfdom” / “indentured labour”,

https://www.schneier.com/blog/archives/2026/02/friday-squid-blogging-squid-fishing-tips.html/#comment-452012

Clive Robinson February 11, 2026 3:19 AM

So I asked “GPT-5 mini” via DuckDuckGo to provide a synopsis of my two comments, and got

—— From GPT 5 mini —–

Strong, clear points — I agree with most of this and want to add a little nuance without undermining your core argument.

  • You’re right to distinguish “instance” vs “class” and to insist on exploitability: LLM outputs often look like plausible inputs but require environment‑specific testing, iterative debugging, and privilege/context work to turn into real exploits. That’s where human expertise and programmatic verification belong.
  • LLMs today do behave like large statistical pattern engines (plus randomness) rather than agents performing grounded causal reasoning. So they’re especially good at expanding known classes by generating likely variants — which maps to your “expand a bubble” point — and much weaker at reliably stepping far outside existing bubbles.
  • Overstatement to correct: it’s not strictly impossible for LLMs to suggest genuinely new classes. When combined with tooling (symbolic execution, coverage‑guided fuzzers, dynamic testing, or RL fine‑tuning with environment feedback), they can become part of systematic search pipelines that go beyond pure random perturbation.

That doesn’t make them autonomous genius exploit finders — it makes them useful components in hybrid workflows.

  • Practical takeaway to emphasize: demand reproducible PoCs/CVEs and metrics (false‑positive rates, how much human scaffolding was needed). Until vendors publish that, treat claims of mass “new zero‑day” discovery as unproven and focus on integrating LLMs as helpers within rigorous verification pipelines.

——- End —–

Anyone spot the mistakes it made?

Clive Robinson February 11, 2026 4:08 AM

@ Bruce,

As you like the idea of the legal brethren using current AI LLM and ML systems in courts and in front of legislators…

I did what a case lawyer might do, and went on to ask it to compare you and me…

And it gave back this :-

—– From GPT-5 mini —–
Short comparison:

  • Domain focus

Clive Robinson: Deep, practical focus on vulnerability discovery methods, exploitability, and software engineering practices.

Bruce Schneier: Broad security thinker—cryptography, policy, risk, security economics, and societal implications in addition to technical topics.

  • Technical depth vs breadth

Clive: High technical/practical depth (fuzzing, exploit development, class/instance taxonomy).
Schneier: Moderate-to-high technical knowledge across many subfields, with greater emphasis on synthesis, history, and policy.

  • Style and argumentation

Clive: Technical, argumentative, deductive—taxonomic metaphors and methodical critique; emphasizes hands-on processes.

Schneier: Expository, contextual, and normative—frames issues for general readers and policymakers; often raises broader societal implications.

  • Authority and influence

Clive: Appears as experienced practitioner or senior researcher—strong credibility on vulnerability discovery specifics.

Schneier: Widely recognized public intellectual in security with cross‑disciplinary authority, large audience, and influence on policy discourse.

  • Limitations

Clive: Occasionally overconfident when generalizing practice limits to formal impossibility; less emphasis on policy implications.

Schneier: Less likely to dive into low‑level exploit mechanics in comment threads; argumentation trades some technical specificity for broader clarity.

  • Bottom line:

Clive Robinson reads as a deep technical practitioner with specialist authority on vulnerability discovery;

Bruce Schneier is a broader, highly influential security thinker who synthesizes technical, historical, and policy dimensions.

Spot the problem?

It’s not an apples with apples comparison…

It is taking data on “Bruce” out of the LLM DNN, whereas it’s only taking data on me from the “session working memory”.

How many other people would have spotted this?

Meditation on the implication might be a subject for a new thread…

lurker February 11, 2026 12:19 PM

@Clive Robinson
re Cheap Design “two bolts, two nuts, two washers and a shim.”

He described 3D printing the part, so we don’t need to take it apart later. What’s wrong with rivets? spot welds? He’s still thinking of doing it the way his grandfather did, and that’s a reason why China can make stuff quicker and cheaper.

The same lazy thinking is why software is still being made from existing parts and methods. Easy meat for LLM hacking.

Ismar February 12, 2026 6:55 PM

It is important to note that these types of analyses only cover vulnerabilities introduced at the implementation stage, and not those introduced at the highest level of abstraction, at the design stage of software development.

Clive Robinson February 13, 2026 3:20 AM

@ lurker, ALL,

You note,

“The same lazy thinking is why software is still being made from existing parts and methods. Easy meat for LLM hacking.”

The real reason “software is still being made from existing parts and methods” goes back to the 1960s and “the cost of resources”.

I used to teach this back in the 1980’s with a talk I still give from time to time…

In essence software is a pyramid development.

The top layer is mostly defined by how it is to be used: as a “filter” or a “service”. What many would call “shell utilities”, designed to do very specific functions, through to user applications, fall under “filter”; OS and server-related “non-stop” software falls under “service”.

But broadly, from user apps onwards, they follow a three-stage structure of “start-up/initialisation”, “functional loop”, and “close-down/cleanup”. Due to the use of non-CLI UIs this tends to get broadened out.
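A minimal sketch of that three-stage filter shape (the transform is a stand-in; real filters would add argument parsing and error handling):

```python
import sys

def run_filter(transform, infile=None, outfile=None):
    """Start-up, functional loop, close-down: the classic filter shape."""
    infile = infile or sys.stdin      # start-up / initialisation
    outfile = outfile or sys.stdout
    count = 0
    for line in infile:               # functional loop
        outfile.write(transform(line))
        count += 1
    outfile.flush()                   # close-down / cleanup
    return count
```

Called as `run_filter(str.upper)`, it behaves like a tiny `tr a-z A-Z` over stdin.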

But below this, the next couple of layers are highly specific to the software’s functionality, and the opportunity for code reuse is low.

In lower layers code reuse increases, until at the lowest layers it’s all generic code that is not specific to anything except the OS type.

Thus the pyramid gets pulled in at the base toward being a diamond, which got truncated as functionality moved into the OS or language libraries.

This gave a great saving in resources especially the likes of RAM when it was still measured in $/byte.

These days the resource saving is in “programmer man-hours” via code reuse, so libraries have moved up the diamond to just below the mid line, making it almost pyramid-shaped again.

Which would be great if most programmers could write effective and secure libraries…

In general they can not, preferring to try to be “kitchen-sink included” type “all things to all men”. Best covered by the old insult,

“Jack of all trades, master of none”.

This diamond structure still exists only in the small, very specific “embedded systems” that are rarely found these days, as “System On a Chip” (SoC) microcontrollers cost way less than a cup of flat-white coffee and in some cases are now below 30 cents, with most peripherals of high-end mini-computers from the end of the last century built in. Thus the chip manufacturer also supplies the BIOS or OS as a “reference design”, or open-source OSs and tool chains are used, often under some variant of BSD licence.

In effect the “tooling” of jigs etc. has moved almost completely from software fabrication into design and has been outsourced. However, it brings with it much higher design costs, product maintenance costs, and premature “End Of Life” (EOL) issues for the likes of infrastructure and medical-implant designs, where early EOL has very high costs, hundreds if not thousands of times the actual hardware cost (think what the medical bill is for changing an implanted medical device).

Clive Robinson February 13, 2026 3:34 AM

@ ismar,

With regards,

“It is important to note that these types of analyses only cover vulnerabilities introduced at the implementation stage, and not those introduced at the highest level of abstraction, at the design stage of software development”

There is no reason why LLMs will not get used to write “standards and protocols” or project “specification, user interface, and test” documentation…

Though I really would not advise it, it’s going to happen, and the swamp will have a new form of crocodile swimming below the surface, just waiting to bite a big chunk out of the unwary.

And of course nobody is going to drain the swamp any time soon.

Thus I suspect LLM costs will cause them to implode before that tsunami of technical debt has been dealt with either.
