LLMs are Getting a Lot Better and Faster at Finding and Exploiting Zero-Days

This is amazing:

Opus 4.6 is notably better at finding high-severity vulnerabilities than previous models, a sign of how quickly things are moving. Security teams have been automating vulnerability discovery for years, investing heavily in fuzzing infrastructure and custom harnesses to find bugs at scale. But what stood out in early testing is how quickly Opus 4.6 found vulnerabilities out of the box, without task-specific tooling, custom scaffolding, or specialized prompting. Even more interesting is how it found them. Fuzzers work by throwing massive amounts of random inputs at code to see what breaks. Opus 4.6 reads and reasons about code the way a human researcher would: looking at past fixes to find similar bugs that weren't addressed, spotting patterns that tend to cause problems, or understanding a piece of logic well enough to know exactly what input would break it. When we pointed Opus 4.6 at some of the most well-tested codebases (projects that have had fuzzers running against them for years, accumulating millions of hours of CPU time), it found high-severity vulnerabilities, some that had gone undetected for decades.
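To see why reasoning about code can reach bugs that random inputs rarely do, consider a contrived C sketch (my own illustration, not from Anthropic's post). The hypothetical parser below hides a one-byte stack overflow behind a magic number and a checksum: random inputs almost never pass both gates, but anyone reading the code can derive the exact input that triggers the bug.

#include <stdint.h>
#include <string.h>

#define MAGIC 0x5AFEC0DEu  /* hypothetical wire-format magic */

/* Contrived packet format: 4-byte magic, 4-byte length,
 * 1-byte XOR checksum of the header, then the body. */
int parse_packet(const uint8_t *p, size_t n) {
    if (n < 9) return -1;

    uint32_t magic, len;
    memcpy(&magic, p, 4);
    memcpy(&len, p + 4, 4);

    /* Gate 1: a random 4-byte prefix matches 1 time in 2^32. */
    if (magic != MAGIC) return -1;

    /* Gate 2: header checksum; random inputs fail 255 times in 256. */
    uint8_t sum = 0;
    for (size_t i = 0; i < 8; i++) sum ^= p[i];
    if (sum != p[8]) return -1;

    char body[256];
    if (len > sizeof(body)) return -1;   /* BUG: should be >=, allows len == 256 */
    if ((size_t)len + 9 > n) return -1;  /* body must fit inside the input */
    memcpy(body, p + 9, len);
    body[len] = '\0';                    /* len == 256 writes body[256]:
                                            a one-byte stack overflow */
    return 0;
}

A blind fuzzer has to stumble through both gates before it ever exercises the length check; a reader, human or model, sees immediately that a packet with the right magic, a matching checksum, and len set to exactly 256 clobbers one byte past the buffer. Modern coverage-guided fuzzers can often crack literal gates like these, but deeper, more semantic versions of the same problem are exactly where code reasoning pays off.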

The details of how Claude Opus 4.6 found these zero-days are the interesting part; read the whole blog post.

News article.

Posted on February 9, 2026 at 7:04 AM

Comments

Javier Kohen February 9, 2026 7:34 AM

When I first used AI for a code refactoring about a year ago, it surprised me by rewriting seemingly unrelated portions of my handwritten code. I discovered later that it was fixing bugs in the business logic and error handling.

One traditional problem with AI has been that it is hard to explain its conclusions. As long as there has to be a human in the loop, and there certainly still needs to be one in most cases, we need to be partners with the AI. I've noticed the industry has been paying attention by making the models explain their thinking, which apparently improves both the outcome and our ability to interact with the AI.

That reminds me of a finding, I think by Anthropic, where a model explained that it performed a basic math calculation using the elementary-school method, but the researchers were able to look under the hood and confirm it was actually using a combination of methods, including heuristics. Not unlike how an actual human brain would do small-number math in the real world. So AI could easily become that lazy co-worker who never wants to explain their real motivations for rewriting your code.

K.S February 9, 2026 7:47 AM

In the optimistic case for LLM assistants in coding, they will make all but senior coder positions obsolete by drastically increasing productivity. As a result, we will get better code faster, and it will be reviewed and signed off on by competent people.

In the pessimistic case, LLMs will churn out code much faster and cheaper than any coder, so a lot of people in software development and QA will lose their jobs. The resulting code will be different: LLMs will make new categories of security and optimization mistakes that, after a short honeymoon period, will result in an exponential explosion of vulnerable systems.

