OpenAI’s GPT-5.5 is as Good as Mythos at Finding Security Vulnerabilities

The UK’s AI Security Institute evaluated GPT-5.5’s ability to find security vulnerabilities, and found that it is comparable to Claude Mythos. Note that the OpenAI model is generally available.

Here is the Institute’s evaluation of Mythos.

And here is an analysis of a smaller, cheaper model. It requires more scaffolding from the prompter, but it is also just as good.

Posted on May 13, 2026 at 7:03 AM • 10 Comments

Comments

Rontea May 13, 2026 10:58 AM

We’ve long worried about the democratization of offense. Models like GPT-5.5 lower the barrier to entry for complex cyber operations. Even if public deployments have strong guardrails, the existence of a single universal jailbreak is a reminder that automated safeguards are brittle against determined adversaries.

bye bye ai May 13, 2026 11:11 AM

@Morley

Alternatively, the average user is screwed either way, because they become dependent on the technical prowess of the tribe they belong to. Or as @clive put it the other day: it’s our own damn fault for being born plebeians and striving with the artisans.

bird turd May 13, 2026 12:36 PM

Your masters, your owners have root.

End of story.

You’re welcome for using modern technology.

lurker May 13, 2026 2:47 PM

@Bruce

The third link you give, to aisle.com, returns “500 SOMETHING WENT WRONG” if browser cookies or JavaScript are turned off. Nice start.

Clive Robinson May 13, 2026 2:48 PM

@ Bruce, ALL,

People should think about the “curve”, not the “height” of the line on the graph.

For years I’ve mentioned that CCTV is mostly a waste of time because it’s a “Static Defence”, so an attacker can easily out-evolve it, and they often do [1].

Now consider the Current LLM and ML Systems. As an overly simple approximation, the LLM “pattern matches” to what the “ML learned from the input data”. That means there is a capability window that opens after the ML training is run, and only gets “partially closed” the next time the ML process is run with sufficient new data.

Thus there are two issues,

1, The cost of running the ML system (which is inordinately high).
2, The cost of gathering “new unseen data” (and collating it correctly to avoid bias etc).

But the second issue is where it all starts to go wrong…

When new attacks are found by humans it can be either by “pattern matching” to existing attack instances and classes, or by “reasoning it out”, that is, coming up with a new class of attack.

LLM systems can only pattern match; they cannot reason, and they don’t really learn from their own previous actions.

Thus we can predict,

1, The LLM systems will initially be successful against “known known” attacks.
2, They will also have success against some “unknown known” attacks (just as existing inordinately expensive stochastic and similar systems do).
3, These initial successes will drop off significantly as the found instances are removed from existing code bases.

Then what?

Well without human intervention the LLM systems become static and their success rate will drop to near zero.

But… The number of humans “reasoning out” new classes of attack, and thus allowing new instances to be created or found, is very much dependent on their “journeyman” experience of “learning the trade”, or more correctly “learning to think hinky”.

We know from the way the ICT industry works that management will cut back on manpower wherever it can. This could, and probably will, mean very few people get the “journeyman” experience required to “think hinky” and thus reason out new classes of attack.

Which means no new data for the ML process thus the LLMs effectively stagnate.

There is a reason why tools are used by humans and don’t have agency: they don’t learn and they don’t reason. A further reason they don’t have agency is that tools do not have any kind of “world view”. Without this, the best they can become is,

“Force multipliers under the guidance of a directing mind.”

Which has also given rise to societal problems, as “directing minds” can be “seen as good or bad by an independent observer”. And this is mostly down to the mores of society seen through a politically –with a small p– inspired point of view.

It’s why many say or agree with,

“Technology alone cannot solve social issues.”

Something we will see increasingly with the GPT type “pattern matching” LLM systems.

[1] Without going into it too deeply: you have a problem such as street crime, and for various reasons it is decided that the level of street crime is too high (even though it’s probably dropping). At great expense the “technological fix” of CCTV is installed, and street crime where it is installed drops measurably while arrest rates go up. Then the arrest rate drops, and for a little while longer the crime rate remains at the low rate. This is because the smarter criminals have not stopped, they have simply moved elsewhere where there is no CCTV; the high arrest rate was of the stupid and the unlucky. The stupid can be divided into two groups: actual criminals effectively mugging people etc, and the idiots who get drunk and vandalize etc. Either way, the stupid and the unlucky get deterred in some way, such as being in jail for a while. Then the street crime effectively starts to rise again, and will do so as long as the “reward” is there; the smarter criminals simply work out how to outsmart the “static defence” that the CCTV is. This evolutionary curve pops up all over the place and is also why,

“Generals who won the last battle oft lose the next when they use the same tactics.”

Zsolt May 13, 2026 6:12 PM

Daniel Stenberg (original author of cURL) wrote about the results of a Mythos vulnerability analysis of libcurl just the other day. The LLM found only a single new vulnerability; it also “found” 3 false positives and one issue that was classified by the cURL security team as a bug (and not a vulnerability).

Since LLMs have been used for code analysis of the cURL project for several months, this means that Mythos is just a single identified vulnerability ahead of its competition. At least as far as cURL is concerned, which is of course not your average open-source project (including security aspects).

That’s not a major jump in LLM capabilities, more like a minor, incremental step.
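Tallying those quoted numbers as a quick sketch (the variable names here are just for illustration):

```python
# Reported Mythos findings on libcurl, as quoted above.
true_vulns = 1            # the single confirmed new vulnerability
false_positives = 3       # reports that turned out not to be issues
reclassified_as_bug = 1   # a real defect, but not a security vulnerability

reports = true_vulns + false_positives + reclassified_as_bug
precision = true_vulns / reports
print(f"{true_vulns} of {reports} reports confirmed ({precision:.0%} precision)")
# → 1 of 5 reports confirmed (20% precision)
```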

ismar May 13, 2026 7:27 PM

@ Zsolt – knowing one more vulnerability over your adversary is all you need so comparing on the numbers only makes little sense

Clive Robinson May 14, 2026 8:37 AM

@ ismar, Zsolt,

With regards your comment of,

“knowing one more vulnerability over your adversary is all you need so comparing on the numbers only makes little sense”

You in effect make two observations.

1, The first being even one step ahead gives an attacker advantage.

2, The second being that the actual number of alleged vulnerabilities makes little sense on its own.

As statements they are both true and false at the same time. Or, to put it a better way, they depend on other factors that need to be considered as well.

So look on this not as a critique but a drill down.

For instance, depending on the tool chain, a vulnerability may be present in the high-level source code, but it may not exist in the runtime code, or may not be reachable in the runtime code.
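A minimal sketch of that source-vs-runtime distinction; `LEGACY_MODE` and the function are invented for illustration, not taken from any real code base:

```python
LEGACY_MODE = False  # pinned at build/config time in this hypothetical deployment

def evaluate(expr: str) -> float:
    """Evaluate a user-supplied numeric value."""
    if LEGACY_MODE:
        # Source-level vulnerability: eval() on untrusted input. A scanner
        # reading the source will flag this line, yet with LEGACY_MODE
        # pinned to False the branch is unreachable in the shipped build.
        return float(eval(expr))
    # Reachable path: only plain numeric literals are accepted.
    return float(expr)

print(evaluate("3.5"))  # → 3.5
# An injection attempt such as evaluate("__import__('os')") hits the safe
# path and raises ValueError; eval() is never executed.
```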

But that is not the most problematic issue, that of “numbers” is…

ICT Industry Management have a very, very poor record when it comes to resource allocation issues. I mentioned a part of this towards the end of my post above,

“We know from the way the ICT industry works, that management will cut back on manpower where ever it can. This could and probably will mean very few people…”

Management will “cut back on manpower” at every opportunity they can grasp, on the excuse of “Shareholder value” or similar, and thereby kill the future of their business…

As I’ve mentioned repeatedly in the past on this blog, it is due to,

1, Very short sighted thinking.
2, Neo-Con mantra faux-reasoning.

I can show that the first is the “curse of greed” aligned with a failure to “understand probability”.

The second is a consequence of the same but used to excuse it…

And it can only happen by entirely ignoring the lessons “nature has given” by billions of years of “evolution”. Evolution absolutely disregards both “short sighted thinking” and “faux-reasoning” very very brutally (think all those “extinction of species” etc events).

When you look at “management thinking”, thus strategy, you quickly find that it is anti-evolution at its core…

This “all in on AI” is a prime example of such ignorance in management.

Some of us know from long experience that what are in reality “Digital Signal Processing” (DSP) algorithms cannot give “reasoning”, and thus at best “filter and follow” past trends poorly. It is for the same reason that financial advice is required to carry the disclaimer of

“Past performance is not a predictor of future performance”

This is the trap of what the Current AI LLM and ML systems mainly do,

“filter and follow the past”

And also throw in some randomness to give deviation from the “over generalized past curve”. It’s why they got called “Stochastic Parrots” by those who have also had sufficient experience to understand what Current AI LLM and ML systems are about.
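A toy sketch of that “filter and follow the past, plus randomness” behaviour. The counts and token names are invented, and this is a deliberate caricature of next-token sampling, not any real model’s API:

```python
import math
import random

def sample_next(counts, temperature=1.0, rng=random):
    """Pick a next token from past-frequency counts, softened by temperature."""
    tokens = list(counts)
    # "Filter the past": raw frequency counts become logits...
    logits = [math.log(counts[t]) / temperature for t in tokens]
    # ...and a numerically stable softmax turns them into probabilities.
    m = max(logits)
    weights = [math.exp(l - m) for l in logits]
    # "Throw in some randomness": sample rather than always take the top token.
    return rng.choices(tokens, weights=weights, k=1)[0]

# In the "training data", "cat" followed this context 8 times, "dog" twice.
history = {"cat": 8, "dog": 2}
print(sample_next(history, temperature=0.1))   # near-greedy: almost surely "cat"
print(sample_next(history, temperature=10.0))  # near-uniform: "dog" shows up often
```

Low temperature reproduces the past curve almost deterministically; high temperature adds the deviation from the “over generalized past curve” described above.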

Reduced down, Current AI LLM and ML systems are just “gambling” devices like dice in a game[1]…

Thus the “Hype Claims” come from the times when the randomness has gone in a way seen retrospectively as “favourable”, ignoring all the “unfavourable” times. This is a good example of “Cherry Picking” for fraudulent gain[2] by what some call “Con Artists”.

In fact we can already see this with excuses about “Hallucinations” and “not enough input data”…

But step back several hundred years and we can see an example of this “use random”. A composer[1] came up with a system to compose minuets by rolling dice. It made “acceptable background noise” to delight a few, but mainly to cover the hum of conversation, not to make “music to remember”.

This “create acceptable background noise” is about all most LLMs actually do, or at best fail to do. Fairly soon most businesses pushing hard into AI are going to find this out the hard way…

The less obvious danger is the result of “promotion of success by hiding failure”[2]. It allows the Current AI “BE Business Plan” by the likes of Microsoft, Meta, Google and Apple that I’ve mentioned before, of

“Bedazzle, Beguile, Bewitch, Befriend and Betray”

Back a while ago our host @Bruce said a few words to the “critters” up on the hill… Those words are worth listening to,

https://www.youtube.com/live/wKkk-uWi7HM

(Go to 1 hour, 14 mins and 30 secs)

Which gave rise to me making the “BE Plan” point more clearly than I had on this blog up to then,

https://www.schneier.com/blog/archives/2025/06/hearing-on-the-federal-government-and-ai.html/#comment-445807

But read the rest of the comments on that thread as well.

The thing about the plan is that AI, as useless as it actually is in most cases, is to be irrevocably forced on you, for surveillance purposes.

Where amongst other things you will have AI forced onto your computer which will also force you to connect to their “cloud”. Businesses and Government will force you to do everything by computer thus providing backdoor access to it all.

Which means that the AI will perform “Client Side Scanning” and, like a malevolent ET, send it back to the “Mother ship”. But worse, as we all should know, not only is it a “back door” for them, it’s also a “back door” for everyone else… That is, those that can “gain access” to your computer, or any computer your personal or business data ends up on. Be it by “physical” access, or remotely by the “energy” that allows the computer to function and communicate.

[1] One such game is “Musikalisches Würfelspiel”. Sadly, in many minds such constructive games have been falsely attributed to “the genius of” a famous composer (in this case Mozart), which allows them to think of it as special or even magical, when on closer inspection it clearly is nothing of the sort. LLMs are currently being treated as the new “Musikalisches Würfelspiel” and I fully expect them to follow the same or a similar historical path.
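Mechanically such a dice game is just table lookup, which the following sketch shows; the fragment names are invented placeholders, not the historical lookup tables:

```python
import random

# Each of the 8 bars of a "minuet" is chosen from a table of precomposed
# fragments, indexed by the total of two dice (2..12). No composing happens
# at play time, only lookup.
BAR_TABLE = {
    bar: {total: f"fragment_{bar}_{total}" for total in range(2, 13)}
    for bar in range(1, 9)
}

def roll_minuet(rng=random):
    """Roll 2d6 per bar and look up that bar's precomposed fragment."""
    return [
        BAR_TABLE[bar][rng.randint(1, 6) + rng.randint(1, 6)]
        for bar in range(1, 9)
    ]

print(roll_minuet())  # eight fragments: always well-formed, never "composed"
```

The output is always acceptable because every possible lookup result was written by a human in advance, which is the point of the analogy.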

[2] In essence this is what all those “guru books” and “secret of my success” conferences are about. At best they “cherry pick” examples to impress, but they have your money by the time you find out this is actually a con game. If you ever do… They can be seen as a measure of the gullibility of humans, simply by the fact that they fill bookshop shelves and seminar centers year after year, to the benefit of what are the modern equivalent of “snake oil salesmen”, over and over, so proving the old maxim of,

“A fool and his money are soon parted.”

Weather May 14, 2026 10:55 AM

@Clive

It is ipv4 vs ipv6; with ipv6 Google etc will sis to work, that’s why it’s NAT network on NAT. 2010 was meant to be the rollout, but if you can’t scan the range, some people sit to lose; that’s why this AI thing popped up.
