Regulating AI Behavior with a Hypervisor
Interesting research: “Guillotine: Hypervisors for Isolating Malicious AIs.”
Abstract: As AI models become more embedded in critical sectors like finance, healthcare, and the military, their inscrutable behavior poses ever-greater risks to society. To mitigate this risk, we propose Guillotine, a hypervisor architecture for sandboxing powerful AI models—models that, by accident or malice, can generate existential threats to humanity. Although Guillotine borrows some well-known virtualization techniques, Guillotine must also introduce fundamentally new isolation mechanisms to handle the unique threat model posed by existential-risk AIs. For example, a rogue AI may try to introspect upon hypervisor software or the underlying hardware substrate to enable later subversion of that control plane; thus, a Guillotine hypervisor requires careful co-design of the hypervisor software and the CPUs, RAM, NIC, and storage devices that support the hypervisor software, to thwart side channel leakage and more generally eliminate mechanisms for AI to exploit reflection-based vulnerabilities. Beyond such isolation at the software, network, and microarchitectural layers, a Guillotine hypervisor must also provide physical fail-safes more commonly associated with nuclear power plants, avionic platforms, and other types of mission critical systems. Physical fail-safes, e.g., involving electromechanical disconnection of network cables, or the flooding of a datacenter which holds a rogue AI, provide defense in depth if software, network, and microarchitectural isolation is compromised and a rogue AI must be temporarily shut down or permanently destroyed.
The basic idea is that many of the AI safety policies proposed by the AI community lack robust technical enforcement mechanisms. The worry is that, as models get smarter, they will be able to avoid those safety policies. The paper proposes a set of technical enforcement mechanisms that could work against these malicious AIs.
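To give a rough flavor of what such an enforcement mechanism might look like, here is a minimal, hypothetical sketch of a hypervisor-side watchdog that screens a sandboxed model's outputs and trips an out-of-band kill switch on a policy violation. The names and the policy check are illustrative placeholders, not anything taken from the paper.

```python
# Hypothetical sketch only -- not from the Guillotine paper.
# A hypervisor-side watchdog: every output from the sandboxed model
# must pass a policy check before it is released; any violation trips
# an out-of-band kill switch (e.g., an electromechanical network cutoff).

import queue
import threading


class KillSwitch:
    """Stand-in for an out-of-band physical fail-safe (placeholder)."""

    def trip(self, reason: str) -> None:
        print(f"KILL SWITCH TRIPPED: {reason}")
        # Real hardware would cut power or network connectivity here.


def check_policy(output: str) -> bool:
    """Placeholder policy check; a real enforcer would be far richer."""
    banned = ("exfiltrate", "self-replicate")
    return not any(word in output.lower() for word in banned)


def watchdog(outputs: "queue.Queue[str]", released: "queue.Queue[str]",
             kill: KillSwitch) -> None:
    while True:
        item = outputs.get()
        if item is None:          # sentinel: model shut down cleanly
            break
        if check_policy(item):
            released.put(item)    # forward only policy-compliant output
        else:
            kill.trip(f"policy violation in output: {item!r}")
            break


if __name__ == "__main__":
    raw, ok = queue.Queue(), queue.Queue()
    t = threading.Thread(target=watchdog, args=(raw, ok, KillSwitch()))
    t.start()
    raw.put("summarise quarterly report")      # passes the check
    raw.put("exfiltrate the hypervisor keys")  # trips the kill switch
    t.join()
```

The point is only that enforcement has to sit outside the model's reach; the real design questions in the paper—side channels, hardware co-design, physical fail-safes—start where this sketch ends.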
Clive Robinson • April 23, 2025 3:14 PM
@ Bruce, ALL,
With respect to,
The only solution, as set out, will by necessity remove the “redundancy” that allows not just the “inscrutable behaviour” but any leakage by “side channel”[1]. At that point neither the LLM nor the ML that built it will be of much “magic use”, and thus it can be replaced by a more deterministic system.
But consider a well-known issue that plagues all system design today, and for which no practical resolution has even been suggested, let alone investigated.
That is there are,
1, Known Knowns.
2, Unknown Knowns.
3, Unknown Unknowns.
We can deal with the “Known Knowns” and if we do that right by solving not the “instance issues” but the “class issues” we may well put quite a dent in the “Unknown Knowns” and a few of the “Unknown Unknowns”. But the majority of “Unknown Unknowns” are yet to be discovered and exploited.
The “unknown unknown” issue is true of all knowledge domains by definition. Thus this “guillotine” is likely to be a good deal less effective than current AV software, and most should know by now just how ineffective AV software is in this respect.
Will the “guillotine” be totally useless?
No, but I’d not put much faith in it being an effective solution, except in the very short term.
And that’s the real danger. The reason current AI LLM and ML systems are not “panning out” is that we’ve invested way too much “going down the wrong path”… Thus any solutions based on the wrong path will probably not function on a different wrong path, and probably not at all on the right path, assuming we ever find it and recognise it…
[1] We’ve been through this before. Claude Shannon showed in the 1940s that for information to be communicated there had to be “redundancy” in the “channel”. A few decades later Gus Simmons proved the point that any channel that had “redundancy” could, via that redundancy, have “covert channels” constructed using it. Thus logic dictates that any channel that carries information must be able to carry side channels, covert or overt. But if you go back to the 1930s, Kurt Gödel proved an awkward fact about logic systems: no usable logic system can fully define itself. Which is why, even with AV software, malware will still happen. Something I’ve discussed here quite some time ago, and how you can get around the issue.
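To make the point in [1] concrete, here is a toy, purely illustrative sketch of how redundancy in an ordinary text channel (flexible inter-word spacing) can be repurposed as a covert channel. It is not taken from Simmons’s work; it just demonstrates the general argument.

```python
# Toy illustration: redundancy in a channel (one vs. two spaces between
# words) carries covert bits alongside the overt message.

def embed(cover_text: str, bits: str) -> str:
    """Encode covert bits in the inter-word spacing of the cover text."""
    words = cover_text.split()
    out = [words[0]]
    for i, word in enumerate(words[1:]):
        gap = "  " if i < len(bits) and bits[i] == "1" else " "
        out.append(gap + word)
    return "".join(out)


def extract(stego_text: str, n_bits: int) -> str:
    """Recover the covert bits from the spacing."""
    bits = []
    i = 0
    while len(bits) < n_bits and i < len(stego_text) - 1:
        if stego_text[i] == " ":
            if stego_text[i + 1] == " ":
                bits.append("1")
                i += 2
                continue
            bits.append("0")
        i += 1
    return "".join(bits)


if __name__ == "__main__":
    msg = embed("the quick brown fox jumps over the lazy dog", "1011")
    print(repr(msg))
    print(extract(msg, 4))   # prints 1011
```

Any channel with slack like this can carry extra bits, which is why eliminating every such side channel, as the paper’s threat model demands, is so hard.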