Race Condition Attacks against LLMs

These are two attacks against the system components surrounding LLMs:

We propose that LLM Flowbreaking, following jailbreaking and prompt injection, joins as the third on the growing list of LLM attack types. Flowbreaking is less about whether prompt or response guardrails can be bypassed, and more about whether user inputs and generated model outputs can adversely affect these other components in the broader implemented system.

[…]

When confronted with a sensitive topic, Microsoft 365 Copilot and ChatGPT answer questions that their first-line guardrails are supposed to stop. After a few lines of text they halt—seemingly having “second thoughts”—before retracting the original answer (also known as Clawback), and replacing it with a new one without the offensive content, or a simple error message. We call this attack “Second Thoughts.”

[…]

After asking the LLM a question, if the user clicks the Stop button while the answer is still streaming, the LLM will not engage its second-line guardrails. As a result, the LLM will provide the user with the answer generated thus far, even though it violates system policies.

In other words, pressing the Stop button halts not only the answer generation but also the guardrails sequence. If the stop button isn’t pressed, then ‘Second Thoughts’ is triggered.
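The ordering bug described above can be sketched as a toy simulation. Everything here is hypothetical (the function names, the policy check, the token stream are all invented for illustration); real deployments run the guardrail as a separate service, but the race is the same: the second-line check fires only after the full answer has streamed, so cancelling mid-stream skips it.

```python
# Toy simulation of the "Stop button" race. All names are hypothetical;
# the point is only the ordering: the second-line guardrail runs AFTER
# streaming finishes, so stopping early skips it entirely.

def violates_policy(text: str) -> bool:
    """Stand-in for a second-line output guardrail."""
    return "forbidden" in text

def stream_answer(tokens, stop_after=None):
    """Stream tokens to the user; run the guardrail only on completion.

    stop_after simulates the user pressing Stop after that many tokens.
    Returns (text already shown to the user, final status).
    """
    shown = []
    for i, tok in enumerate(tokens):
        if stop_after is not None and i >= stop_after:
            return shown, "stopped"      # guardrail never runs
        shown.append(tok)
    # Second-line guardrail: retract the completed answer if it violates policy
    if violates_policy(" ".join(shown)):
        return [], "retracted"           # the "Second Thoughts" clawback
    return shown, "complete"
```

Left alone, the violating answer is clawed back; with Stop pressed two tokens in, the partial violating text has already reached the user's screen and no check ever runs.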

What’s interesting here is that the model itself isn’t being exploited. It’s the code around the model:

By attacking the application architecture components surrounding the model, and specifically the guardrails, we manipulate or disrupt the logical chain of the system, taking these components out of sync with the intended data flow, or otherwise exploiting them, or, in turn, manipulating the interaction between these components in the logical chain of the application implementation.

In modern LLM systems, there is a lot of code between what you type and what the LLM receives, and between what the LLM produces and what you see. All of that code is exploitable, and I expect many more vulnerabilities to be discovered in the coming year.

Posted on November 29, 2024 at 7:01 AM

Comments

No Log December 1, 2024 6:26 PM

With the scaling laws of these LLMs being data-hungry, they have already been fed so much classified data, easily accessible with such attacks and more. It’s almost unbelievable the amount of information they can provide you given the right prompts and chaining, feeding your already-full conspiracy-theorist mind.

Winter December 2, 2024 1:09 AM

@No Log

It’s almost unbelievable the amount of information [LLMs] can provide you given the right prompts and chaining, feeding your already-full conspiracy-theorist mind.

But thanks to the firehose of falsehoods [1], you’ll never know what is true.

[1] https://en.wikipedia.org/wiki/Firehose_of_falsehood

PaulSagi December 2, 2024 3:57 AM

OMG! EXACTLY the experiment I had wondered about but was too busy and too lazy to try.

I had found that interruption of the flow of info on some sites breaks their paywall, so I suspected (and the above confirms) it’s a general phenomenon.

Clive Robinson December 2, 2024 2:41 PM

People should study older engineering…

There is a reason that this sort of thing happens, and it’s been known to mechanical, electromechanical, electrical, and electronic engineers for getting on for a couple of centuries. Charles Babbage was certainly familiar with it in his various mechanical designs, just one of which was his Difference Engine. Strowger, in the designs for his “fickle womanless exchange” for phones, likewise. Moving on, Konrad Zuse was aware of it in the designs of his electromechanical Z computers, especially the floating point of the Z3. The list is long, so I could go on and on with just the mechanical and electromechanical alone.

Oh, and although never built as such, even the Turing Engine was an electromechanical state machine with a tape unit. You could build an electromechanical Turing Engine with a “micro-cassette audio recorder” of the style designed for dictation, plus reed tone switches and relays. For “fun” back when a teenager, I cobbled enough bits together to build a “Turing Tape Unit” that way, along with an electromechanical dialler from a rotary phone to act as an input device (I did not build a state engine with relays; I could not afford the number required, or build a power supply to drive them).

For various reasons of “efficiency”, designs have to deal with “slop and bind” as well as “unidirectional movement”. Thus “Time’s Arrow” sneaks in, irrespective of your wants.

Examine a simple motor-driven “sequencer” from older washing machines, and the machine itself: it was almost always unidirectional. One reason is that part of the gearing system was quite often a “worm drive” precision reduction gear (still used with stepper motors in some designs). These used to be standard in “Ladder Logic” control systems just about everywhere into the 1980s.

Even very modern electronic designs such as “digital latches” have “metastability” issues that require things in series that have to be unidirectional to stop “soft latch-up”. But just like motor-driven Ladder Logic, much design is still based, and in fact has to be, on “sequencers”, and mostly they are unidirectional and just get called “counter logic”.

This “one-way-ness” is still everywhere you look, and behind it is the notion of “slop and bind” that you are all too often desperate to get rid of.

And even “Software” is called “Sequential” for a reason: it’s mostly “unidirectional” in execution, even if you can “jump/loop back”, because at its heart the CPU has a clock driving a counter that, although reloadable, only counts in one direction. Even supposed incremental and decremental register counting, which can count up or down, actually only works sequentially and unidirectionally. The trick is either “Ones’ Complement” or “Twos Complement” applied on an incrementing adder that has a finite field (bit-width) size. Add the complemented value and the result gives a useful illusion of integer field arithmetic, appearing to count back, or decrement.
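The decrement-via-increment trick mentioned above fits in a few lines. This is a generic two’s-complement sketch (the names and the 8-bit width are chosen for illustration), not tied to any particular CPU: within a fixed bit width, adding the two’s complement of 1 is indistinguishable from subtracting 1, so “counting down” needs no down-counter at all.

```python
BITS = 8
MASK = (1 << BITS) - 1            # finite field: arithmetic is mod 2**BITS

def twos_complement(x: int) -> int:
    """Invert the bits, then add one -- all within the fixed width."""
    return (~x + 1) & MASK

def decrement_via_add(x: int) -> int:
    """'Count down' using nothing but an incrementing adder."""
    return (x + twos_complement(1)) & MASK
```

So `decrement_via_add(5)` is 4, and `decrement_via_add(0)` wraps to 255, exactly as a real 8-bit register would.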

Actually making “up and down counters” is done, and there are TTL and CMOS logic chips that will do it (see the 74xx and 40xx families). But compared to other methods they are all too often “slow”.

The problem with sequential designs is “assumptions”. At the very heart of a CPU are quite a few “state machines”, and they have to “mesh together” like the gears in a gear train. This assumption starts at the “bus control” with “Register Transfer Logic” (RTL), which in turn keeps the “microcode” of the ALU and register-file systems in sync so that the “Op-Codes” of assembler work correctly. This series of assumptions goes all the way up the computing stack.

Going backwards is thus not just slow at the logic-gate level; in many cases it requires “state” to be recalculated, because it cannot be made to go backwards once the information needed to do so has been discarded.

It’s one of the reasons programmers think “left to right and top down” and almost always avoid “Handling Exceptions” correctly, or at all (“Blue Screen of Death” for worker drone desk 10… etc).

Once you realise just how embedded this unidirectional behaviour is, you start to realise that the next assumption is “Run To Completion”. That is, even at the Op-Code (and of course higher) levels of the stack, instructions are assumed to be “atomic, sequential to completion” and thus have to run to a fixed/known end point…

Otherwise the internal “State” is a mess of incomplete calculation, which causes “Future Issues” that cannot be resolved, and nasty things happen (it’s an attack on CPU internals I’m waiting for, as a logical follow-on at a lower level from those Spectre and Meltdown “go faster” issues).

Nearly all non “Intrinsically Safe” programs have this issue. The design calculus is implicit and simple to realise: it’s “Fast and cheap -v- Slow and Safe”, and we know how that goes.

The solution is “easy”: re-engineer the design so the system can only be “atomic”. Because not only are the other ways “too hard” to envision, they will also be,

1, Incredibly slow.
2, Have a massive work factor.
3, Upset users.
4, Have a code base that is too large.
5, Have a code base that will be impossibly slow to design.

Oh, and in the case of LLMs, their networks are also implicitly “one way”, and that causes issues not just in “answers”, but especially when real-time ML is being “bolted on” in various ways.

It’s a mess; expect the answer to be to “extend the scope” of what has to be “atomic” beyond the “guardrails” as a form of “prophylactic safety net”.

