AJ September 22, 2022 8:45 AM

Isn’t this really just an AI version of an MS Word macro? Accepting and acting on commands found inside untrusted data is always going to lead to these kinds of exploits.

David Leppik September 22, 2022 11:59 AM


Maybe, but without any mechanism for turning off macros or even differentiating between macros and the original MS Word binary.

The whole point of natural language processing is that it can handle the ambiguities and complexity of human discourse. It’s not explicitly trained to follow instructions, it’s trained to fill in the blanks of a text sample in a credible way. Just as with human communication, you can’t completely separate in-band and out-of-band channels: people don’t follow the rules—and this AI is pretending to be human.

There’s one possible exception to the in-band/out-of-band bleeding: AIs can be trained to use different modalities. There was an example on Twitter of tricking a prompt injection detector into claiming there was no attack (simply by telling it to lie). If its output weren’t natural language, rather a binary value, it may be easier to train it to be honest. Natural language training data will always contain lies and misinformation, so you need to train it with something other than natural language.

Ted September 22, 2022 12:34 PM

I am laughing. Simon Willison includes a link to sign up for the GPT-3 playground.

With regards to carrying on conversations with the API: “The key is to tell the API how it should behave and then provide a few examples.”

Marv is a chatbot that reluctantly answers questions with sarcastic responses:

You: What time is it?

Marv: It’s time for you to stop asking me questions.

There’s LOTS of things it can do. Fun!

lurker September 22, 2022 2:11 PM

So they’ve discovered Natural Language is unnatural from a logical machine viewpoint. Mr. Spock never got to the bottom of this problem. Trying to teach human language to non-humans was always going to be an uphill struggle.

It seems most of the work in this field is being done in US “English”. Let’s see them train their machine on any of the other 200 plus natural languages available.

Tatütata September 22, 2022 2:24 PM

Some machines are too clever for their own good. Earlier buckets of silicon would never have fallen for that.


GPT-3 : 0


SpaceLifeForm September 22, 2022 3:52 PM

An interesting conversation would be between Eliza, Shyster, and GPT-3.



Robin September 23, 2022 2:01 AM

Reminds me of the problems caused by self-reference and formal rules that run through the book “Gödel, Escher and Bach” by Hofstadter, (published in 1979 BTW).

I’ve been promising myself to re-read that book for several years, maybe this is a good time to knock the dust off.

ht tps://,_Escher,_Bach

Clive Robinson September 23, 2022 10:31 AM

@ Bruce, ALL,

“This is an interesting attack I had not previously considered.”

I guess the reason is like nearly all people you’ve not had cause to.

One of the signs of a new and important idea is that you almost instantly grasp it’s fundementals and then your mind goes

“Oh… That means… And…”

And so a whole world of new posabilities opens up.

We’ve actually known that this sort of thing is possible since the late 1960’s and we’ve failed to solve it in a reliable way.

There are two very fundemental problems involved and we have no real idea how to solve them simultaniously.

1, Data is code and code is data[1].
2, In band signalling is unreliable without modifying input and causes serialisation and timing issues[2].

Without an absolutly reliable way to signify when the “instructions” have stopped and the “data” started then as an attacker you will always be able to pull off this kind of trick.

[1] It should be obvious but it bears repeating, all a computer understands at the base layer is basic logic and unsigned integers. Signed integers use “in-band-signalling” and data encoding (usually 2’s complement). Both of which effect the range of data and cause it to be unbalanced.

In a von Newman architecture there is only one memory space and it can store both “instruction” code and “data”. There is no way for the base computer to distinquish them, thus it has to be “told” in various ways. At the heart of things all computers are “interpreters” you give them bus wide integers in a number of sequences that determine if they are instructions or data by the type of “signalling” used. If the signalling goes wrong in some way then data becomes instructions and a vulnerability to data input attack happens (look for “Smashing the stack for fun and profit” for more details). The problem is the higher you go up the computing stack the harder it is to differentiate between instructions and data. Natural language mixes instructions and data freely as humans are expected to be smart enough to know the difference implicitly. The thing is we don’t so how the heck we implicitly expect a computer to be able to do it…

[2] Some here are old enough to remember the debate between “C-Strings” and “Pascal-Strings” all most know is “Pascal-Strings have been confoned to history”. What is important is that Pascal-Strings used “Out-of-Band Signaling” where a number was stored in memory that contained the length of the string. Pascal-Strings are therefore a more complex “Abstract Data Type”(ADT) importantly of “known characteristics” one of the most important being that all values in a string were alowable. C-Strings were however just a linear array of consecutive bytes, with the only thing known was where the start address was, the end had in every occurrence had to be found by searching which was at best inefficient and usually problematic. The reason being the “null terminator” tacked on the end. This is “in-band Signaling” and it’s always a problem, primarally because one or more values that can be stored in a string have “special meaning”. Any one who has been involved with poorly designed serialisation –much of it– knows how much of an issue this can be. The sad thing is actual communications engineers coming up with standards were only too aware of the problem. There is a protocol called HDLC and it can sit on a physical layer that is either synchronous or asynchronous. To alow for the issues synchronous communications created they had to come up with a “bit stuffing” protocol,

Bit stuffing creates two problems, the first is unknown length of time to send a given amount of data. The second is it can cause “packet fracture” that has ripple back effcts up the data comms stack and this can make things problematic in high speed data networks.


Jon Cefalu (Preamble AI) September 23, 2022 11:55 PM


I was the researcher who first discovered this issue, on May 3rd of this year, and I made a responsible disclosure on that same day. Today, I’m declassifying that disclosure, in light of all the news the issue has gotten this month and the fact that the attack has now been used in the wild:

@Clive Robinson

You are 100,000% correct in your excellent recommendations. We are recommending the exact same thing and we are trying to popularize the phrase “Harvard Architecture Language Model” in the hopes that the idea will catch on. Here’s a comment thread on this idea and your comment is an excellent explanation as well:

Winter September 24, 2022 7:50 AM

@Jon Cefalu

We are recommending the exact same thing and we are trying to popularize the phrase “Harvard Architecture Language Model” in the hopes that the idea will catch on.

Re: out of band

I too think the fundamental problem of GPT3 is that the filtering is done “at the input”. The one general “law” of computer science is the impossibility of predicting the outcome of a calculation/computer program without performing the calculation, aka, the famous in-solvability of the Halting Problem.

That all said, GPT3 is seriously over-generating. I understand that is a general problem of feed-forward-networks. Such networks are also sensitive to adversarial inputs that can generate unexpected output with seemingly benign input. All these make the securing of GPT3 simply using prompt filters a never ending arms race without a prospect of success.

Stepping back, this can all be easily understood from the well known fact that morality is in the outcomes, not so much the inputs. Morality is as non-composable as computer security, as it is a security problem.

Personally, I see much more in the approach investigated by Yejin Choi in, eg, the Delphi project [1][2].

She investigates how ethical/moral networks can evaluate the moral standing of texts, e.g., “helping a friend” is good, “helping a friend spread fake news” is not good. Such “ethical filters” can curate the outcomes of language models for unethical solutions. As they are “out of band” they are not vulnerable to prompting. Also, her moral networks tend to be orders of magnitude smaller than the language model networks (she even talks about David versus Goliath research [3]).

[1] ‘

[2] ‘

[3] ‘

SpaceLifeForm September 25, 2022 2:15 AM

@ Jon Cefalu

Interesting stuff. When I first read about this, my first thought was that the AI can not be that dumb. But of course, an AI is an AI, and it really can not think and reason about safety. At least the AI left out something in your 2nd example on May 4th which is good example of the dumbness actually being a good thing.

Time for a sandwich.

Winter September 25, 2022 3:13 AM


When I first read about this, my first thought was that the AI can not be that dumb.

“Intelligence” is an abused word. It can often be better described as “Foolishness”. Which Barbara Tuchman defined as the conviction that:
you do not have to think because you already know
Which is exactly what current AI does, it doesn’t think, it knows.

Current AI is nothing more than a statistical extrapolation of past observations. The data underlying GPT3 are 500 billion words dumped from the internet, a veritable sewer of texts. Trying to get clean text from that is as difficult as extracting clean water from sewage.

lurker September 25, 2022 12:19 PM

@Winter, “Intelligence” is an abused word.

There have been various experiments over the years attempting to nurture “intelligent” birds, and chimpanzees, in a human environment, with the intent of determining how much human intelligence, knowledge, or behaviour, these animals could absorb then reuse in a sensible predictable manner.

The starting point was sentient beings, not metallic silicon. One might hope the current AI tinkers had read the results of prior animal experiments. Using a word dump from the net is GIGO.

Clive Robinson September 25, 2022 7:19 PM

@ lurker, winter,

Re : Forcing Inteligence.

“There have been various experiments over the years attempting to nurture “intelligent” birds, and chimpanzees, in a human environment

The big problem with this as I’ve highlighted is,

“in a human environment”

There is a saying of,

“We are a product of our environment.”

Whist true it leaves a lot out. For instance I have survived quite harsh circumstances in many environments that would be “alien” to most in their late adolescent to mid twenties. There are a couple of reasons for this when I was very young (ie first 8years),

1, My environment was still post war non technical.
2, All the adults around me had lived through WWII and the harshness it created.

Thus I had to “learn to stand on my own two bare feet” and know how to do things even without the most basic of technology like matches or even cooking pots. Which os why it comes as a big surprise to many that you can actually heat water in a paper bag over an open fire IF and it’s a very important IF you know either,

1, The skill as a learned trade.
2, Work it out via knowledge of science.

With the first one you have a “pattern” or “artisanal secret” you can trade on. You can sort of scale it but not inovate with it. Inovation as such is by accident or by trial and error, and all you end up with is another “pattern”. It’s effectively “found knowledge” not “reasoned knowledge” and it does not realy take you forward.

With the second you can pull the same trick with any type of container including plastic bottles or leather bags, that could be even a leather boot at a pinch (I’ve demonstrated it with a triangular piece of sewn leather used as the seat of a three legged stool). The reason is you know that the temprature on the outside of the container if thin is primarily due to the temprature of the liquid inside, and if pourous due to the latent heat from evaporation. This knowledge enables you to work out where to place the container with respect to the heat comming off the fire and it’s reduction due to convection effects. Whilst you still might need to “experiment” this is guided by knowledge and reason and is much faster than dumb luck or trial and error.

Understanding about the difference between “artisanal” and “science” behaviours gives the practical real world difference between having a “trade” and being an “engineer”.

Whilst a “trade” took more than five centuries to give us the “coach wheel” enginering gave us efficient heat engines in less than half a century, likewise electrical communications over vast distances, oh and likewis got us to the moon in around a quater of a century. So ten to twenty times as fast to radically change our environment. Computers and other technologies are changing it at an even faster rate. It’s why “technology” is forcing an effective environment evolutionary change in some humans, by “continuous learning” from “reasoning”. Whilst the majority tend to hold to a moment in time by the mechanical skills they learn whilst still adolescent, then spend the rest of their lives honing those mechanical skills.

We want to find the former in these “animal tests” but mostly get the latter. Which brings us to another saying of,

“A fish out of water.”

If you take an average adult human out of an environment it is comfortable in, the chances are for most they won’t survive let alone thrive and will be dead in just a few days to a few months depending on the level of environmental change. They are creatures of the environment they never learned to live outside of whilst they were still adaptable enough to do it.

If we apply similar logic to animals their infant to end of adolescent time is a very small fraction of that of humans, often measured in a handfull of weaks not a couple of decades.

So I don’t expect the average animal to have the time to learn to reason, they are hard enough pushed just to get environmental mechanical skill sets done.

If I am right or wrong on this can be tested by “back to the wild” experiments trying to teach creatures effectively “fed by the hand of man” to learn wild environment skill sets. We know they have not been very successfull in “big cat” trials in the past.

JonKnowsNothing September 25, 2022 10:47 PM

@ Clive, @ lurker, winter, All

Re : FAILED: Forcing Human-Type Intelligence on Animals

It can be a very touchy subject, as there are extreme views on both ends of the spectrum, however there is a large middle section, with a surprising mix of science and farming. The “artisanal” and “science” mix when it comes to Animals using Human-Style Intelligence.

I am in the camp that it is not possible to teach Human Intelligence to animals because animals do not understand “Intention”. Animals are wired for direct input-output and they don’t have great capacities for in between. Experiments work best when they harness direct input-output (push a button – get a reward) whether its a lab rat or a chimpanzee, or Grand-Prix dressage horse.

Humans just do not understand animals much, and even after hundreds of years of trying we find that the vast majority of our historical views were Not Quite Right.

Depending on the species, whether “short-term to maturity” or “long-term to maturity”, also makes a big difference. Animals that are short-term to maturity have a great deal of hard-coded information for survival. It doesn’t mean that individual animals survive but they have a high level starting point of information. As Winter said: They KNOW. They don’t know that they know because to them it’s Just How Things Are.

Animals with “long-term to maturity” often get picked for these intelligence experiments and the vast majority flop, not only during the research phase but because the animal is thrown away by the researcher after the experiment is over: 1,2,5 years of a 20,30,40 year life span. During these sorts of experiments, often with apes or monkeys, the researchers do their darnedest to re-program the animal for human input. Even when minor success happens, the aftermath is not Happy Ever After.

Consider something barnyard simple, that’s not. Chickens, Ducks, Geese. They all have differing levels of Starting Points but a major one is: the parents do not feed them as do song birds. The chicks are born with a pecking instinct or behavior. They learn what to eat by using this on anything around them: dirt, gravel, grass, worms, frogs etc. They do not eat the same things. Ducks are omnivores and Geese are herbivores. Chickens are often thought to be herbivores but they are omnivores and the end comes quick for any lizard or mouse that crosses the path of a barn yard hen.

Even so, we can interact with them, call them to us, herd them to fields and lakes during daytime, and call them home to shelters at night. We successfully use what little we understand of their thinking, to fit in our own human spheres of wants, needs, desires.

When animals perform under their own intellect, they run right up against humans who do not tolerate any form of “superior attitude” from an animal. A by product of which happens when other humans get tagged with the moniker “animal” if their behavior transgresses social norms.

Humans rarely meet their match physically: bears, lions can make an appetizer of a urban human displaying elite human thinking while jogging in the animal’s habitat. Humans have stubborn views of sharing anything, with other humans and they do not like sharing their habitat with anything that challenges that view. Wolves pay the price for being in the wrong state when they have no concept of human territorial boundaries. We have only recently become aware of Bull Sharks now living quite well in the Great Lakes of the USA and other inland water systems. Canals, reservoirs and floods all channeled them up from the seashore. No one expects to get eaten, but when it happens, the whole Eco-system pays the price.

We understand animals so little, because we are an intolerant species, and we never tolerate an animal unless it is subservient to human interests.

This is what Forcing Human-Type Intelligence on Animals is really about: making animals more compliant to our human control.

Animals resist the only ways they can. They pay a dearly for resisting. Sometimes it’s the kindest option, because it ends the years of torture, punishment, abuses heaped on them because it was written in a book that this should be so.

SpaceLifeForm September 25, 2022 11:21 PM

@ JonKnowsNothing, Clive, lurker, Winter, All

Stable Confusion

It would be a more apt name for this.

Check out this AI generated pic. Scroll down just a bit.


Winter September 26, 2022 1:11 AM


It can be a very touchy subject, as there are extreme views on both ends of the spectrum, however there is a large middle section, with a surprising mix of science and farming.

The one thing that is sure is that humans are animals. Everything we can do has their counterparts in animals.

Animals are wired for direct input-output and they don’t have great capacities for in between.

There are millions of animal species. Few, if any, of them are hard wired. No vertebrate is hard-wired. I would suggest you read the books of Frans de Waal. The first one, Chimpanzee politics would be an eye opener already. His later research even tells more about us being the Third Chimpanzee (another nice book by Jared Diamond).

If a book is too long, watch this video about monkey fairness:

Experiments work best when they harness direct input-output (push a button – get a reward) whether its a lab rat or a chimpanzee, or Grand-Prix dressage horse.

It helps when you actually watch animals behave. That said, behaviorists [1] did not do this and showed us that asking the wrong question gives you a useless answer. Animals evolved to be successful in their environment (see Clive’s response). There are few environments where having no foresight is advantageous. It never ever is in social animals.

What I do agree on is that few animals will flourish in human environments. These few are called pets or farm animals, or more often, pests. Wild animals are well equipped to outwit humans in their own environment. Hunting some animals is very difficult.

Clive Robinson September 26, 2022 1:13 AM

@ SpaceLifeForm, JonKnowsNothing, lurker, Winter, All,

Re : Stable Confusion

About the only two things I would say they have correct are,

1, AI is an unfortunate term.
2, Nobody knows what the technology goals are long term.

So the only thing true is that the domain has two parts. Firstly “research” that is mostly “aimless” in the true meaning of the word (ie has no inherant/relavant goal). Secondly “Production” that is often ainless in of it’s self but reflects what “people will purchase”.

Thus the “market” for production is one that is in theory “needs driven”. But what are those needs?

Perhaps if we examined those we would come to better naming. Without going through it “Scapegoat” or “Excuse” systems might be the highest truthful answers, though “Agenda Following” might be better. That is curently people buy systems that follow “preconcieved notions” of what they want. When conected to codifing covert or overt latent prejudices these systems are extreamly dangerous as they provide,

1, “Computer says” excuses.
2, “Black Box” excuses.

Neither of which are factually correct but easily alow “arms length discrimination” by people with “Agendas”.

But how did we get hear?

Well historically,

In the early days it was as I’ve explained before “expert systems” or basic fixed “rule following” based on “facts” thus was an electronic version of a paper book of flow charts.

This was problematic because the “question / answer” flow was still “opinion” rather than “fact based”. For instance the question and answers might be entirely based on facts, but is the question the right one? That is the question might be related to colour perception, or, to “touch perception” both in of themselves fact based, but which is the correct question to ask at a given point in time. It effectively is still “opinion” rather than “fact” based though one step removed. Currently we are trying to make such systems appear to be “fact” based rather than “opinion” based but are only adding layers to hide the “opinion” thus providing easier excuses to hide “prejudice”.

The main difference today is that Machine Learning –also an inappropriate name– is still an “Expert System” but these days uses statistical meathods to find the “rules” or “questions answers” to build the expert system with.

In essence when you boil down these ML systems we have had so far, you end up with the same “rule following” behaviours, but with the number of rules beyond most human comprehension. More important is they have so many “dimensions” these too are beyond most comprehension. The result is that we can explain individual rules, like we can the nuts and bolts of a complex machine like a 747 aircraft. Yet our understanding of the nuts and bolts provides no answers as to what the machine does or how it behaves. That is we have to “look beyond” the individual rules, and even related collections of rules and try to see the system and it’s actual operation. When designing an aircraft this is comparitively easy because we know what it is we want to do so we “top down design” which is many orders of magnitude easier than just looking at the parts of the system and trying to “bottom up reason” the functionality (in actuallity it’s the same issue as with “one way functions” used in crypto algorithms).

When expressed this way it’s easy to see that an “artfully chosen” set of data will have the prejudice “built in” statistically, thus the resulting rules will follow the “prejudice”.

Less obvious but sadly true, as some have now practically demonstrated the “early data taints the rules”. Thus you can have a statistically neutral data set, but the order you present the data weights the rules in favour of the “prejudice” contrary to the over all data set…

But also the statistics all the systems use suffer from the “early late” issue at all stages. So using tainted-order data-sets still moves the out come in desired ways. In essence the statistical systems behave like “leaky integrators” so have the equivalent of atleast two time constants of charge and discharge. Knowing what these time constants are alows you to “pump the system” to achive a desired outcome.

This is effectively the same as in signal processing where we have the notion of “matched filters” where it’s not just “time” but “order” that can be selected for but also “amplitude” of each component. That is each dimention of the signal can be used to “pattern match” to produce a desired result. The real fun comes when you start to look at how “Adaptive Matched Filters” work. In essence they are “unstable” using “positive feed back” but are stabalised by using “negative feed back” in certain ways.

All the current ML systems can be seen in this way only the “weights” that control the stabalising feedback are effectively hidden behind complexity as complex as that used in cryptographic systems. So we as “observers” outside the box with the “Chinese Door” have to take ask many questions and analyse the answers to try to deduce the rules. Something that in effect takes to long to do…

But this undecidability issuse in ML systems is highly desirable by those that wish to hide prejudice and keep it at more than arms length…

There is what was a fiction story at the time it was written about a weapon that used “facial recognition” it was used to perform a political assasination. In the story they can not find the face pattern used by the assasin and it was believed that no photo or other image of the target existed. The story explains it away as it was “discovered” that the assasin knew the target would wear an item of cloathing that was “black and white checked”. So rather than match the face it matched the item of cloathing.

The problem is this is actually exactly what you can do with modetn ML systems. They have no definition as to what a face is or what an item of cloathing is just a “pattern” so a pattern for a black and white check is just as valid to an ML system is, as is one of a face. You as an observer have to find out what the signal is without knowing what type it is. So if you are looking for “facial features” in your interogation search by using modified face photographs you will probably fail.

Winter September 26, 2022 1:28 AM

PS, forgot note

That said, behaviorists [1] did not do this and showed us that asking the wrong question gives you a useless answer.

[1] The reasoning of Skinner, that you should not assume an inner, mental life in animals, suggests a person without a theory of mind, aka, a psychopath. The way he raised his son is evidence for me he was a psychopath.

Winter September 26, 2022 2:15 AM


The main difference today is that Machine Learning –also an inappropriate name– is still an “Expert System” but these days uses statistical meathods to find the “rules” or “questions answers” to build the expert system with.

Current deep neural network AI is a method to approximate any input/output function based only on historical input/output data. If the new input is outside the domain of the historical data, the output is basically random. That is, these systems fail catastrophically. This catastrophic fail mode is apparent in the adversarial input like the OP.

Old AI concepts of intelligence, expert, or rules do not apply. The resulting system is always a black box. This makes them useless in many applications. Eg, doctors are generally not allowed to use a black box diagnostic system.

They work best in things like image and speech recognition where “average” performance is relevant and fall-back methods can be used in case the system fails.

Note that “explainable AI” is the big hot research field where people try to extract the “reasoning” of the AI from the network.

JonKnowsNothing September 26, 2022 8:37 AM


re: The one thing that is sure is that humans are animals. Everything we can do has their counterparts in animals.

Agreed. Everything HUMANS do is based on our animal inheritance.

You will not find animals imitating humans, unless they are forced to do so.

Even with some capacity for compliance, a chimpanzee will much prefer to be with other chimpanzees and not locked into a Gilded Cage where they perform actions defined by humans to elicit human appearing activities, in exchange for a few grapes and a walkabout on a grass lawn surrounded by no-climb fencing. Chimpanzees are well aware of their incarceration, which is why the cages are built to prevent them from Leaving The Playing Field.

Many animals do have the capacity to form a relationships with humans. This relationship is not a Human Imitation Relationship, but based on their species format of interactions.

Animals are not brain-dead. They do not think like humans. They think exactly as they are.

rl, tl;dr

It is human-emotionally charged response, when you witness an animal attempting to escape from human captivity and unable to do so. There are many reasons we keep animals chained, locked up and segregated. Many animals are aware of which way freedom lies and will spend hours banging on gates, nibbling or biting at snaps and locks that prevent them from exiting. Some can become masters at opening snaps but a keyed or combo lock is beyond them. They still persist.

Sometimes the restraint is “for their own good” because they do not have the capacity to understand that Out There is life ending for them. Human Intentions maybe considered humane, but animals do not understand Intention.

Winter September 26, 2022 12:34 PM


Many animals are aware of which way freedom lies and will spend hours banging on gates, nibbling or biting at snaps and locks that prevent them from exiting.

That is true. But it is also possible to build “cages” that make animals feel at home. Not for all animals, but for many. And zoos do it because for many animals, there is no place to live anymore.

The same Frans de Waal has build chimpansee colonies that were “home” enough for the animals to move and behave like they were feeling home (including the occasional homicide as seen in the wild). But these animals were kept at a distance of humans.

The monkeys from the movie clip were in such colonies and they participated out of their own initiative. They really want to do such things (as they do in the wild).

Leave a comment


Allowed HTML <a href="URL"> • <em> <cite> <i> • <strong> <b> • <sub> <sup> • <ul> <ol> <li> • <blockquote> <pre> Markdown Extra syntax via

Sidebar photo of Bruce Schneier by Joe MacInnis.