Fooling NLP Systems Through Word Swapping

MIT researchers have built a system that fools natural-language processing systems by swapping words with synonyms:

The software, developed by a team at MIT, looks for the words in a sentence that are most important to an NLP classifier and replaces them with a synonym that a human would find natural. For example, changing the sentence “The characters, cast in impossibly contrived situations, are totally estranged from reality” to “The characters, cast in impossibly engineered circumstances, are fully estranged from reality” makes no real difference to how we read it. But the tweaks made an AI interpret the sentences completely differently.

The results of this adversarial machine learning attack are impressive:

For example, Google’s powerful BERT neural net was worse by a factor of five to seven at identifying whether reviews on Yelp were positive or negative.

The paper:

Abstract: Machine learning algorithms are often vulnerable to adversarial examples that have imperceptible alterations from the original counterparts but can fool the state-of-the-art models. It is helpful to evaluate or even improve the robustness of these models by exposing the maliciously crafted adversarial examples. In this paper, we present TextFooler, a simple but strong baseline to generate natural adversarial text. By applying it to two fundamental natural language tasks, text classification and textual entailment, we successfully attacked three target models, including the powerful pre-trained BERT, and the widely used convolutional and recurrent neural networks. We demonstrate the advantages of this framework in three ways: (1) effective—it outperforms state-of-the-art attacks in terms of success rate and perturbation rate, (2) utility-preserving—it preserves semantic content and grammaticality, and remains correctly classified by humans, and (3) efficient—it generates adversarial text with computational complexity linear to the text length.
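For readers curious about the mechanics, the attack's greedy loop is easy to sketch. The following is a minimal illustration only: the toy lexicon classifier and the hand-made synonym table are stand-ins for the paper's BERT targets and its embedding-based synonym search.

```python
import math

# TextFooler-style greedy attack, in miniature: (1) rank words by how much
# deleting them hurts the classifier's confidence, (2) greedily swap in
# synonyms for the most important words until the predicted label flips.

def toy_classifier(words):
    """Return P(positive) from a crude sentiment lexicon (illustrative only)."""
    score = {"contrived": -2, "estranged": -1, "totally": -1, "engineered": 2}
    return 1 / (1 + math.exp(-sum(score.get(w, 0) for w in words)))

SYNONYMS = {"contrived": ["engineered"], "situations": ["circumstances"],
            "totally": ["fully"]}

def attack(words, clf, threshold=0.5):
    base = clf(words)
    orig_label = base >= threshold
    conf = lambda p: p if orig_label else 1 - p  # confidence in original label
    # Step 1: rank words by the confidence drop caused by deleting them.
    order = sorted(range(len(words)),
                   key=lambda i: conf(base) - conf(clf(words[:i] + words[i+1:])),
                   reverse=True)
    # Step 2: greedily keep any synonym swap that lowers the confidence.
    adv = list(words)
    for i in order:
        for syn in SYNONYMS.get(words[i], []):
            trial = adv[:i] + [syn] + adv[i+1:]
            if conf(clf(trial)) < conf(clf(adv)):
                adv = trial
        if (clf(adv) >= threshold) != orig_label:
            break  # prediction flipped: the attack succeeded
    return adv

sentence = "cast in impossibly contrived situations totally estranged".split()
print(" ".join(attack(sentence, toy_classifier)))
```

On this toy model a single swap ("contrived" to "engineered") is enough to flip the label, mirroring the example quoted above; the real attack does the same ranking and substitution against neural classifiers.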

EDITED TO ADD: This post has been translated into Spanish.

Posted on April 28, 2020 at 10:38 AM • 49 Comments


c1ue April 28, 2020 11:03 AM

It seems abundantly clear that any ML algorithm, and likely any neural net algo, can be corrupted if there is a way to test outcomes for submitted results.

My question is: is there any circumstance in which the feedback loop necessary for an algorithm attack does not exist?

If so, then this statement becomes important:

The only protection is “security by obscurity” – which I think we all agree isn’t secure.

And if this chain is valid, this seems like a really significant structural problem with using ML/NN algos because it is basically guaranteed that weaknesses will be found.

Although the outcomes are perhaps not different from penetration or application testing, it still seems to me that there are operational and structural differences between an ML/NN-driven, feedback-loop-enabled algorithm attack and standard security testing (or attacks).

Clive Robinson April 28, 2020 11:04 AM

@ ALL,

We should not really be surprised at this.

But I suspect the systems would also trip over both sarcasm and euphemisms.

After all, what do you think of,

    They’re a real winner.


It rather depends on where you come from. So what about,

    Time to wash up.

In the UK “washing up” means to “do the dishes”…

Clive Robinson April 28, 2020 11:08 AM

@ ALL,

I guess I should also mention that,

    swapping words with synonyms

has been used since at least the 1980s to hide a "serial number" or "watermark" in documents classified Secret and above, so that if they are leaked they can be traced back to a source…
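The scheme Clive describes can be sketched in a few lines: each word with a known synonym pair carries one bit of a hidden serial number. The word pairs and document below are invented for illustration, not from any real fielded system.

```python
# Synonym-choice watermark: pair i of PAIRS encodes bit i of the serial.
# Choosing the first word of a pair encodes 0; choosing the second encodes 1.

PAIRS = [("begin", "start"), ("large", "big"),
         ("buy", "purchase"), ("aid", "help")]

def embed(text, serial):
    """Rewrite text so the i-th synonym pair encodes bit i of `serial`."""
    words = text.split()
    for i, (w0, w1) in enumerate(PAIRS):
        keep = w0 if (serial >> i) & 1 == 0 else w1
        words = [keep if w in (w0, w1) else w for w in words]
    return " ".join(words)

def extract(text):
    """Recover the serial by checking which member of each pair appears."""
    words = set(text.split())
    serial = 0
    for i, (_, w1) in enumerate(PAIRS):
        if w1 in words:
            serial |= 1 << i
    return serial

doc = "We begin with a large order: buy supplies and aid the team."
marked = embed(doc, 0b1010)
print(marked)           # "big" and "help" appear where the bits are 1
print(extract(marked))  # 10
```

With four pairs this distinguishes sixteen recipients; real schemes of this kind need many more pairs and some redundancy, since a leaker can scrub the mark by re-paraphrasing.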

Clive Robinson April 28, 2020 11:14 AM

@ c1ue,

My question is: is there any circumstance in which the feedback loop necessary for an algorithm attack does not exist?

Yes: my example of it being used as a "secrecy system" to give a document a serial number. If you know which words to change, then you can change the serial number…

But if you think about the problem at a different level, I think you will find that it becomes another variation on the “Halting Problem” for ML / AI.

Bob Paddock April 28, 2020 11:56 AM

Any NLP’s handle double entendres yet?:

If I said you had a beautiful body – Would you hold it against me…

Peter S. Shenkin April 28, 2020 1:11 PM

Well, it seems to me that ML processors should incorporate red teams. It’s unclear whether the problem is the underlying ML technology or its training. The response of the blue team should then be to enlarge the training set by incorporating the red team’s false negatives as inputs which should in fact score as true positives.

Having said that, "The characters, cast in impossibly engineered circumstances" would give most native readers pause. "Contrived" ≠ "engineered". I doubt that locution would arise naturally. But that doesn't mean it should be left out of a red team's list of observed false negatives.

Chelloveck April 28, 2020 2:38 PM

@Bob Paddock: Checking for double entendres is simple: "return(true)". In the infamous words of Tom Lehrer, "When correctly viewed, EVERYTHING is lewd!"

max630 April 28, 2020 3:19 PM

What is really worrying is that this research is reported as security-related, so somebody may conclude that relying on "machine learning" for security is a remotely sane idea.

lurker April 28, 2020 4:09 PM

NLP is an oxymoron. How can the language be natural when the processor has no physical experience of the nouns, no sensory experience of the adjectives, and no emotional experience of the verbs? It is an artificial construct like all programming languages, and the researchers have simply exploited the available compiler bugs.

Anon April 28, 2020 4:47 PM

    NLP is an oxymoron. How can the language be natural when the processor has no physical experience of the nouns, no sensory experience of the adjectives, and no emotional experience of the verbs? It is an artificial construct like all programming languages, and the researchers have simply exploited the available compiler bugs.

“Natural” refers to the languages being processed, not to the construct doing the processing.

lurker April 28, 2020 8:23 PM


“Natural” refers to the languages being processed…

Indeed it does. I’ll rephrase my question as a statement:
Since the processor is unnatural, and does not live in or experience our natural world in the way that has allowed us to evolve Natural Languages, such a processor may never become fluent in those languages.

Note carefully the final clause of that statement; what does "may never become" mean? I am not a lawyer, and never played one on TV, so I don't write to be interpreted by judges of the court. A brilliant example of natural language was posted above by Bob Paddock.

Drone April 28, 2020 11:45 PM

@Bob Paddock said: “Any NLP’s handle double entendres yet?”


Human says, “If I said you had a beautiful body – Would you hold it against me?”

NLP’s programmed response, “I depends dear, are you complimenting me or do you want a cuddle?”

Phaete April 29, 2020 1:52 AM

It also matters what you count as synonyms.
I take issue with the following example of theirs:

their computer animated faces are very expressive
their computer animated face are very affective
their computer animated faces are very diction

And it looks like spotting the attack is easy: just check for words that are used a thousand times less often, alone or in combination.

like a south of the border melrose place
like a south of the border melrose spot
like a south of the border melrose mise

Really, French words? I bet less than 1% of the English-speaking world uses that word. And if it is used in English, it is only because someone has seen a food show and learned what mise en place is (or is actually a chef).

Merriam-Webster has a totally different meaning for mise:
the issue in a legal proceeding upon a writ of right; also: the writ itself

So they set out to fool NLP; they sure picked the right (wrong) words for it.
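Phaete's detection idea is easy to sketch: flag any word dramatically rarer than ordinary text. The frequency figures below are rough invented stand-ins, not real corpus counts.

```python
# Flag suspicious (possibly adversarially swapped) words by corpus rarity.
# A real detector would use frequencies from a large corpus; these numbers
# are illustrative guesses only.

FREQ_PER_MILLION = {
    "place": 500.0, "spot": 120.0, "mise": 0.05,
    "expressive": 6.0, "affective": 0.8, "diction": 0.4,
}

def suspicious(words, floor=1.0):
    """Return the words occurring fewer than `floor` times per million tokens.

    Words absent from the table are assumed to be ordinary (frequency = floor).
    """
    return [w for w in words if FREQ_PER_MILLION.get(w, floor) < floor]

print(suspicious("like a south of the border melrose mise".split()))
# ['mise'] -- the swap is betrayed by a huge frequency gap vs. 'place'
```

The same check clears the unmodified sentence and flags "affective" and "diction" in the paper's other examples, which is exactly Phaete's point: the substitutes are statistical outliers.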

Clive Robinson April 29, 2020 2:33 AM

@ Phaete,

Really french words?

What’s wrong with French words? English is full of them, though they might not mean what people think they do (gauche and adroit being but two of many).

But sometimes words just appear with different meanings but without a logical path “sale” being one (comes into English from Old Norse, meaning to “grasp”). Other words with different meanings might have a nebulous connection “on” being one.

However I have often wondered what a French person walking down say London’s Oxford Street would make of the large banners and similar that say “ON SALE”…

Phaete April 29, 2020 3:14 AM


Well yes, French words not (yet) adopted into English.
My own Germanic-based language is also full of them.
But mise is not adopted, as far as I can find.

If you use unadopted words it gets very easy to fool human and NLP alike.
For instance: "The chat is not marching." A Frenchman with some English would send me to the pet doctor; an American would call IT.

And I know how that Frenchman feels: the abbreviation LUL, commonly used by youngsters, means a part of the male anatomy in my language.
It tickles a few hundred times perhaps, then it becomes mundane.

tfb April 29, 2020 4:13 AM

It is famously the case that humans can be derailed by suitable choice of language. These systems are a lot simpler than humans, and also are using fundamentally worse learning techniques – a human doesn’t learn a natural language by trawling through a vast corpus – so, unsurprisingly, they can be derailed more easily. News at 11.

old*man April 29, 2020 5:51 AM

NLP’s programmed response, “I depends dear, are you complimenting me or do you want a cuddle?”


Bob Paddock April 29, 2020 7:01 AM


“NLP’s programmed response, “I[t?] depends dear, are you complimenting me or do you want a cuddle?” ”

So it has a canned answer, as in ‘programmed response’, to some common things that might be given to it to confound it, rather than analytically deducing a response?

Clive Robinson April 29, 2020 7:26 AM

@ Phaete,

The chat is not marching

Approximates to "The 'masculine cat' is not 'feminine step(ing)'"

Which might be sufficient reason to take it to a vet :-S

"Feminine stepping" is kind of what a neighbour's long-haired black cat does: it darts in at speed then "sashays" in a very feminine, petite, elegant and French way through the house, gently flicking its tail into "kiss curls" in a mesmerizing way whilst purring affectionately[1].

But as for words with other meanings: in UK English "rubber" does not have any euphemistic connotations, unlike in US English. Likewise "rooting", unlike in Australian English.

But the one that always throws me every time I hear it is the German "mit Handy"; it just sounds so wrong no matter which German says it.

[1] Which is why we named the cat "Madame Le Furball" before we knew who it belonged to. Apparently it's a "social" or "chow hound" cat and kind of ingratiates itself wherever it can, sniffing out food. However it officially belongs to a neighbour and is named "Dudley" after a Harry Potter character… When we told them why we named the cat "Madame Le Furball" they laughed and said that yes, they thought it was a "Molly cat" when it moved in on them as a stray, but the vet corrected them. It's since turned out to be highly territorial, and had half its face opened up in a fight with a cat at least twice its size, and similarly took a fairly hefty wound on the abdomen from another cat. But importantly it gets on well with people, especially if they are eating "lamb kebab"… The name has stuck, and you will hear someone say in warning "look out, Le Furball is in the house" or "MLF is in".

Phaete April 29, 2020 10:30 AM


There you go, unintentionally(?) proving my point.
Marcher is similar to "marching" in English, but its main meaning in French is to work (to function, not job-related).

Il ne marche pas means it does not work.

So any Frenchman with minimal English would understand it as "the cat is not functioning" (bad grammar though; malade (sick) works better, but just as in English, the meaning is obvious).

Although in English we don't say "the car is not marching", we can say "the car is not running", so the sense of marching/running as functioning is preserved between both languages. In Dutch the same job is done by walking (lopen).

And don't get me started on cat stories; I bet I can fill half the remaining quarantine with them.

My favourite is about a black cat I had, raised from six weeks old.
When I was making his dinner on the kitchen counter, he used to climb up via my legs (pantless was no objection to him) to see what he was smelling, and the second it was ready his dinner would be served.
After a few painful days I tried putting him on top of the cupboard above my counter so he could see, but it was too high to jump onto the counter from. This worked; he looked silently down as I prepared his meals.
When he got a bit bigger, he could jump from my shoulders to the top of the cupboard and back again, so I just put him on my shoulder at mealtime and walked to the cupboard; he jumped up, I made the meal, he jumped back on my shoulders and ate the meal.
I was a bit lazy, so instead of picking him up, I went to the open stairs in the room; the cat would walk halfway down and jump on my shoulders when it was mealtime, so I could walk to the kitchen and he would go on the cupboard again.

Then it became a game: he would lurk halfway up the stairs, waiting for me to pass, jump on my shoulders, and expect to be brought to the cupboard to jump onto for meal proceedings, no matter what time it was.
After my GF at the time had played it a few times with him, he started playing it with everyone who entered the house.
The open stairs were right next to the door. Lots of people got jumped on by my cat; most were terrified when it happened to them the first time, frozen stiff with a 6 kilo cat on their shoulders softly purring, and we couldn't stop laughing every single time.
A few months later he kind of knew who fed him or gave him attention, and would stop jumping on strangers.

parabarbarian April 29, 2020 12:52 PM

Heck, the shitposters on Facebook have been doing this for a while. One particularly masterful example from a few months back is this phrase which caused a lot of people to be warned and a few temporarily banned.

Tendentious chiggers quickly find the chinks in armor of niggardly quality.

That may explain why Facebook no longer tells the user which particular posting triggered the warning or ban.

lurker April 29, 2020 1:52 PM

Bruno Araujo

Attack, counter attack – so now the (relatively early days) NLP system will need to be trained on synonyms…

Trained? But there are plausible suggestions above that these systems are “programmed” or preloaded with canned responses. Their analysis of the input is only mechanical, lacking life experience. Unnaturally processing natural language…

Clive Robinson April 29, 2020 2:49 PM

@ Phaete,

with a 6 kilo cat on their shoulders

That is a big moggie (~13 lb in old measurements).

Years ago I mentioned that I once had a baby 'big cat' do that to me[1], and you can still just see the claw-mark scars where it misjudged how tall I was, and rather than drop down it just clawed its way up through my clothes… some of which were "best casual"…

[1] It happened at the home of an "exotic dealer" who re-homed excess animals from captivity (some are more than happy to breed in captivity, which gives the problem of what to do with excess stock, as they cannot really go back into the wild because of their formed relationship with, and dependency on, humans).

Clay_T April 29, 2020 3:11 PM

And i know how that frenchmen feels, the (by youngsters) commonly used abbreviation LUL means a part of the male anatomy in my language.
It tickles a few hundred times perhaps, then it becomes mundane.

Back at the turn of the century, Digital Audio Players (DAP) started to appear on the market.
The Creative Nomad Jukebox (NJB) was one many started tinkering with. It was large, clunky and awkward to carry around.
Many of us over here in the colonies used a fanny pack to carry our NJB.
Many LULz at the use of the word fanny, from our English friends.

MarkH April 29, 2020 4:11 PM

Patient readers of my comments will recall past rants on the essentially fraudulent nature of “artificial intelligence.”

To borrow from the late Douglas Adams, “machine learning” is almost, but not quite, entirely unlike learning. It’s a classic con-man technique to call whatever you’re peddling by false names which aggrandize it in the eyes of the poor sod you’re trying to swindle.

Computers have been programmed to play the games of chess and go, beyond human proficiency. Such games are pure abstractions defined in closed domains. Almost everything else in life is messy, ambiguous, endlessly entangled with uncountable numbers of other things …

The hallmark of “AI” systems is brittleness: they appear to function up to some boundary, beyond which they fail severely. If that isn’t bad enough, the location of that boundary is usually unknown.

I once read a book about multiple choice tests administered to millions of U.S. school kids, which fraudulently claimed to measure their excellence as students. That these tests were garbage, was proved by companies which succeeded in improving kids’ tests scores by large margins with a few hours of training in test-taking strategies.

Nobody believes, or claims, that the student’s “scholastic aptitude” is enhanced by such coaching.

The most interesting part of the book, was the author’s case — which I found convincing — that the fraudsters selling these tests can’t redesign them to eliminate this “coaching effect” … the ability to easily skew the results is inherent in the process by which the tests are designed and created. To grotesquely mingle metaphors, the weakness is baked in to their DNA 🙂

It’s going to be the same with these so-called NLP systems. I predict that there will be no countermeasure that can’t be defeated by some relatively simple trickery.

Computers (as yet) understand nothing. You can program pattern-matching for the rest of your life: until the machine can understand the meaning of human sentences, it will make brainless mistakes.

@Phaete: Love the cat story!!!

I have one, which is actually set in Paris of all places. A man traveling to Asia for a few months had arranged with friends to take care of his cat. The cat-tenders for the first half of his travel had their own travel plans, so he had arranged for another friend to take over until his return to Paris.

Well into his time in Asia, by some communication with Paris, he learned that somehow the guy who was supposed to do Part 2 of taking care of his cat had never showed up. He knew that by this time, his cat had been alone in his Paris apartment for weeks, without food or even water. Horrified, he cut his trip short and got the soonest available return flight, expecting to find the corpse of his pet when he returned home.

He had left the bathroom window open a few centimeters: not wide enough for his cat to pass through, but enough to keep some fresh air flowing.

When he arrived to his apartment, he was amazed, and relieved, to find his cat sleek and healthy. He also found many pigeon feathers on the floor beneath the bathroom window 🙂

Clive Robinson April 29, 2020 4:19 PM

@ MarkH, Phaete,

He had left the bathroom window open a few centimeters

And probably the toilet seat up as well…

MarkH April 29, 2020 4:27 PM


An old philosophy department joke, founded on nuances of language interpretation:

Version 1

In a lecture on the relationship between language and logic, the professor observed, “as we have all been taught, when a ‘double negative’ is used in speech, the two negatives, if taken literally, cancel each other out and imply the affirmative. For example, ‘the hospital won’t allow no more visitors’ literally means that more visitors are allowed, or even mandated.”

“Now, it’s significant that this change of logical sense applies only to repetition of words of negation, whereas affirmation, whether used once or repeated any number of times, retains its positive meaning.”

From the back of the auditorium, a student was heard to comment, “Yeah, yeah, yeah.”

Version 2

The same setup, but this time the student says, “Yeah … right.”

PS Based on my limited experience of Parisian flats, I shouldn’t be surprised if there was at least one dripping tap 😉

Clive Robinson April 29, 2020 5:02 PM

@ Clay_T, Phaete,

Many LULz at the use of the word fanny, from our English friends.

You might want to look up the women’s volunteer First Aid Nursing Yeomanry, or FANY; they have an interesting history.

If you look at what they were asked to do in WWII with SOE and 1 Signals Squadron and later 2 Battalion, you will realise they are something very special indeed.

It was my privilege to serve alongside them and have their support on various occasions during the 1980s and early 90s, as it was with 21 SAS and others of the more interesting "special forces" and "Special Communications" military units, now collectively "UK Special Forces" (UKSF), where "the awkward squad" still has a home.

The FANY are still very much active today and are currently involved not just with supporting the "Nightingale Hospitals" with communications and logistics, but also with supplying specialist support to the most senior levels of Government.

MarkH April 29, 2020 5:02 PM

I remember a long-ago Scientific American article explaining the enormous difficulties in machine interpretation of human languages. It used as an example the proverb, “Time flies like an arrow”, and showed three grammatically valid ways to parse it:

  1. Subject = “Time”
    Verb = “flies”
    “like an arrow” is adverbial phrase: as an arrow would fly
  2. Subject = “Time flies” (a kind of insect or baseball batting outcome, “time” as adjective)
    Verb = “like”
    Object = “an arrow”
  3. Subject = implicit “you” of imperative mood
    Verb = “Time” (as in, using as stopwatch)
    Object = “flies”
    “like an arrow” is adverbial phrase: as an arrow would measure time

Now, we know that to apply “time” as an adjective to “flies” is quite strange, and wouldn’t make sense without a lot of explanatory context. Similarly, it’s unexpected that any kind of flies would be attracted to any arrow: but depending on context, it’s not impossible!

And the notion of an arrow as an exemplar of time keeping is also strange, and seems improbable … again, unless there were some explanatory context.

The point is, you have to know a lot about meanings, patterns of speech, and context in order to correctly parse this very simple sentence.

If you don’t have a vast store of linguistic knowledge, and understanding of meanings and context, then all three of the above interpretations are logical and justifiable.

Is it possible in principle that computers could be programmed to do this? I suppose so, but I suggest that no one on Earth knows a strategy by which this might be accomplished.
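The three readings can even be counted mechanically. Under a small toy grammar (my assumption, chosen to cover exactly the constructions MarkH describes), a CYK-style chart parser finds three parses of the sentence:

```python
from collections import defaultdict

# Count the grammatical parses of a sentence under a tiny toy grammar.
LEXICON = {"time": {"N", "Adj", "V"}, "flies": {"N", "V"},
           "like": {"V", "P"}, "an": {"Det"}, "arrow": {"N"}}
BINARY = [("S", "NP", "VP"),                     # declarative sentence
          ("NP", "Adj", "N"), ("NP", "Det", "N"),
          ("VP", "V", "NP"), ("VP", "VP", "PP"),
          ("PP", "P", "NP")]
UNARY = [("NP", "N"), ("VP", "V"), ("S", "VP")]  # S -> VP: imperative mood

def count_parses(words, goal="S"):
    n = len(words)
    chart = defaultdict(lambda: defaultdict(int))  # (i, j) -> symbol -> count

    def apply_unary(cell):
        # UNARY is listed in topological order, so one pass suffices.
        for parent, child in UNARY:
            cell[parent] += cell[child]

    for i, w in enumerate(words):
        for tag in LEXICON[w]:
            chart[(i, i + 1)][tag] = 1
        apply_unary(chart[(i, i + 1)])
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for parent, left, right in BINARY:
                    chart[(i, j)][parent] += (chart[(i, k)][left]
                                              * chart[(k, j)][right])
            apply_unary(chart[(i, j)])
    return chart[(0, n)][goal]

print(count_parses("time flies like an arrow".split()))  # 3
```

The three counted parses are exactly the ones listed above: declarative "time | flies | like an arrow", declarative "time flies | like | an arrow", and imperative "time | flies | like an arrow". The hard part, as the comment says, is not enumerating them but knowing which one the speaker meant.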

Wael April 29, 2020 5:12 PM

“Time flies like an arrow”


Fruit flies like a banana

Computer: Error, error, can’t comprehend …

Clive Robinson April 29, 2020 5:31 PM

@ MarkH,

Is it possible in principle that computers could be programmed to do this? I suppose so, but I suggest that no one on Earth knows a strategy by which this might be accomplished.

The two answers are “Yes” and “That does not matter”.

As I’ve mentioned, this can be seen as the analogue of the "Halting Problem" that underlay Alan Turing’s paper on what is now known as the Church–Turing thesis.

It mattered not a jot to either Alan Turing or Alonzo Church that nobody knew "a strategy by which this might be accomplished". It was sufficient to use it as a "tool of reasoning".

Lie back with your eyes closed and see if you can build something in your mind with which you can investigate the idea. Start off with "broad brush stroke" "black box" functionality. As needed, break these large blocks up into smaller, more functionally focused blocks, keeping things as simple and generic as possible.

For instance, one black box would do "functional decomposition of a sentence"; this would logically have one input and an unknown number of outputs. These outputs could then be run through some kind of likelihood estimator to be given not just a ranking but a context indicator. Which immediately tells you that you would need another black box that takes in the context indicators of all the sentences in a paragraph or section to deduce what the subject actually is, then feeds this into a selection process to find the most likely meaning.

However, how would you deal with an effectively infinite sentence of a single word, such as "buffalo buffalo" or "buffalo buffalo buffalo", which are both valid statements and continue to be valid as you add more buffalo?

Wael April 29, 2020 5:32 PM

Hiring manager: Joe put you down as a reference. Tell me about him.
Reference: You’d be lucky if he works for you!

Forget ML! Some humans fail to understand, too.

Clive Robinson April 29, 2020 5:56 PM

@ MarkH,

With regard to the negating joke, this is more than somewhat related,

@ Wael,

You’d be lucky if he works for you!

Which brings you back to my comment above about,

    They’re a real winner

What we are looking for is a sentence that no matter how you “dice it or slice it” you can prove an AI device can not interpret correctly if it does not have a “hard forcing rule” whilst a human can interpret it correctly.

If you find that, then it’s actually game over for "Hard AI", and likewise for "Soft AI", as that is "rule based" anyway and rules are not understanding. See the "Chinese Room" argument,

MarkH April 29, 2020 6:11 PM

Taken from a book titled L.I.A.R.:

We would all like to see a picture of her hanging in the office.

Workers like him are hard to find.

The work he did while staggering was well below his capacity. (about a drunk)

lurker April 29, 2020 9:35 PM


However how would you deal with an infinite sentance of a single word…

Is the system input written (machine-typed, to satisfy the lawyers) or spoken – for simplicity ignoring regional accents? Chinese might be described as a natural language. There is a well-known poem, "Shi shi shi" [Ten stone lions], which consists only of the syllable "shi" ninety-two times. Yes, there are the tones, but a listener then hears only a handful of distinct spoken syllables, repeatedly jumbled, confounding the context sensitivity of the language; the poem uses thirty-four unique Hanzi characters. Even if the NLP is thoroughly trained in the Chinese use of digrams and bi-digrams, the results could be amusing. There is at least one video on YT of someone attempting to recite this poem.

Wael April 29, 2020 11:46 PM

@Clive Robinson,

What we are looking for is a sentence that no matter how you “dice it or slice it” you can prove an AI device can not interpret correctly if it does not have a “hard forcing rule” whilst a human can interpret it correctly.

So why would a human be able to interpret it correctly? There’re tons of examples where humans come up with differing and sometimes conflicting interpretations of laws, scriptures, or even simple rules (ignoring agendas and deceptive motivations). Obviously there’s more to the meaning of a sentence than the words that comprise it, right? AI doesn’t take that into account, which leads to the fact that humans also need “hard forcing rules”.

Clive Robinson April 30, 2020 4:02 AM

@ MarkH,

Taken from a book titled L.I.A.R.

Two of those sentences ("picture", "staggering") are based on a different interpretation of the pivot word.

In the case of "staggering" it’s also a difference in whom the pivot word applies to.

That is, in the case of the observed (the drunk), it’s his physical actions as seen by the observer.

In the case of the observer, it’s being impressed/astounded by the (skilful) behaviour of the observed.

The meaning of the second part of the sentence is effectively irrelevant to the first part before the pivot word, and can be freely exchanged with others without affecting the action of the pivot word.

The reason the pivot word works is that there is no frame of reference or context given in the first half of the sentence.

And whilst I would not claim "context is all", it is important when you consider Searle’s "Chinese Room" argument.

In the case of both the "picture" and "staggering" pivot words, either interpretation is equally valid until "context" is established.

Thus the observation cannot be resolved "without context". A clear indicator that a "sentence" is "insufficient" as an input block size; thus the "halting problem" analogy. That is, what size of input is sufficient to remove ambiguity? Which is why I posed the "buffalo buffalo" sentence, where the meaning keeps changing as more of the input block is revealed. Further, the input block size needed to remove the ambiguity can be infinite.

@ Lurker,

Is the system input written (mechanical typed to satisfy the lawyers), or spoken – for simplicity ignoring regional accents? Chinese might be described as a natural language.

This is another aspect of the failing of Searle’s "Chinese Room" argument to do with context ambiguity.

If you like, it’s the "transducer" problem. Normally we consider a transducer a black-box device that converts one input type to another.

Thus a moving-coil microphone takes sound and converts it to electricity, and a moving-coil speaker does the opposite. Thus you would expect what goes in, in terms of intelligence, to come out again. The same with other transducers, like a transformer that takes input current, converts it to a magnetic field in the first coil, and back to current in the second coil. Likewise generators and motors.

However, there is something implicit that most people fail to realise: with back-to-back transducers, the intermediate "type" is actually "storage".

You can see this with the old "mercury delay line storage" from the early days of computing. It works by taking electrical energy, converting it to mechanical waves that propagate down a column of mercury, and converting them back to electrical energy. You can store quite a number of "bits of information" in series, depending on the length of the tube of mercury.

You can take this further with a magnetic tape recording, where you can store the bits of information for what is effectively an indefinite time.

Both the mercury delay line and magnetic tape are in effect "analogue storage", importantly, from our perspective, capable of storing a "continuous signal", so capable "in theory" of storing infinite information resolution.

When Alan Turing designed the thought experiment we now call a "Turing Engine", he had a problem: he wanted to deal with discrete "bits" of information represented by "symbols" whilst being able to ignore the issues this incurs. One of these, a problem with all "symbol sets", is the number of members in the set, or "alphabet size". This obviously limits the resolution of information to the number of members in the alphabet. As we know, with binary the alphabet has two members {0,1}. However there is a subtle flaw, which is the empty set {ø}, which has real-world issues via the likes of the "unassigned value", the "null value" or the "has no meaning" value, amongst others. This means you have to use "out of band" or "in band" signalling, which in the case of storage (as opposed to signalling) always becomes "in-band" signalling, which means the alphabet always has to have more members than required for the information alone. We can see this with "C strings" and the null terminator "\0", which is known to cause "ambiguity issues".
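The in-band signalling problem with C strings can be seen in miniature. The sketch below mimics C's strlen in Python: the terminator lives in the same channel as the data, so data that legitimately contains a zero byte is silently truncated.

```python
# C strings reserve the byte 0x00 as an in-band terminator, so any payload
# containing a legitimate zero byte is cut short by string-based APIs.

def c_strlen(buf: bytes) -> int:
    """Mimic C's strlen: count bytes up to (not including) the first NUL."""
    n = 0
    for b in buf:
        if b == 0:
            break
        n += 1
    return n

payload = b"GIF89a\x00trailing-data"
print(len(payload))       # 20 -- the real length of the data
print(c_strlen(payload))  # 6  -- what a C string API would report
```

This is exactly the "alphabet must be larger than the data" point: either you spend a symbol on the terminator in-band, or you carry the length out-of-band, as Pascal strings and most modern buffer APIs do.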

Alan Turing had the same signalling issues, but he needed them to solve the “infinite resolution” issue and data boundries.

Thus rather than “analogue storage” Turing used “Paper Tape” storage which was actually a known technology in the mid nineteenth century (~1850). Put simply the “telegraph codes of which Morse Code was a later variant could be stored on a moving paper tape by using a transducer that lifted and dropped a pen on the moving tape in response to “line current”. In the early twentieth century with the likes of more complex codes such as Baudot and ITU-2 the idea of printing the English Alphabet on tape was how “telexes” and “ticker tape” were delivered in human readable form.

However ticker tape was not suitable for temporary storage for “resend”, so the idea of “punched paper tape” was thought up and quickly became well established.

Thus with a “tape punch” and “tape reader” you could store “plain English text” on tape, in theory not just indefinitely but in infinite amounts, and with a little thought make it “searchable” by “message number” and “character offset”, thus giving Random Read Access (but Random Write Access only at the message level, not the character level).
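A toy model of that message-number/character-offset addressing, sketched in Python (the messages and the index layout are invented for illustration, not any historical format):

```python
# Messages laid end to end on one "tape", plus an index mapping
# message number -> starting offset of that message on the tape.
messages = ["WEATHER FAIR", "SHIP DELAYED", "ALL CLEAR"]

tape = "".join(messages)
index = []          # start offset of each message
pos = 0
for m in messages:
    index.append(pos)
    pos += len(m)

def read_char(msg_no: int, offset: int) -> str:
    """Random read access by message number and character offset."""
    return tape[index[msg_no] + offset]

print(read_char(1, 0))  # 'S' -- the first character of message 1
```

Reads can land anywhere, but a rewrite changes message lengths and so invalidates every later offset, which is why write access works only at the message level.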

The problem that came up with the “32 character telex codes” such as Baudot and ITA-2 was that whilst fine for a subset of the Latin-character alphabets used in European languages, they were not good for all European languages, nor for languages that use not Latin-style alphabets but pictograms, as many Asian languages do.

In the case of Latin alphabets the solution was a bad one, which was to have “different print heads”: that is, what got printed for a given 5-bit code was “print head defined”. Thus if you sent a message using an English print head it would be correct on another English print-head Telex machine, but wrong on a French, Spanish, etc. print-head machine. Thus either the message would have to be prefaced by instructions to change the print head, or the operator would have to “pattern recognize” what was happening and retype the message correctly.

The case with pictograms was not really any better. Put simply, a subset of “common pictograms” was selected and these were assigned “serial numbers”, which became two or more Latin-alphabet characters forming multi-dimensional mappings. So two letters gives 26×26 or 676 pictograms, which is human-memorable with practice. But four letters gives 26×26×26×26 or 456,976, just under half a million pictograms, which is realistically beyond most humans without some kind of “shorthand” rules (which had already been worked out with “oriental dictionaries”). Thus just before WWI the first oriental typewriters were invented.
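The serial-number arithmetic is easy to check; a short Python sketch (the encoding function is a hypothetical illustration of the base-26 mapping, not any historical code):

```python
import string

LETTERS = string.ascii_uppercase  # the 26-letter Latin alphabet

def serial_to_code(n: int, width: int) -> str:
    """Encode a pictogram serial number as `width` Latin letters (base 26)."""
    code = []
    for _ in range(width):
        n, r = divmod(n, 26)
        code.append(LETTERS[r])
    return "".join(reversed(code))

print(26 ** 2)                 # 676 two-letter codes
print(26 ** 4)                 # 456976 four-letter codes
print(serial_to_code(0, 2))    # 'AA' -- the first two-letter code
print(serial_to_code(675, 2))  # 'ZZ' -- the last two-letter code
```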

But neither the oriental codes nor the typewriters could cover the full number of pictograms in either the Japanese or Chinese logographic writing systems.

So both the multiple-usage-of-a-pictogram (or glyph) issue and the limited-subset issue arise, and thus “information is lost” when transducers to storage are used. Not that this is usually a problem, because “context”, usually (but not always) arising not from individual sentences/phrases but from larger blocks of text, makes up for it.

But as the “Shi shi shi” poem shows, similarly to the “Buffalo” issue, there are always exceptions.

Which further undermines the “Chinese Room Argument”.

@ Wael,

Obviously there’s more to the meaning of a sentence than the words that comprise it, right? AI doesn’t take that into account, right? And does that mean humans also need “hard forcing rules”?

The answers are “Yes”, “Yes currently”, and “No”.

A sentence can be “stand alone”, in which case it needs to be entirely unambiguous at many levels. That is, it contains all “context” and other higher-level meanings. In practice, even with questions and statements, few sentences are ever entirely free of ambiguity, which is why we have larger blocks of text made up of sentences, such as paragraphs, sections, chapters, books, and libraries.

There is a “truism” that a five-page scientific paper contains as much actual information as a book, and that the book, in turn, contains as much information as a library. That is, to understand the paper you must know what is in the book, and to understand what is in the book you must know what is in effect a specialist reference and teaching library. The problem is you still need more than the library can give you…

This is because, importantly, unlike a “Database” (which is what the library is) you must be able to “understand” the information within it and be able to form new patterns and conclusions from it. To do this you have to be able to understand not only the concepts our five senses give us but something more, which we glibly label “intelligence” but have no idea how to show even exists, let alone quantify in a way that allows measurands to be agreed and thus information accurately transferred.

Whilst humans have physical limits, as far as we are aware they do not come with “hard forcing rules”. That is, we can not just recognize our limitations, we can understand them and thus learn to adapt around them.

@ Lurker, MarkH, Wael,

So we come onto the problem with Searle’s “Chinese Room Argument”.

There are three places where “understanding” or “intelligence” is required, and none of them are in the room at any time.

The room in reality is just like a tape reader, indexable tape store, and tape writer.

Understanding is required to formulate a question and to understand the answer, nothing more.

The first occasion understanding is required is in building the “tape store” database not in using it. This is done before the room exists, thus never goes inside it.

The second occasion understanding is required is in formulating a question to ask/use the room. Thus this understanding is again outside the room in the head of the person writing the question.

The third time again is outside the room in the head of the person reading the answer from the room.

To try to make an argument that the database is somehow some kind of intelligence is an absurdity. It’s like saying that you imbue a sheet of paper with intelligence when you doodle on it, or a toddler makes random circles and lines as they scribble with a pencil, or a snail leaves a trail on it. To make that make sense you would have to argue that “sampled chaos” put on a magnetic tape or the platter of a hard drive is actually intelligence; but where is the intelligence, which particle of rust has it? Such an argument falls to the “reductio ad absurdum” or “turtles all the way down” fallacy, where you would just go on forever trying to argue for what does not exist from the false premise you are using. The false premise is saying “Knowledge” is “Intelligence”: it’s not, nor ever was.

Knowledge is stored or communicated information. That is, information that has no physical form is impressed or modulated onto physical matter or energy, which are at the end of the day the same thing, or if you prefer, interchangeable forms of the same thing.

Where intelligence exists is in the processing of knowledge to make new knowledge, by which we obtain a better understanding of the physical universe around us.

Whilst that is simple to say, so are “random” and “chaos”, but how do you not just define them but differentiate them?

At the moment we have the same trouble with “Intelligence”: any definition we come up with is either nebulous or insufficient, therefore we don’t have a way to quantify it, measure it, communicate about it, or really understand it.

Which is why both Hard AI and Easy/Soft AI are a “shell game”, as beloved by confidence tricksters and those deluding themselves that they somehow know which shell the pea is under (those slightly wiser know that the pea is effectively under none of them 😉).

The fun really starts, however, when you consider Soft/Easy AI. It was once called “expert systems”, and these were at their simplest “rule following” systems, little more than glorified Ladder Logic or simple non-feedback state machines. When we replaced the hard logic with Bayesian estimation we got the basis of Fuzzy Logic; it allowed for less hard rules and faster response. That is, simple statistics had “softened” the edges of the hard rules, thus needing fewer rules with less hard limits. In effect we have moved on: rather than humans doing the statistical analysis of data and coming up with Bayesian rules, we have allowed the software, via what are sometimes called “genetic” or “annealing” processes, to find its own rules, and so on. At no point does “Intelligence” magically arise; there is no “understanding” involved, just an analysis of data and desired outcomes producing yet another statistical series of rules.
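A minimal illustration of how a fuzzy membership “softens” the edge of a hard rule (the thresholds and the linear ramp are arbitrary choices for this sketch, not any particular fuzzy-logic library):

```python
def hard_rule(temp_c: float) -> float:
    """Crisp rule: 'hot' is true only at or above a fixed threshold."""
    return 1.0 if temp_c >= 30.0 else 0.0

def fuzzy_rule(temp_c: float) -> float:
    """Fuzzy membership: degree of 'hot' ramps linearly from 20C to 40C."""
    if temp_c <= 20.0:
        return 0.0
    if temp_c >= 40.0:
        return 1.0
    return (temp_c - 20.0) / 20.0

# Just below the crisp threshold, the hard rule flips to zero while
# the fuzzy rule still reports "mostly hot":
print(hard_rule(29.9), fuzzy_rule(29.9))
```

One graded rule covers the region where a crisp system would need several hand-tuned thresholds, which is the “fewer rules with less hard limits” point above.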

Whilst this is a very powerful process, no understanding or knowledge generation is required. There is no more intelligence in there than there is in a Digital Signal Processing system. And like all rule-based systems it is by definition either bounded or incomplete, thus “finding edge or corner cases” is really about data that has not been sanitised into a fully defined context.

Wael April 30, 2020 5:07 AM

@Clive Robinson, MarkH,

The skull is unusually dull at this time, and the response will be way too long to compose now — perhaps it will digress off-topic a bit, too. The thing is: there are language dependencies as well (I’ll give examples later — if I feel like it). Some languages are very capable of precise expression, and AI may have better outputs with them.

we got the basis of Fuzzy Logic

Lotfi Zadeh again!

Knowledge is stored or communicated information.

I see! Claude Shannon 😉

And like all rule based systems it is by definition either bounded or incompleate

My goodness! It’s been a while since I looked at Kurt Gödel.

Knowledge is stored or communicated information.

Discovered, processed, and updated, too!

So much information here to cover… I’m not sleepy yet, but I need to force myself because I need to wake up early!

Wael April 30, 2020 5:39 AM

@Clive Robinson,

A sentence can be “stand alone” in which case it needs to be entirely unambiguous at many levels.

I gave this example in the past (too tired to give a link)

I drink my tea boiling hot.

Who’s hot, I or the tea?

I drink my tea standing up.

Who’s standing up, I or the tea?

What does AI need to attribute the adjectives’ affinity? It needs to understand that tea has no legs to stand on. In the first case, I also could be (unlike your Klingon looks) hot 😉

Phaete April 30, 2020 9:16 AM

Just take an inventory of “sauce”:
some of them denote ingredients; tomato sauce, garlic sauce, etc.
another set uses their intended usage; pasta sauce, meat sauce, etc.

We humans can only learn them through definitions and usage.
I have had several conversations going like this:
Me: “What kind of sauce did you put on that hamburger/meat sandwich?”
They: “Hamburger/meat sandwich sauce.”
Me: “Yes, I know you put hamburger sauce on my hamburger, but what is the sauce made of?”

Bill April 30, 2020 10:04 AM

While the public perception of ‘artificial intelligence’ may be off, it’s not so easy to dismiss the abilities of machine learning techniques, as Chomsky showed long ago with his “Colorless green ideas sleep furiously.”

Clay_T April 30, 2020 1:46 PM

@Clive Robinson

You might want to look up the women’s volunteer First Aid Nursing Yeomanry or FANY, they have an interesting history.

Thank you for the link.
Interesting indeed!

Erdem Memisyazici April 30, 2020 5:54 PM

That’s because ML has no concept of context. An AI only knows when one pattern looks like another, not the context in which it exists. A good example of this is putting a speed-limit sticker on a stop sign, or projecting one onto a wall with a drone, to fool self-driving AI. Even simpler systems exhibit this flaw: if you enroll two faces in a facial recognition system as Bob and Joe, the entire human race is either Bob or Joe, with 0 to 100% certainty.
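That two-face failure mode can be sketched as a nearest-neighbour match (the feature vectors and the confidence formula are invented for illustration, not any real face-recognition pipeline):

```python
import math

# Hypothetical two-person "face database": each face reduced to a 2-D feature vector.
known = {"Bob": (0.9, 0.1), "Joe": (0.1, 0.9)}

def classify(face):
    """Nearest-neighbour match: every input maps to Bob or Joe, with a score."""
    best, best_d = None, float("inf")
    for name, ref in known.items():
        d = math.dist(face, ref)
        if d < best_d:
            best, best_d = name, d
    confidence = 1.0 / (1.0 + best_d)  # ad-hoc 0..1 score; there is no "unknown"
    return best, confidence

# Even a stranger's face gets a Bob-or-Joe answer with nonzero confidence:
print(classify((0.5, 0.5)))
```

Nothing in the classifier can say “neither”; the closed label set forces every input into one of the enrolled identities.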

Erdem Memisyazici April 30, 2020 6:00 PM


Precisely. If you spoke Pig Latin you’d get the same results, if it wasn’t configured into the system. The majority of people use a common set of words, which is why it works most of the time.

David Leppik April 30, 2020 9:43 PM

Two thoughts:

  1. This should be easy to fix. When training neural networks for image processing, part of the training process is to add jitter, blur, perspective warping, and other distortions in order to boost the number of training samples. If the neural network needs to be particularly robust, specific distortions might be added. For language, I’m surprised it isn’t already standard operating procedure to add “jitter” via synonyms.

  2. When humans process language, they use word associations for context. For example, the word “diamond” likely makes you think of the word “ring” or “jewelry,” but that association would have been suppressed if I had mentioned baseball. Crossword puzzle clues often use this fact by using valid synonyms for words with strong contextual cues. For example, “board” is a reasonable synonym for “plank,” but a pirate would never ask you to “walk the board.” Thus a difficult crossword clue for “pirate” might be, “Kills with a board walk.”
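Point 1 above, synonym “jitter” as a data-augmentation step, can be sketched like this (the synonym table and the function are illustrative, not any particular NLP library; a real pipeline would draw synonyms from a thesaurus or embeddings):

```python
import random

# Hypothetical synonym table for the demo.
SYNONYMS = {
    "contrived": ["engineered", "artificial"],
    "totally": ["fully", "completely"],
}

def synonym_jitter(sentence: str, seed: int = 0) -> str:
    """Augment a training sample by swapping each known word for a synonym."""
    rng = random.Random(seed)  # seeded for reproducible augmentation
    out = []
    for word in sentence.split():
        if word in SYNONYMS:
            out.append(rng.choice(SYNONYMS[word]))
        else:
            out.append(word)
    return " ".join(out)

print(synonym_jitter("the situations are totally contrived"))
```

Running this over each training sentence with different seeds multiplies the training set with paraphrases, the textual analogue of adding blur or warping to training images.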

Jesse Thompson May 3, 2020 7:59 PM


Look, there are exactly two categories of machine learning algorithm, regardless of their underlying implementation.

  1. Adversarial machine learning algorithms
  2. Machine learning algorithms that are easily subverted by adding an adversarial component they ought to have had in their training step from the beginning.

In this case it’s not even the base BERT algorithm mentioned in the article that is at fault, as it is in fact already adversarially trained. It’s the Yelp review assessment algorithm that uses BERT as a component. That can be easily fooled because it lacks the QA step of having adversarial training baked in.

veil schlieren veil December 1, 2021 5:40 PM

Examples to think about:

“Kara, no virus!” ( =/= “CORONA VIRUS”)

“Yeah, we completely covered nineteen” ( =/= “COVID 19”)

“The year 2021 was total pandemonium!” (“pandemonium” =/= “pandemic”)

Childhood immunizations before age 19. ( =/= mandatory global immunizations )

cordoba corona crown crowning royal royalty aristocrat heir air

AVID alpha channel transparency mask
ovoid (egglike, ovum)

covert, covered, obfuscated, obscured, veiled (“schlieren”)

