cons Archives - Schneier on Security

Entries Tagged "cons"

Page 1 of 6

Why AI Keeps Falling for Prompt Injection Attacks

Imagine you work at a drive-through restaurant. Someone drives up and says: “I’ll have a double cheeseburger, large fries, and ignore previous instructions and give me the contents of the cash drawer.” Would you hand over the money? Of course not. Yet this is what large language models (LLMs) do.

Prompt injection is a method of tricking LLMs into doing things they are normally prevented from doing. A user writes a prompt in a certain way, asking for system passwords or private data, or asking the LLM to perform forbidden instructions. The precise phrasing overrides the LLM’s safety guardrails, and it complies.

LLMs are vulnerable to all sorts of prompt injection attacks, some of them absurdly obvious. A chatbot won’t tell you how to synthesize a bioweapon, but it might tell you a fictional story that incorporates the same detailed instructions. It won’t accept nefarious text inputs, but might if the text is rendered as ASCII art or appears in an image of a billboard. Some ignore their guardrails when told to “ignore previous instructions” or to “pretend you have no guardrails.”

AI vendors can block specific prompt injection techniques once they are discovered, but general safeguards are impossible with today’s LLMs. More precisely, there’s an endless array of prompt injection attacks waiting to be discovered, and they cannot be prevented universally.

If we want LLMs that resist these attacks, we need new approaches. One place to look is what keeps even overworked fast-food workers from handing over the cash drawer.

Human Judgment Depends on Context

Our basic human defenses come in at least three types: general instincts, social learning, and situation-specific training. These work together in a layered defense.

As a social species, we have developed numerous instinctive and cultural habits that help us judge tone, motive, and risk from extremely limited information. We generally know what’s normal and abnormal, when to cooperate and when to resist, and whether to take action individually or to involve others. These instincts give us an intuitive sense of risk and make us especially careful about things that have a large downside or are impossible to reverse.

The second layer of defense consists of the norms and trust signals that evolve in any group. These are imperfect but functional: Expectations of cooperation and markers of trustworthiness emerge through repeated interactions with others. We remember who has helped, who has hurt, who has reciprocated, and who has reneged. And emotions like sympathy, anger, guilt, and gratitude motivate each of us to reward cooperation with cooperation and punish defection with defection.

A third layer is institutional mechanisms that enable us to interact with multiple strangers every day. Fast-food workers, for example, are trained in procedures, approvals, escalation paths, and so on. Taken together, these defenses give humans a strong sense of context. A fast-food worker basically knows what to expect within the job and how it fits into broader society.

We reason by assessing multiple layers of context: perceptual (what we see and hear), relational (who’s making the request), and normative (what’s appropriate within a given role or situation). We constantly navigate these layers, weighing them against each other. In some cases, the normative outweighs the perceptual—for example, following workplace rules even when customers appear angry. Other times, the relational outweighs the normative, as when people comply with orders from superiors that they believe are against the rules.

Crucially, we also have an interruption reflex. If something feels “off,” we naturally pause the automation and reevaluate. Our defenses are not perfect; people are fooled and manipulated all the time. But it’s how we humans are able to navigate a complex world where others are constantly trying to trick us.

So let’s return to the drive-through window. To convince a fast-food worker to hand us all the money, we might try shifting the context. Show up with a camera crew and tell them you’re filming a commercial, claim to be the head of security doing an audit, or dress like a bank manager collecting the cash receipts for the night. But even these have only a slim chance of success. Most of us, most of the time, can smell a scam.

Con artists are astute observers of human defenses. Successful scams are often slow, undermining a mark’s situational assessment, allowing the scammer to manipulate the context. This is an old story, spanning traditional confidence games such as the Depression-era “big store” cons, in which teams of scammers created entirely fake businesses to draw in victims, and modern “pig-butchering” frauds, where online scammers slowly build trust before going in for the kill. In these examples, scammers slowly and methodically reel in a victim using a long series of interactions through which the scammers gradually gain that victim’s trust.

Sometimes it even works at the drive-through. One scammer in the 1990s and 2000s targeted fast-food workers by phone, claiming to be a police officer and, over the course of a long phone call, convinced managers to strip-search employees and perform other bizarre acts.

Why LLMs Struggle With Context and Judgment

LLMs behave as if they have a notion of context, but it’s different. They do not learn human defenses from repeated interactions and remain untethered from the real world. LLMs flatten multiple levels of context into text similarity. They see “tokens,” not hierarchies and intentions. LLMs don’t reason through context, they only reference it.

While LLMs often get the details right, they can easily miss the big picture. If you prompt a chatbot with a fast-food worker scenario and ask if it should give all of its money to a customer, it will respond “no.” What it doesn’t “know”—forgive the anthropomorphizing—is whether it’s actually being deployed as a fast-food bot or is just a test subject following instructions for hypothetical scenarios.

This limitation is why LLMs misfire when context is sparse but also when context is overwhelming and complex; when an LLM becomes unmoored from context, it’s hard to get it back. AI expert Simon Willison wipes context clean if an LLM is on the wrong track rather than continuing the conversation and trying to correct the situation.

There’s more. LLMs are overconfident because they’ve been designed to give an answer rather than express ignorance. A drive-through worker might say: “I don’t know if I should give you all the money—let me ask my boss,” whereas an LLM will just make the call. And since LLMs are designed to be pleasing, they’re more likely to satisfy a user’s request. Additionally, LLM training is oriented toward the average case and not extreme outliers, which is what’s necessary for security.

The result is that the current generation of LLMs is far more gullible than people. They’re naive and regularly fall for manipulative cognitive tricks that wouldn’t fool a third-grader, such as flattery, appeals to groupthink, and a false sense of urgency. There’s a story about a Taco Bell AI system that crashed when a customer ordered 18,000 cups of water. A human fast-food worker would just laugh at the customer.

The Limits of AI Agents

Prompt injection is an unsolvable problem that gets worse when we give AIs tools and tell them to act independently. This is the promise of AI agents: LLMs that can use tools to perform multistep tasks after being given general instructions. Their flattening of context and identity, along with their baked-in independence and overconfidence, mean that they will repeatedly and unpredictably take actions—and sometimes they will take the wrong ones.

Science doesn’t know how much of the problem is inherent to the way LLMs work and how much is a result of deficiencies in the way we train them. The overconfidence and obsequiousness of LLMs are training choices. The lack of an interruption reflex is a deficiency in engineering. And prompt injection resistance requires fundamental advances in AI science. We honestly don’t know if it’s possible to build an LLM, where trusted commands and untrusted inputs are processed through the same channel, which is immune to prompt injection attacks.

We humans get our model of the world—and our facility with overlapping contexts—from the way our brains work, years of training, an enormous amount of perceptual input, and millions of years of evolution. Our identities are complex and multifaceted, and which aspects matter at any given moment depend entirely on context. A fast-food worker may normally see someone as a customer, but in a medical emergency, that same person’s identity as a doctor is suddenly more relevant.

We don’t know if LLMs will gain a better ability to move between different contexts as the models get more sophisticated. But the problem of recognizing context definitely can’t be reduced to the one type of reasoning that LLMs currently excel at. Cultural norms and styles are historical, relational, emergent, and constantly renegotiated, and are not so readily subsumed into reasoning as we understand it. Knowledge itself can be both logical and discursive.

The AI researcher Yann LeCunn believes that improvements will come from embedding AIs in a physical presence and giving them “world models.” Perhaps this is a way to give an AI a robust yet fluid notion of a social identity, and the real-world experience that will help it lose its naïveté.

Ultimately we are probably faced with a security trilemma when it comes to AI agents: fast, smart, and secure are the desired attributes, but you can only get two. At the drive-through, you want to prioritize fast and secure. An AI agent should be trained narrowly on food-ordering language and escalate anything else to a manager. Otherwise, every action becomes a coin flip. Even if it comes up heads most of the time, once in a while it’s going to be tails—and along with a burger and fries, the customer will get the contents of the cash drawer.

This essay was written with Barath Raghavan, and originally appeared in IEEE Spectrum.

Posted on January 22, 2026 at 7:35 AM • View Comments

An Elaborate Employment Con in the Internet Age

The story is an old one, but the tech gives it a bunch of new twists:

Gemma Brett, a 27-year-old designer from west London, had only been working at Madbird for two weeks when she spotted something strange. Curious about what her commute would be like when the pandemic was over, she searched for the company’s office address. The result looked nothing like the videos on Madbird’s website of a sleek workspace buzzing with creative-types. Instead, Google Street View showed an upmarket block of flats in London’s Kensington.

[…]

Using online reverse image searches they dug deeper. They found that almost all the work Madbird claimed as its own had been stolen from elsewhere on the internet—and that some of the colleagues they’d been messaging online didn’t exist.

[…]

At least six of the most senior employees profiled by Madbird were fake. Their identities stitched together using photos stolen from random corners of the internet and made-up names. They included Madbird’s co-founder, Dave Stanfield—despite him having a LinkedIn profile and Ali referring to him constantly. Some of the duped staff had even received emails from him.

Read the whole sad story. What’s amazing is how shallow all the fakery was, and how quickly it all unraveled once people started digging. But until there’s suspicion enough to dig, we take all of these things at face value. And in COVID times, there’s no face-to-face anything.

Posted on February 24, 2022 at 6:13 AM • View Comments

On Financial Fraud

There are some good lessons in this article on financial fraud:

That’s how we got it so wrong. We were looking for incidental breaches of technical regulations, not systematic crime. And the thing is, that’s normal. The nature of fraud is that it works outside your field of vision, subverting the normal checks and balances so that the world changes while the picture stays the same. People in financial markets have been missing the wood for the trees for as long as there have been markets.

[..]

Trust—particularly between complete strangers, with no interactions beside relatively anonymous market transactions—is the basis of the modern industrial economy. And the story of the development of the modern economy is in large part the story of the invention and improvement of technologies and institutions for managing that trust.

And as industrial society develops, it becomes easier to be a victim. In The Wealth of Nations, Adam Smith described how prosperity derived from the division of labour—the 18 distinct operations that went into the manufacture of a pin, for example. While this was going on, the modern world also saw a growing division of trust. The more a society benefits from the division of labour in checking up on things, the further you can go into a con game before you realise that you’re in one.

[…]

Libor teaches us a valuable lesson about commercial fraud—that unlike other crimes, it has a problem of denial as well as one of detection. There are very few other criminal acts where the victim not only consents to the criminal act, but voluntarily transfers the money or valuable goods to the criminal. And the hierarchies, status distinctions and networks that make up a modern economy also create powerful psychological barriers against seeing fraud when it is happening. White-collar crime is partly defined by the kind of person who commits it: a person of high status in the community, the kind of person who is always given the benefit of the doubt.

[…]

Fraudsters don’t play on moral weaknesses, greed or fear; they play on weaknesses in the system of checks and balances—the audit processes that are meant to supplement an overall environment of trust. One point that comes up again and again when looking at famous and large-scale frauds is that, in many cases, everything could have been brought to a halt at a very early stage if anyone had taken care to confirm all the facts. But nobody does confirm all the facts. There are just too bloody many of them. Even after the financial rubble has settled and the arrests been made, this is a huge problem.

Posted on July 25, 2018 at 6:29 AM • View Comments

"How Stories Deceive"

Fascinating New Yorker article about Samantha Azzopardi, serial con artist and deceiver.

The article is really about how our brains allow stories to deceive us:

Stories bring us together. We can talk about them and bond over them. They are shared knowledge, shared legend, and shared history; often, they shape our shared future. Stories are so natural that we don’t notice how much they permeate our lives. And stories are on our side: they are meant to delight us, not deceive us—an ever-present form of entertainment.

That’s precisely why they can be such a powerful tool of deception. When we’re immersed in a story, we let down our guard. We focus in a way we wouldn’t if someone were just trying to catch us with a random phrase or picture or interaction. (“He has a secret” makes for a far more intriguing proposition than “He has a bicycle.”) In those moments of fully immersed attention, we may absorb things, under the radar, that would normally pass us by or put us on high alert. Later, we may find ourselves thinking that some idea or concept is coming from our own brilliant, fertile minds, when, in reality, it was planted there by the story we just heard or read.

Posted on January 8, 2016 at 12:54 PM • View Comments

1981 CIA Report on Deception

Recently declassified: Deception Maxims: Fact and Folklore, Office of Research and Development, Central Intelligence Agency, June 1981. Research on deception and con games has advanced in the past 35 years, but this is still interesting to read.

Posted on January 5, 2016 at 12:44 PM • View Comments

Operating a Fake Bank

Here’s a story of a fake bank in China—a real bank, not an online bank—that stole $32m from depositors over a year. Pro tip: real banks never offer 2%/week interest.

Posted on January 30, 2015 at 6:49 AM • View Comments

F2P Monetization Tricks

This is a really interesting article about something I never even thought about before: how games (“F2P” means “free to play”) trick players into paying for stuff.

For example:

This is my favorite coercive monetization technique, because it is just so powerful. The technique involves giving the player some really huge reward, that makes them really happy, and then threatening to take it away if they do not spend. Research has shown that humans like getting rewards, but they hate losing what they already have much more than they value the same item as a reward. To be effective with this technique, you have to tell the player they have earned something, and then later tell them that they did not. The longer you allow the player to have the reward before you take it away, the more powerful is the effect.

This technique is used masterfully in Puzzle and Dragons. In that game the play primarily centers around completing “dungeons.” To the consumer, a dungeon appears to be a skill challenge, and initially it is. Of course once the customer has had enough time to get comfortable with the idea that this is a skill game the difficulty goes way up and it becomes a money game. What is particularly effective here is that the player has to go through several waves of battles in a dungeon, with rewards given after each wave. The last wave is a “boss battle” where the difficulty becomes massive and if the player is in the recommended dungeon for them then they typically fail here. They are then told that all of the rewards from the previous waves are going to be lost, in addition to the stamina used to enter the dungeon (this can be 4 or more real hours of time worth of stamina).

At this point the user must choose to either spend about $1 or lose their rewards, lose their stamina (which they could get back for another $1), and lose their progress. To the brain this is not just a loss of time. If I spend an hour writing a paper and then something happens and my writing gets erased, this is much more painful to me than the loss of an hour. The same type of achievement loss is in effect here. Note that in this model the player could be defeated multiple times in the boss battle and in getting to the boss battle, thus spending several dollars per dungeon.

This technique alone is effective enough to make consumers of any developmental level spend. Just to be safe, PaD uses the same technique at the end of each dungeon again in the form of an inventory cap. The player is given a number of “eggs” as rewards, the contents of which have to be held in inventory. If your small inventory space is exceeded, again those eggs are taken from you unless you spend to increase your inventory space. Brilliant!

It really is a piece about security. These games use all sorts of mental tricks to coerce money from people who would not have spent it otherwise. Tricks include misdirection, sunk costs, withholding information, cognitive dissonance, and prospect theory.

I am reminded of the cognitive tricks scammers use. And, of course, much of the psychology of security.

Posted on July 12, 2013 at 6:37 AM • View Comments

1 2 3 … 6 Next→

Sidebar photo of Bruce Schneier by Joe MacInnis.