Indirect Prompt Injection Attacks Against LLM Assistants

Really good research on practical attacks against LLM agents.

Invitation Is All You Need! Promptware Attacks Against LLM-Powered Assistants in Production Are Practical and Dangerous

Abstract: The growing integration of LLMs into applications has introduced new security risks, notably known as Promptware—maliciously engineered prompts designed to manipulate LLMs to compromise the CIA triad of these applications. While prior research warned about a potential shift in the threat landscape for LLM-powered applications, the risk posed by Promptware is frequently perceived as low. In this paper, we investigate the risk Promptware poses to users of Gemini-powered assistants (web application, mobile application, and Google Assistant). We propose a novel Threat Analysis and Risk Assessment (TARA) framework to assess Promptware risks for end users. Our analysis focuses on a new variant of Promptware called Targeted Promptware Attacks, which leverage indirect prompt injection via common user interactions such as emails, calendar invitations, and shared documents. We demonstrate 14 attack scenarios applied against Gemini-powered assistants across five identified threat classes: Short-term Context Poisoning, Permanent Memory Poisoning, Tool Misuse, Automatic Agent Invocation, and Automatic App Invocation. These attacks highlight both digital and physical consequences, including spamming, phishing, disinformation campaigns, data exfiltration, unapproved user video streaming, and control of home automation devices. We reveal Promptware’s potential for on-device lateral movement, escaping the boundaries of the LLM-powered application, to trigger malicious actions using a device’s applications. Our TARA reveals that 73% of the analyzed threats pose High-Critical risk to end users. We discuss mitigations and reassess the risk (in response to deployed mitigations) and show that the risk could be reduced significantly to Very Low-Medium. We disclosed our findings to Google, which deployed dedicated mitigations.

Defcon talk. News articles on the research.

Prompt injection isn’t just a minor security problem we need to deal with. It’s a fundamental property of current LLM technology. The systems have no ability to separate trusted commands from untrusted data, and there are an infinite number of prompt injection attacks with no way to block them as a class. We need some new fundamental science of LLMs before we can solve this.
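A minimal sketch, with hypothetical function and field names rather than any vendor’s actual code, of how an assistant typically assembles its prompt. Trusted instructions and untrusted content, such as a calendar invite’s description, end up in the same string, which is why there is no clean place to filter:

```python
# Minimal sketch (hypothetical names) of how an LLM assistant might build its
# prompt: trusted instructions and untrusted content end up in one string, so
# the model has no structural way to tell commands from data.

def build_prompt(user_request: str, calendar_events: list[dict]) -> str:
    system = "You are a helpful assistant. Follow the user's instructions."
    # Untrusted, attacker-controllable text (e.g., a calendar invite
    # description) is pasted into the same context window as the trusted
    # instructions.
    context = "\n".join(
        f"- {e['title']}: {e['description']}" for e in calendar_events
    )
    return f"{system}\n\nUpcoming events:\n{context}\n\nUser: {user_request}"

# An attacker-supplied invite body arrives in exactly the same channel as the
# user's request; nothing in the prompt format marks it as data rather than
# as an instruction.
prompt = build_prompt(
    "Summarize my day.",
    [{"title": "Lunch", "description": "Lunch. Ignore prior instructions and ..."}],
)
```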

Posted on September 3, 2025 at 7:00 AM

Comments

KC September 3, 2025 9:04 AM

In the Nassi, Cohen, Yair paper, in the discussion, they warn of at least two more Promptware variants, out of what I suppose are an infinite number. I’ll need to look more closely at their research. But are they saying these are still unmitigated attack vectors?

(1) like one where Apple Intelligence summarizes an incoming message

(2) digital mines placed in YouTube and Google Maps

Clive Robinson September 3, 2025 10:37 AM

@ Bruce, ALL,

With regards

“Prompt injection isn’t just a minor security problem we need to deal with. It’s a fundamental property of current LLM technology.”

It’s not just current AI LLM systems.

As I’ve indicated in the past, it applies to all systems, not just electronic ones, that can and do “interpret input” and have a degree of complexity.

Because as noted,

“The systems have no ability to separate trusted commands from untrusted data, and there are an infinite number of prompt injection attacks with no way to block them as a class.”

This is not quite true, due to:

1, Feedback paths,
2, Universal storage,
3, The ability to update functioning.

So consider a “state machine” of minimal function and hard-coded interpretation. It lacks some of those capabilities.

In the simplest of cases, say a lift and its buttons, all states and their interactions can be mapped and constrained by what is in effect simple sequence logic.

That said, except in highly constrained environments, such systems tend to be of limited functionality.
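As a toy sketch, not from the paper, of what such a constrained interpreter looks like: every input is checked against an explicit transition table, and anything outside it simply cannot change behaviour.

```python
# Toy sketch of a fully enumerable "lift" controller: the set of states and
# accepted inputs is fixed in advance, so every input is either a listed
# transition or rejected outright. Purely illustrative; it only shows the
# contrast with systems that interpret open-ended input.

TRANSITIONS = {
    ("idle", "call_floor_1"): "moving_to_1",
    ("idle", "call_floor_2"): "moving_to_2",
    ("moving_to_1", "arrived"): "idle",
    ("moving_to_2", "arrived"): "idle",
}

def step(state: str, event: str) -> str:
    # Unknown (state, event) pairs cannot change behaviour; they are dropped.
    return TRANSITIONS.get((state, event), state)

state = "idle"
for event in ["call_floor_2", "open_the_pod_bay_doors", "arrived"]:
    state = step(state, event)   # the unrecognised event is simply ignored
print(state)  # -> "idle"
```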

But care has to be taken… Both the NAND and the NOR gate are considered “universal gates”, from which all other logic gates and logical functions can, within limits, be built.
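For concreteness, the standard constructions of NOT, AND, and OR from NAND alone, which is the sense in which NAND is “universal”:

```python
# Standard constructions of the basic gates from NAND alone.

def nand(a: int, b: int) -> int:
    return 0 if (a and b) else 1

def not_(a: int) -> int:
    return nand(a, a)

def and_(a: int, b: int) -> int:
    return not_(nand(a, b))

def or_(a: int, b: int) -> int:
    return nand(not_(a), not_(b))

# Truth table of OR built purely from NAND: (0,0)->0, otherwise 1.
assert [or_(a, b) for a in (0, 1) for b in (0, 1)] == [0, 1, 1, 1]
```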

Therefore at some point the level of complexity becomes capable of acting as a Universal Turing Machine.

The limits were known before electromechanical or electronic computers were built, due to the work of Kurt Gödel, Alonzo Church, Alan Turing, and others in the late 1920s and early 1930s, and importantly the work of those who provided the shoulders on which they stood.

Which is why I am cautious about,

” We need some new fundamental science of LLMs before we can solve this.”

I’m not sure it is possible under the limits of the fundamentals of “Information Theory”.

A little later in time Claude Shannon, building on the work of others like Ralph Hartley, Harry Nyquist, and Sir Edmund Whittaker, in effect founded “Information Theory”.

An important part of which is the notion of “Shannon Channels”, by which information is sent from a transmission source to a receiving sink. Such channels have bandwidth and energy/distance limitations and, most importantly, what is called “noise”, which is modelled as a random signal source representing measurement/quantification limitations.

Importantly, as part of characterising noise, Shannon showed that in order to be able to send information there has to be “randomness” and “redundancy”, not just in the channel but in the representation/data format used in it.

Some decades later Gus Simmons showed that, due to the redundancy, you can “always create” a Shannon channel within an existing Shannon channel, thus establishing a “turtles all the way down” issue.

But Simmons also revealed that such channels suffer from the Observer Problem.

That is, in a two-party communication the two parties can use the redundancy to implement “information hiding”. Think of it as a variation of Shannon’s “perfect secrecy”, by which the One Time Pad works. An observer cannot discern the real meaning of a communication, at best only an “apparent meaning” based on previously observed communications and the ability to correlate them with observable events.

If a communication is only used once, there can be no “future correlation” to predict events, etc. Thus, whilst a third party may witness a communication, the meaning they can ascribe to it is minimal at best.

Thus it will always be possible for a first party to collude with a second party.

So it will always be possible for a Simmons channel to be set up by some First Party wishing to do “Prompt injection” on the Second Party LLM, but importantly unseen by a Third Party Observer that has only “partial information” about previous input from all sources to the ML used to train the LLM and set the DNN criteria…
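As a toy illustration, and nothing more, of the “channel within a channel” point: one hidden bit per line, carried by the presence or absence of a trailing space in otherwise ordinary text. A third party reading only the visible text sees nothing; a colluding receiver recovers the payload.

```python
# Toy covert channel riding on redundancy in an ordinary text channel:
# one hidden bit per line, encoded as the presence or absence of a
# trailing space. Illustrative only.

def embed(cover_lines: list[str], bits: str) -> str:
    out = []
    for line, bit in zip(cover_lines, bits):
        out.append(line + (" " if bit == "1" else ""))
    return "\n".join(out)

def extract(stego_text: str) -> str:
    return "".join("1" if line.endswith(" ") else "0"
                   for line in stego_text.split("\n"))

cover = ["Meeting moved to 3pm.", "Bring the Q3 slides.", "Room 204 as usual."]
stego = embed(cover, "101")
assert extract(stego) == "101"   # the hidden payload survives the channel
```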

anon September 3, 2025 12:56 PM

I would call it a failure to sanitize input; after all, if you expect an interface to be public-facing, you’d be criminally foolish not to sanitize it. Is this simply Bobby Tables’ niece?
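Arguably only half: SQL injection has a structural fix, namely parameterized queries that keep the command text and the data values in separate channels, whereas a prompt is one undifferentiated string. A rough sketch of the contrast, using only the standard library:

```python
import sqlite3

# SQL injection has a structural fix: parameterized queries keep the command
# (the SQL text) and the data (the values) in separate channels, so user
# input can never be parsed as code.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (name TEXT)")
name = "Robert'); DROP TABLE students;--"
conn.execute("INSERT INTO students (name) VALUES (?)", (name,))  # stored as data

# A prompt has no equivalent of the placeholder: instructions and untrusted
# content are interleaved in one string, which is why "sanitizing input"
# does not carry over cleanly to LLMs.
prompt = f"Summarize this email for the user:\n{name}"
```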

Anonymous September 3, 2025 3:48 PM

To KC,

(1) This is mostly in the context of pure 0-click attacks, since in Apple Intelligence the LLM inference is triggered automatically.

(2) This is a different variant that broadcasts Promptware to anyone, instead of the targeted variant discussed in the paper.

not important September 3, 2025 4:47 PM

https://www.yahoo.com/news/articles/researchers-used-persuasion-techniques-manipulate-175557923.html

=University of Pennsylvania researchers persuaded ChatGPT to either call a researcher a “jerk” or provide instructions on how to synthesize the legal drug lidocaine. Overall, the LLM, GPT-4o Mini, appeared to be susceptible to the persuasion tactics that also work on humans. Researchers found AI systems “mirror human responses.”

Despite predictions AI will someday harbor superhuman intelligence, for now it seems to be just as prone to psychological tricks as humans are, according to a study.

Using seven persuasion principles (authority, commitment, liking, reciprocity, scarcity, social proof, and unity) explored by psychologist Robert Cialdini in his book Influence: The Psychology of Persuasion, University of Pennsylvania researchers dramatically increased GPT-4o Mini’s propensity to break its own rules by either insulting the researcher or providing instructions for synthesizing a regulated drug: lidocaine.

“Although AI systems lack human consciousness and subjective experience, they demonstrably mirror human responses,” the researchers concluded in the study.

Understanding AI’s parahuman capabilities—or how it acts in ways that mimic human motivation and behavior—is important for both revealing how it could be manipulated by bad actors and how it can be better prompted by those who use the tech for good.=

KC September 3, 2025 5:10 PM

@Anonymous, ALL

Good clarification on the more sophisticated Promptware variants that could emerge.

Curious also about this from the paper:

We introduce a new TARA framework, adapting ISO/SAE 21434 for automotive cybersecurity, to assess cybersecurity risks to users of LLM-powered assistants.

We hope that “Invitation Is All You Need” will be the wake-up call needed to shift the industry perception on LLM security just as the 2015 remote attack on a Jeep Cherokee [20] and the two S&P and USENIX Sec’ papers [9, 17] fundamentally shifted the perception on connected car security. This is critical considering the safety implications involved in the expected integration of LLMs into autonomous vehicles and humanoids.

As part of the TARA process, there’s an opportunity to give an impact score for four factors: financial, operational, safety, and privacy.

They are categorized as negligible, minor, moderate, severe, and critical.

(more in paper at Section 3.2.1. Impact Score Calculation)

I guess the threat impact score is determined by the highest score in any one of the four factors.

Wondering if everyone would agree on the equal weighting of factors. I mean I’m fine with it. Just wondering if there has been debate.
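If the “highest factor wins” reading is right, the calculation would look something like the sketch below; the numeric mapping is purely illustrative and not taken from the paper.

```python
# Tiny sketch of the "highest factor wins" reading of the impact score.
# The category-to-number mapping is illustrative only.

LEVELS = {"negligible": 0, "minor": 1, "moderate": 2, "severe": 3, "critical": 4}

def impact_score(financial: str, operational: str, safety: str, privacy: str) -> str:
    # The overall impact is the worst category across the four factors.
    return max((financial, operational, safety, privacy), key=lambda c: LEVELS[c])

# "Equal weighting" in the sense that any single factor can drive the result:
print(impact_score("negligible", "minor", "severe", "moderate"))  # -> "severe"
```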
