LLM Prompt Injection Worm
Researchers have demonstrated a worm that spreads through prompt injection. Details:
In one instance, the researchers, acting as attackers, wrote an email including the adversarial text prompt, which “poisons” the database of an email assistant using retrieval-augmented generation (RAG), a way for LLMs to pull in extra data from outside its system. When the email is retrieved by the RAG, in response to a user query, and is sent to GPT-4 or Gemini Pro to create an answer, it “jailbreaks the GenAI service” and ultimately steals data from the emails, Nassi says. “The generated response containing the sensitive user data later infects new hosts when it is used to reply to an email sent to a new client and then stored in the database of the new client,” Nassi says.
In the second method, the researchers say, an image with a malicious prompt embedded makes the email assistant forward the message on to others. “By encoding the self-replicating prompt into the image, any kind of image containing spam, abuse material, or even propaganda can be forwarded further to new clients after the initial email has been sent,” Nassi says.
It’s a natural extension of prompt injection. But it’s still neat to see it actually working.
Research paper: “ComPromptMized: Unleashing Zero-click Worms that Target GenAI-Powered Applications.
Abstract: In the past year, numerous companies have incorporated Generative AI (GenAI) capabilities into new and existing applications, forming interconnected Generative AI (GenAI) ecosystems consisting of semi/fully autonomous agents powered by GenAI services. While ongoing research highlighted risks associated with the GenAI layer of agents (e.g., dialog poisoning, membership inference, prompt leaking, jailbreaking), a critical question emerges: Can attackers develop malware to exploit the GenAI component of an agent and launch cyber-attacks on the entire GenAI ecosystem?
This paper introduces Morris II, the first worm designed to target GenAI ecosystems through the use of adversarial self-replicating prompts. The study demonstrates that attackers can insert such prompts into inputs that, when processed by GenAI models, prompt the model to replicate the input as output (replication), engaging in malicious activities (payload). Additionally, these inputs compel the agent to deliver them (propagate) to new agents by exploiting the connectivity within the GenAI ecosystem. We demonstrate the application of Morris II against GenAI-powered email assistants in two use cases (spamming and exfiltrating personal data), under two settings (black-box and white-box accesses), using two types of input data (text and images). The worm is tested against three different GenAI models (Gemini Pro, ChatGPT 4.0, and LLaVA), and various factors (e.g., propagation rate, replication, malicious activity) influencing the performance of the worm are evaluated.
Clive Robinson • March 4, 2024 10:02 AM
@ Bruce, ALL,
If we start from the description of the Turing engine, it is a simple state machine that crawls back and forth along a tape, reading and optionally writing to each location on the tape.
Thus there is no distinction between data and instructions.
The tape is a bunch of locations each holding a bags of bits. How those bits are interpreted at any point in time is dependent on,
1, The design of the state machine.
2, What it has observed on the tape so far.
Which means that the information in the bags of bits can be seen as,
1, Data
2, Instruction
3, Both data and instruction
4, Neither.
But importantly without changing the tape, the state machine can see any given location differently the second or subsequent times it looks at that location on the tape.
The interpretation is “subjective” to the state machine at any given point in time based on it’s previous states. It’s one of the reasons we call Turing Engines “Universal”.
But there is a more subtle consideration.
How a bag of bits is “seen” is based on “meta-data that is in the state machine. That is there is a method by which an unsigned integer can be seen as a signed integer.
Philosophically positive integers are real and can represent physical objects. Negative numbers are in effect for accounting or when we move the reference frame.
But behind this is the notion of meta-meta-data which can be seen by it’s absence in methods etc.
Take the four options for the way information in any tape location is seen. What if the state machine did not have some method of dealing with
4, Neither.
Our analysis of meta-data shows that the state machine is not “well found” thus could behave in what might appear as a random way.
Analysis of most non trivial programs show they are almost always “not well found” in some way. Thus open to abuse by the input information.
Kurt Gödel actually showed that any non trivial logic was incapable of describing it’s self.
With out going through the dull steps this means that all Turing systems are vulnerable to attack. You can not stop it just make it more difficult.
Shannon independently showed that “redundance” is an essential component of communicating information.
Security is only possible if there is no redundancy that can be exploited… But without redundance information can not be processed.
So these sort of attacks are not going to go away.