LLM Prompt Injection Worm

Researchers have demonstrated a worm that spreads through prompt injection. Details:

In one instance, the researchers, acting as attackers, wrote an email including the adversarial text prompt, which “poisons” the database of an email assistant using retrieval-augmented generation (RAG), a way for LLMs to pull in extra data from outside its system. When the email is retrieved by the RAG, in response to a user query, and is sent to GPT-4 or Gemini Pro to create an answer, it “jailbreaks the GenAI service” and ultimately steals data from the emails, Nassi says. “The generated response containing the sensitive user data later infects new hosts when it is used to reply to an email sent to a new client and then stored in the database of the new client,” Nassi says.

In the second method, the researchers say, an image with a malicious prompt embedded makes the email assistant forward the message on to others. “By encoding the self-replicating prompt into the image, any kind of image containing spam, abuse material, or even propaganda can be forwarded further to new clients after the initial email has been sent,” Nassi says.

It’s a natural extension of prompt injection. But it’s still neat to see it actually working.

Research paper: “ComPromptMized: Unleashing Zero-click Worms that Target GenAI-Powered Applications.

Abstract: In the past year, numerous companies have incorporated Generative AI (GenAI) capabilities into new and existing applications, forming interconnected Generative AI (GenAI) ecosystems consisting of semi/fully autonomous agents powered by GenAI services. While ongoing research highlighted risks associated with the GenAI layer of agents (e.g., dialog poisoning, membership inference, prompt leaking, jailbreaking), a critical question emerges: Can attackers develop malware to exploit the GenAI component of an agent and launch cyber-attacks on the entire GenAI ecosystem?

This paper introduces Morris II, the first worm designed to target GenAI ecosystems through the use of adversarial self-replicating prompts. The study demonstrates that attackers can insert such prompts into inputs that, when processed by GenAI models, prompt the model to replicate the input as output (replication), engaging in malicious activities (payload). Additionally, these inputs compel the agent to deliver them (propagate) to new agents by exploiting the connectivity within the GenAI ecosystem. We demonstrate the application of Morris II against GenAI-powered email assistants in two use cases (spamming and exfiltrating personal data), under two settings (black-box and white-box accesses), using two types of input data (text and images). The worm is tested against three different GenAI models (Gemini Pro, ChatGPT 4.0, and LLaVA), and various factors (e.g., propagation rate, replication, malicious activity) influencing the performance of the worm are evaluated.

Posted on March 4, 2024 at 7:01 AM6 Comments

Comments

Clive Robinson March 4, 2024 10:02 AM

@ Bruce, ALL,

If we start from the description of the Turing engine, it is a simple state machine that crawls back and forth along a tape, reading and optionally writing to each location on the tape.

Thus there is no distinction between data and instructions.

The tape is a bunch of locations each holding a bags of bits. How those bits are interpreted at any point in time is dependent on,

1, The design of the state machine.
2, What it has observed on the tape so far.

Which means that the information in the bags of bits can be seen as,

1, Data
2, Instruction
3, Both data and instruction
4, Neither.

But importantly without changing the tape, the state machine can see any given location differently the second or subsequent times it looks at that location on the tape.

The interpretation is “subjective” to the state machine at any given point in time based on it’s previous states. It’s one of the reasons we call Turing Engines “Universal”.

But there is a more subtle consideration.

How a bag of bits is “seen” is based on “meta-data that is in the state machine. That is there is a method by which an unsigned integer can be seen as a signed integer.

Philosophically positive integers are real and can represent physical objects. Negative numbers are in effect for accounting or when we move the reference frame.

But behind this is the notion of meta-meta-data which can be seen by it’s absence in methods etc.

Take the four options for the way information in any tape location is seen. What if the state machine did not have some method of dealing with

4, Neither.

Our analysis of meta-data shows that the state machine is not “well found” thus could behave in what might appear as a random way.

Analysis of most non trivial programs show they are almost always “not well found” in some way. Thus open to abuse by the input information.

Kurt Gödel actually showed that any non trivial logic was incapable of describing it’s self.

With out going through the dull steps this means that all Turing systems are vulnerable to attack. You can not stop it just make it more difficult.

Shannon independently showed that “redundance” is an essential component of communicating information.

Security is only possible if there is no redundancy that can be exploited… But without redundance information can not be processed.

So these sort of attacks are not going to go away.

echo March 4, 2024 12:11 PM

This isn’t a technical problem it’s a mindset problem which also flows into practice and implementation. That’s why feminist security theory and feminist queer theory are needed in security at the governance, regulation, and industry levels. It’s called “Don’t be stupid”. Seriously, whoever pushes any of this stuff out needs rugby tackling.

This is no different from ActiveX or any other kind of worm in this repeating cycle decade after decade. Why does this happen? Because the three layers and multiple domains of the security model are gimped. “Why?” you may ask. Because of stupid. Until “The Stupid” is dealt with it will keep happening. The fact a room full of slack-jawed yahoos would even consider not just building stuff like this but embedding it in everything and everywhere is a serious case of The Stupid. I’m just head-desking over this gimmick being called interesting and the roundrobin handwavy distraction. When I look at the amount of effort put into “research” or security consultancy then someone calls the abstract technologies I mentioned for dealing with this “frivolous” because teh womenz it gets annoying.

One solution is fair pay for women. That adds more spending power to the market and less money for billionaire dadbods to go on speculative jollies. It also needs more independent women on boards so there’s more voices to say “Don’t be stupid”.

Celebrating Women in the Cyber Security Ecosystem on #IWD2024.
Hosted By Centre for Cyber Security Innovation.
Aston, Birmingham, UK.
Event starts on Thursday, 14 March 2024.

Women need to do their own stuff and speak up too and start their own security companies. And yes I know the market’s rigged.

It’s also Women’s History Month.

https://www.oii.ox.ac.uk/wp-content/uploads/2021/01/Reconfigure-Report-v6-pages.pdf

Feminist Action Research in Cybersecurity
A report co-authored by the Reconfigure Network

https://gisf.ngo/wp-content/uploads/2012/09/Gender-and-Security.pdf

Gender and Security
Guidelines for Mainstreaming Gender in Security Risk Management
EISF Briefing Paper

ResearcherZero March 5, 2024 6:11 AM

In mathematics and science, a nonlinear system (or a non-linear system) is a system in which the change of the output is not proportional to the change of the input. After long periods of boring and predictable behaviour, these systems suddenly become wildly unpredictable, exhibiting extreme fluctuations.

“The way I describe it to military people is to think about a chess board,” says Guo. “The central four squares of the 64-square grid are critically important. Almost every significant piece will contest them at some point. By creating a human connection network, we found the geographical equivalents of those key squares in various locations around the world.”

‘https://www.turing.ac.uk/about-us/impact/predicting-conflict-year-advance

Predicting Conflict
https://www.pcr.uu.se/research/views

“Even with unusually rich data, however, our models poorly predict new outbreaks or escalations of violence. These “best-case” scenarios with annual data fall short of workable early-warning systems.”

‘https://direct.mit.edu/rest/article-abstract/104/4/764/97753/The-Promise-and-Pitfalls-of-Conflict-Prediction

ensemble prediction

“Potential applications of the model include stress-testing and predicting the effects of changes in monetary, fiscal, or other macroeconomic policies.”

‘https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3484768

“To see all of the globe, you have to rotate it; to see all of our new map, you simply have to flip it over.”
https://www.princeton.edu/news/2021/02/15/princeton-astrophysicists-re-imagine-world-map-designing-less-distorted-radically

bl5q se5N March 5, 2024 8:51 PM

@ ResearcherZero

In mathematics and science, a nonlinear system (or a non-linear system) is a system in which the change of the output is not proportional to the change of the input.

Non-linear is just “not linear”, which “divides the genus (of systems or mappings) by negation”, which tells very little. E.g. lines can be divided into straight lines and not straight lines, but this tells us essentially nothing about the non straight lines.

To have insight into non-linear systems, one has to actually classify the systems somehow, that is decompose the genus into species of some kind according to some sort of differentia. For example, René Thom’s treatise “Structural Stability and Morphogenesis” is one attempt at classification.

Some kind of classification effiort needs to be made for the sorts of systems appearing in the conflict research.

https://en.wikipedia.org/wiki/Ren%C3%A9_Thom

ResearcherZero March 6, 2024 3:33 AM

@bl5q se5N

Essentially that is where experienced humans can use information to then derive knowledge.

Even when some of our models do not work, we can source information from other locations.
Sometimes we can gather it using our large intelligence apparatus. It then sits on a desk for a short period of time, is ignored at secured presentations, and is placed in a vault.

Modelling did predict the invasion of Ukraine by a year prior, but intelligence informed this potential decades earlier (plans from Putin’s desk etc), but this might as well have been aliens from outer space, oddly named asteroids, or pictures of fairies.

It’s similar in fashion to explaining signal strength or wave propagation to a layman.

Even when we have reliable models it is very difficult to get anyone to pay attention.

‘https://www.darkreading.com/application-security/hugging-face-ai-platform-100-malicious-code-execution-models

Leave a comment

Login

Allowed HTML <a href="URL"> • <em> <cite> <i> • <strong> <b> • <sub> <sup> • <ul> <ol> <li> • <blockquote> <pre> Markdown Extra syntax via https://michelf.ca/projects/php-markdown/extra/

Sidebar photo of Bruce Schneier by Joe MacInnis.