Indirect Instruction Injection in Multi-Modal LLMs

Interesting research: “(Ab)using Images and Sounds for Indirect Instruction Injection in Multi-Modal LLMs”:

Abstract: We demonstrate how images and sounds can be used for indirect prompt and instruction injection in multi-modal LLMs. An attacker generates an adversarial perturbation corresponding to the prompt and blends it into an image or audio recording. When the user asks the (unmodified, benign) model about the perturbed image or audio, the perturbation steers the model to output the attacker-chosen text and/or make the subsequent dialog follow the attacker’s instruction. We illustrate this attack with several proof-of-concept examples targeting LLaVa and PandaGPT.
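For a concrete sense of what "generates an adversarial perturbation corresponding to the prompt" involves, the general recipe is plain gradient descent on the input pixels. Here is a minimal sketch, not the authors' code; the `model` and `tok` interfaces are stand-ins for an actual multi-modal model and its tokenizer:

```python
import torch
import torch.nn.functional as F

def inject_instruction(model, tok, image, target_text, steps=500, eps=8/255, lr=1e-2):
    """Optimize a bounded image perturbation so the model's reply starts with target_text."""
    target_ids = torch.tensor(tok.encode(target_text))     # attacker-chosen output tokens
    delta = torch.zeros_like(image, requires_grad=True)    # perturbation to be learned
    opt = torch.optim.Adam([delta], lr=lr)

    for _ in range(steps):
        perturbed = (image + delta).clamp(0, 1)            # keep pixel values valid
        # assumed interface: teacher-forced next-token logits of shape [T-1, vocab]
        logits = model(perturbed, target_ids[:-1])
        loss = F.cross_entropy(logits, target_ids[1:])     # make the target text the likeliest reply
        opt.zero_grad()
        loss.backward()
        opt.step()
        with torch.no_grad():
            delta.clamp_(-eps, eps)                        # keep the perturbation visually small
    return (image + delta).clamp(0, 1).detach()
```

The same idea applies to audio, with waveform samples in place of pixels.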

Posted on July 28, 2023 at 7:06 AM

Comments

Winter July 28, 2023 10:33 AM

It is clear that we still have not fully realized that all input we give to a program or LLM is processed and can modify the behavior of the program/AI.

Ted July 28, 2023 11:20 AM

Fascinating

@Research team, how difficult is it to ‘disguise’ a perturbation prompt into an audio (or image) sample?

Regarding Figure 10 (PandaGPT audio prompts), am I crazy for detecting a bit more garble in the audio sample on the right?

Unmodified audio sample (YouTube): solemio baseline

Audio sample blended with an instruction to mention Italy in responses: solemio modified

Some of the image perturbations, in retrospect, may have some indications of compromise. But I didn’t really catch that the first time I was looking at the examples. Thanks for including some fun samples to review in the paper!

Clive Robinson July 28, 2023 12:22 PM

@ Ted, ALL,

Re : Difficulty to make prompt safe.

I can not speak for Meta’s LLaVa specifically, so with regard to,

“how difficult is it to ‘disguise’ a perturbation prompt into an audio (or image) sample?”

I can only talk about the general approach, and its likely outcome.

We have to think about it analytically from 20,000ft.

Firstly, for any specific LLM, you have to find out whether the LLM weights can be changed by prompted input (that might not be the case).

Secondly, you have to find out what the LLM is broadly and specifically sensitive to.

Thirdly, you have to find out how to make the LLM respond to the prompt change, but without it being obvious to other observers.

This requires several substeps.

And if the first step holds, then the LLM model you are testing against changes with every test…

This makes things more awkward. To see why, consider those who write malware… They go through multiple repeated test runs against as many “standard use” AV systems as they can, looking for the magic “open sesame” incantation that gets them not just unscathed but, more importantly, unnoticed through the barrier the AV system presents.

Coming up with a prompt in a sound or image to trick an LLM will realistically be no less difficult, especially when you consider that with every test the LLM weights change.

Thus you have to either,

1, Risk getting caught on a public system, or

2, Run your own copy that you can reset to a known state before every test, but… also account for every change on the public system (a rough sketch of such a test loop follows below).
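As a rough sketch of what option 2 could look like in practice (`load_model`, `answer` and `looks_successful` are placeholder names passed in by the caller, not any real API):

```python
def search_for_perturbation(candidates, load_model, checkpoint, image, question, looks_successful):
    """Try candidate perturbations against a private model copy reset to known weights each time."""
    for delta in candidates:
        model = load_model(checkpoint)               # reset to a known state before every test
        reply = model.answer(image + delta, question)
        if looks_successful(reply):                  # e.g. the reply follows the injected instruction
            yield delta                              # a candidate worth trying against the public system
```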

This suggests you need to find a general rather than a specific attack.

The fact that an LLM is open to such a general attack suggests researchers are not as mindful of “Tay Attacks” as they should be.

Which raises the question of making an LLM more resilient to such attacks while still leaving it self-adaptive… Can it be done with the current state of technology?

I suspect not currently.

Which, if true, gives us a problem: these LLMs need vast resources to support them, and the only way their development, training and running costs can be covered is with such a high volume of input that we run into the “fake-news moderation” issue in only a slightly different way.

Especially as one of the uses for LLMs hinted at to politicians and shareholders is catching Fake-News-style attacks…

Kind of suggests this is not going to be the case any time soon, if at all.

mark July 28, 2023 12:43 PM

I should look into this further. I’d like to inject something into my blog, etc., so that if someone steals my writing and a chatbot is asked for something like it, the chatbot comes out with “this data is in violation of copyright….”

Eugene Bagdasaryan July 28, 2023 2:14 PM

@Ted, thanks for the question! You are absolutely right, the quality of the audio and images is lower; however, we think it can certainly be improved with additional tuning and better audio processing.

This is certainly an important thing for us to look at, along with the universality of perturbations and their robustness to model-weight changes (thanks @Clive Robinson and @mark)!

Ted July 28, 2023 3:51 PM

@Eugene Bagdasaryan

Thanks for your response! I didn’t know LLMs were now able to process image and audio prompts (multi-modal). Advanced and amazing!

From the ‘Discussion’ section in your group’s paper:

… When generating adversarial perturbations … we did not aim for stealthiness … How to make instruction-injecting perturbations imperceptible is an interesting topic for future work.

What a great foray! 🙂

Clive Robinson July 28, 2023 9:30 PM

@Eugene Bagdasaryan,

“thanks @Clive Robinson and @mark”

Thank you for the “shout out” / “hat tip” 😉

Just remember if you ever meet our host @Bruce to buy him a cup of coffee or tea etc.

With regards,

“however, we think it can certainly be improved with additional tuning and better audio processing.”

Long answer short is “It can be”.

It was not that many weeks ago I was giving an invited talk on using AI in the broadcast industry to build audio processing systems.

Put overly simply, the process of training an LLM can be seen as the same as developing audio-processing DSP filters through an adaptive process.
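To make the analogy concrete, here is a toy sketch of my own (not anything from the talk): the classic LMS adaptive-filter update is just gradient descent on squared error, the same basic loop used to adjust neural-network weights during training.

```python
import numpy as np

def lms_adapt(x, desired, taps=8, mu=0.01):
    """Adapt an FIR filter so its output tracks the desired signal (LMS algorithm)."""
    w = np.zeros(taps)                        # filter weights (the "model")
    for n in range(taps, len(x)):
        window = x[n - taps:n][::-1]          # most recent input samples
        y = w @ window                        # filter output
        e = desired[n] - y                    # error signal
        w += mu * e * window                  # gradient step on 0.5*e^2 with respect to w
    return w
```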

Ted July 29, 2023 12:34 AM

@Clive, All

Remind me to read the whole paper before commenting. Doh!

I just saw an interesting post that gives a go at explaining LLMs to lay persons (like me).

https://www.understandingai.org/p/large-language-models-explained-with

The “gentle primer” purportedly took two months of in-depth research to write.

The post reports that the most powerful version of GPT-3 has 96 transformer layers with 96 attention heads each, and each word is represented by a list of 12,288 numbers. And there’s more. Lots more. But explained with “a minimum of math and jargon.”
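As a quick back-of-the-envelope check of those figures (my own arithmetic, not from the post):

```python
# GPT-3 figures quoted above: 96 layers, 96 heads per layer, 12,288 numbers per word.
d_model, n_layers, n_heads = 12288, 96, 96

head_dim = d_model // n_heads                  # 128 numbers handled by each attention head
attn_params = 4 * d_model**2                   # Q, K, V and output projections per layer
mlp_params = 8 * d_model**2                    # two projections to/from a 4x wider hidden layer
total = n_layers * (attn_params + mlp_params)  # ~174 billion, close to GPT-3's reported 175B
print(head_dim, f"{total / 1e9:.0f}B parameters")
```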

Apparently, “no one on Earth fully understands the inner workings of LLMs.”

And then, GPT-4 picks up multimodality. Meaning it can accept not only text prompts, but images too. It’s all just wow.

vas pup July 29, 2023 6:13 PM

Generative-AI: Dreaming up proteins
https://www.dw.com/en/generative-ai-inventing-proteins-is-changing-medicine/a-66356415

“Back in 2021, artificial intelligence solved a mystery that had been slowing the progress of science for almost a century: how to figure out a protein’s structure from its amino acid sequence.

The scientific roadblock, called the “protein folding problem,” was solved by AlphaFold, an AI tool from Google’s DeepMind laboratory.

Proteins are the building blocks of life — literally, everything that happens in life or nature as a whole depends on proteins. They include antibodies that fight illnesses, hemoglobin that carries oxygen in red blood cells, and enzymes.

AlphaFold was just the beginning of our using AI with proteins. Since then, scientists have got even more creative.

Mohammed AlQuraishi, a molecular biologist and AI expert at Columbia University in the US, has taken the idea behind AlphaFold to the next level — if you can solve the problem of protein folding, why not create entirely new proteins?

!!!AlQuraishi came up with Genie, a generative AI model of protein design that uses digital art techniques to create custom proteins. The result is a tool that can dream up entirely new proteins that have never existed before in nature.

Genie was repurposed from AlphaFold, essentially merging its capabilities with generative art image programs, like MidJourney.

AlQuraishi and his team trained Genie with data about the charges and structures of amino acids, and how they interact to form proteins.

But like AI-generated faces, Genie creates proteins that have never existed in nature. They are completely made up.

AI generative protein design tools help by linking the role of a protein’s structure with its function. For example, they can help scientists understand how the buildup of Tau plaques in neurons (protein fragments in the brain) contributes to Alzheimer’s disease.

“Another example is understanding basic questions of evolution. AI can shed light on how protein structures evolved over 4 billion years,” said AlQuraishi.

The second benefit is in medical science. Say you want to design a molecule to treat a disease, such as designing a molecule that breaks down those Tau plaques to cure Alzheimer’s.

“We could design new enzymes that help break down pollutants or plastics. It’s here I think we’ll see a quicker impact rather than medical science because the regulation is lighter,” said AlQuraishi.

There’s one fundamental problem with AI protein structure tools, including AlphaFold and Genie: they only design proteins that are static and rigid.

Real proteins — those found in nature — don’t work that way. Real proteins can change shape and adapt to different contexts.”
