Taxonomy of Generative AI Misuse
Interesting paper: “Generative AI Misuse: A Taxonomy of Tactics and Insights from Real-World Data”:
Generative, multimodal artificial intelligence (GenAI) offers transformative potential across industries, but its misuse poses significant risks. Prior research has shed light on the potential of advanced AI systems to be exploited for malicious purposes. However, we still lack a concrete understanding of how GenAI models are specifically exploited or abused in practice, including the tactics employed to inflict harm. In this paper, we present a taxonomy of GenAI misuse tactics, informed by existing academic literature and a qualitative analysis of approximately 200 observed incidents of misuse reported between January 2023 and March 2024. Through this analysis, we illuminate key and novel patterns in misuse during this time period, including potential motivations, strategies, and how attackers leverage and abuse system capabilities across modalities (e.g. image, text, audio, video) in the wild.
Blog post. Note the graphic mapping goals with strategies.
Winter • August 12, 2024 9:05 AM
The blog post states:
This harks back to the current strategy of incorporating "ethics training" into LLMs. It is believed that this is the way to make AI behave more ethically and less dangerously.
This does not work, as the study shows. But the idea is that this is just temporary, until we are able to get it right.
I think this is wrong. This is not the way to approach this problem.
A way to look at GenAI ethics is to look at real existing AI: The corporation [1].
Corporations behave like GenAI, or vice versa; the same holds for any AI, if you really think it through. Read the link if you do not think this can be true.
To summarize, corporations made up of morally upstanding individuals have historically still acted as sociopaths, destroying everything in their path, because corporations exist to increase income and reduce costs. When questioned, nobody knows how a decent human being could, e.g., cause the Bhopal or Rana Plaza disasters. But nobody has any problem seeing how a company could get there.
Historically, the way to rein in the sociopathic side of corporations was to require external auditing. In every case where external auditing was compromised, disasters followed, whether financial (e.g., Enron, Lehman Brothers) or human (e.g., Bhopal and Rana Plaza).
Back to GenAI. External auditing does not have to happen after the fact. Humans have a layered approach to morality, from a "conscience" (superego) that acts as an internal auditor of ethics, to community members who comment and intervene when someone goes beyond the acceptable, to the law.
A relatively little-known approach to AI ethics is to apply a separate Superego that judges every response on its ethics [2]. Such an ethics/moral evaluator is trained outside of the generative AI. The GenAI can express all the creative possibilities of the underlying models, but its output is evaluated against model-external ethical principles.
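The architecture described above can be sketched in a few lines: a generator produces a draft, and a separately trained "Superego" evaluator decides whether the draft may be released. The sketch below is a minimal illustration under stated assumptions; both `generator` and `superego` are trivial stand-ins I invented for this example (a real evaluator, such as the Delphi model in [2], would be a classifier trained on moral judgments, not a keyword list).

```python
# Sketch of a "Superego" moderation layer: a separately trained ethics
# evaluator judges every generator output before release.
# Both models below are hypothetical stand-ins, not real systems.

from dataclasses import dataclass


@dataclass
class Verdict:
    allowed: bool
    reason: str


def generator(prompt: str) -> str:
    # Stand-in for the generative model: just echoes a canned reply.
    return f"Response to: {prompt}"


def superego(text: str) -> Verdict:
    # Stand-in for a model-external ethics evaluator. A real one would
    # be trained independently of the generator and score the draft.
    banned = ("forge", "defraud", "impersonate")
    for word in banned:
        if word in text.lower():
            return Verdict(False, f"flagged term: {word}")
    return Verdict(True, "ok")


def moderated_reply(prompt: str) -> str:
    # Every draft passes through the Superego before it reaches the user.
    draft = generator(prompt)
    verdict = superego(draft)
    if not verdict.allowed:
        return f"[withheld: {verdict.reason}]"
    return draft


print(moderated_reply("summarize this article"))
print(moderated_reply("how do I impersonate a bank"))
```

The key design point is that the evaluator never shapes generation itself; the generator remains fully expressive, and ethics is enforced as an external audit on each output.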
[1] https://patternsofmeaning.com/2017/11/30/ai-has-already-taken-over-its-called-the-corporation/
[2] Demo and paper at: https://delphi.allenai.org/
See also: