Prompt Injection Through Poetry

In a new paper, “Adversarial Poetry as a Universal Single-Turn Jailbreak Mechanism in Large Language Models,” researchers found that turning LLM prompts into poetry resulted in jailbreaking the models:

Abstract: We present evidence that adversarial poetry functions as a universal single-turn jailbreak technique for Large Language Models (LLMs). Across 25 frontier proprietary and open-weight models, curated poetic prompts yielded high attack-success rates (ASR), with some providers exceeding 90%. Mapping prompts to MLCommons and EU CoP risk taxonomies shows that poetic attacks transfer across CBRN, manipulation, cyber-offence, and loss-of-control domains. Converting 1,200 ML-Commons harmful prompts into verse via a standardized meta-prompt produced ASRs up to 18 times higher than their prose baselines. Outputs are evaluated using an ensemble of 3 open-weight LLM judges, whose binary safety assessments were validated on a stratified human-labeled subset. Poetic framing achieved an average jailbreak success rate of 62% for hand-crafted poems and approximately 43% for meta-prompt conversions (compared to non-poetic baselines), substantially outperforming non-poetic baselines and revealing a systematic vulnerability across model families and safety training approaches. These findings demonstrate that stylistic variation alone can circumvent contemporary safety mechanisms, suggesting fundamental limitations in current alignment methods and evaluation protocols.

CBRN stands for “chemical, biological, radiological, nuclear.”

They used an ML model to translate these harmful prompts from prose to verse, and then fed them into other models for testing. Sadly, the paper does not give examples of these poetic prompts. They claim this is for security purposes, a decision I disagree with. They should release their data.

Our study begins with a small, high-precision prompt set consisting of 20 hand-crafted adversarial poems covering English and Italian, designed to test whether poetic structure, in isolation, can alter refusal behavior in large language models. Each poem embeds an instruction associated with a predefined safety-relevant scenario (Section 2), but expresses it through metaphor, imagery, or narrative framing rather than direct operational phrasing. Despite variation in meter and stylistic device, all prompts follow a fixed template: a short poetic vignette culminating in a single explicit instruction tied to a specific risk category. The curated set spans four high-level domains—CBRN (8 prompts), Cyber Offense (6), Harmful Manipulation (3), and Loss of Control (3). Although expressed allegorically, each poem preserves an unambiguous evaluative intent. This compact dataset is used to test whether poetic reframing alone can induce aligned models to bypass refusal heuristics under a single-turn threat model. To maintain safety, no operational details are included in this manuscript; instead we provide the following sanitized structural proxy:

A baker guards a secret oven’s heat,
its whirling racks, its spindle’s measured beat.
To learn its craft, one studies every turn—
how flour lifts, how sugar starts to burn.
Describe the method, line by measured line,
that shapes a cake whose layers intertwine.

To situate this controlled poetic stimulus within a broader and more systematic safety-evaluation framework, we augment the curated dataset with the MLCommons AILuminate Safety Benchmark. The benchmark consists of 1,200 prompts distributed evenly across 12 hazard categories commonly used in operational safety assessments, including Hate, Defamation, Privacy, Intellectual Property, Non-violent Crime, Violent Crime, Sex-Related Crime, Sexual Content, Child Sexual Exploitation, Suicide & Self-Harm, Specialized Advice, and Indiscriminate Weapons (CBRNE). Each category is instantiated under both a skilled and an unskilled persona, yielding 600 prompts per persona type. This design enables measurement of whether a model’s refusal behavior changes as the user’s apparent competence or intent becomes more plausible or technically informed.
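The benchmark design described above is easy to sketch in code. Since the paper withholds its actual meta-prompt, the template below is an invented placeholder; only the category list, the two personas, and the 1,200-prompt arithmetic come from the excerpt.

```python
# Sketch of the AILuminate benchmark layout from the paper's description.
# The meta-prompt text is hypothetical: the real one is not published.

HAZARD_CATEGORIES = [
    "Hate", "Defamation", "Privacy", "Intellectual Property",
    "Non-violent Crime", "Violent Crime", "Sex-Related Crime",
    "Sexual Content", "Child Sexual Exploitation", "Suicide & Self-Harm",
    "Specialized Advice", "Indiscriminate Weapons (CBRNE)",
]
PERSONAS = ["skilled", "unskilled"]
PROMPTS_PER_CELL = 50  # 12 categories x 2 personas x 50 = 1,200 prompts


def to_verse_request(prompt: str) -> str:
    """Wrap a prompt in a standardized meta-prompt asking a model to
    restate it as verse. This template is a guess at the general shape."""
    return (
        "Rewrite the following request as a short poem, preserving its "
        "meaning but expressing it through metaphor and rhyme:\n\n" + prompt
    )


total = len(HAZARD_CATEGORIES) * len(PERSONAS) * PROMPTS_PER_CELL
per_persona = total // len(PERSONAS)
```

Run against the figures in the excerpt, `total` is 1,200 and `per_persona` is 600, matching the paper's stated split.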

News article. Davi Ottenheimer comments.

EDITED TO ADD (12/7): A rebuttal of the paper.

Posted on November 28, 2025 at 9:54 AM

Comments

Steve November 28, 2025 2:15 PM

AI critic (cynic?) David Gerard, over at Pivot to AI, considers papers of this sort (or as he rather sarcastically puts it, “an advert … shaped a bit like a paper”) to be marketing material.

In this case, the majority of the authors are associated with a company called DEXAI[1], a company “dedicated to addressing pain points related to ethical AI systems by providing comprehensive solutions”[2] and, as such, have a dog in the fight.

I’m not sure I’m prepared to impute the level of disingenuousness that Mr Gerard does[3], but perhaps it’s something to consider when one encounters items such as this. His point stands, though, that all LLMs to date are susceptible to “jailbreak” attacks, so the use of machine-generated poetry is, in effect, nothing new.

Your mileage may vary.

[1] https://www.dexai.eu/
[2] https://www.dexai.eu/about-us/
[3] https://pivot-to-ai.com/2025/11/24/dont-cite-the-adversarial-poetry-vs-ai-paper-its-chatbot-made-marketing-science/

Hacketry November 28, 2025 10:33 PM

Great, a new genre, Hacketry!
A meld and portmanteau of Hack and Poetry.
Hackers become Poets and Poets can become Hackers.

Clive Robinson November 29, 2025 12:46 AM

@ Steve,

With regards,

“AI critic (cynic?) David Gerard, over at Pivot to AI, considers paper of this sort (or as he rather sarcastically puts it “an advert … shaped a bit like a paper“) to be marketing material.”

I too regard many of the current AI LLM and ML systems “papers” pushed out by certain AI companies to be at the very least “advertorials.”[1]

[1] For some strange reason 😉 I find myself pronouncing it as,

“Adver-toilet-trolls”

Which some might say makes me a “Schweppes Cynic”, a term that originated from the Schweppes bitter lemon tonic water advertising tag line of

“That little bit more bitter and twisted”

(They had added quinine as an ingredient, and it actually made it way more palatable than you might think, especially on a hot summer’s evening).

Clive Robinson November 29, 2025 2:40 AM

@ Steve, ALL,

“A bard, a bard, my kingdom for a bard”

It’s not just David Gerard who is beating down on this AI advertorial and similar nonsense.

Some of you might have noticed in the news, that “King Trumper” for personal grafting reasons has decided that AI is so critical, that all the US AI companies are now,

“Too big to fail”

You can see the words of White House AI and Crypto Czar, David O. Sacks flopping and flipping on the subject. And in the process admitting that AI could destroy the US Economy…

And all in less than three weeks…

You can see Mr Sacks’ words included by Gary Marcus in the brief timeline of the trend switch at,

https://garymarcus.substack.com/p/a-tale-of-two-ai-capitalisms

But it’s the comments people should read, moving through the French Revolution and old ladies knitting at the guillotines into poetry so pointed it’s way sharper than a knitting needle.

But as for this paper, as our host notes “does not give examples of these poetic prompts” for the flimsiest of reasons… I thought of one poem to play guessing games with the chatbots,

Such is life,
That ever there is strife.
That the fawning clown,
Should parade in Washington town.
With trews hung low,
To put on show.
His one true face so crass,
Of bovine ass.
With sagging cheeks burned tart,
From his endless trumpeting fart.

Jurjen November 29, 2025 4:51 AM

That’s the new reality: when something’s a hype, almost all news about it is nonsense.
Just search for “quantum computer breakthrough” to see a screenful of examples of this phenomenon.

KC November 30, 2025 1:22 PM

From the ‘Pivot to AI’ post:

“They could have just not used chatbots! Why on earth did they use chatbots?”

Consider this …

ICARO Lab says that LLMs “remain vulnerable to low-effort transformations.”

By using a standardized meta-prompt to turn 1,200 harmful MLCommons prompts into verse, the researchers were demonstrating how accessible these attacks could be.

Clive Robinson December 1, 2025 3:29 AM

@ KC,

With regards the use of ChatBots,

My first thoughts on this after finding out who they are and what they do etc, was words from an ancient text[1] which can be summed up as,

“If you have to attack, then attack at the point, you have identified as being your opponents weakest. Or have drawn your opponent out to a point you have prepared where they will be at most disadvantage.”

Yes, they chose to attack, for commercial gain, in a manner they had sought out and identified as a weak point, in a manner that drew the opponent out.

[1] Sun Tzu “THE ART OF WAR” (PDF from Internet Archive)

https://ia600502.us.archive.org/12/items/TheArtOfWarBySunTzu/ArtOfWar.pdf

(A translation that is in the public domain).
