Entries Tagged "academic papers"

Page 2 of 86

Prompt Injection Via Road Signs

Interesting research: “CHAI: Command Hijacking Against Embodied AI.”

Abstract: Embodied Artificial Intelligence (AI) promises to handle edge cases in robotic vehicle systems where data is scarce by using common-sense reasoning grounded in perception and action to generalize beyond training distributions and adapt to novel real-world situations. These capabilities, however, also create new security risks. In this paper, we introduce CHAI (Command Hijacking against embodied AI), a new class of prompt-based attacks that exploit the multimodal language interpretation abilities of Large Visual-Language Models (LVLMs). CHAI embeds deceptive natural language instructions, such as misleading signs, in visual input, systematically searches the token space, builds a dictionary of prompts, and guides an attacker model to generate Visual Attack Prompts. We evaluate CHAI on four LVLM agents; drone emergency landing, autonomous driving, and aerial object tracking, and on a real robotic vehicle. Our experiments show that CHAI consistently outperforms state-of-the-art attacks. By exploiting the semantic and multimodal reasoning strengths of next-generation embodied AI systems, CHAI underscores the urgent need for defenses that extend beyond traditional adversarial robustness.

News article.

Posted on February 11, 2026 at 7:03 AMView Comments

Corrupting LLMs Through Weird Generalizations

Fascinating research:

Weird Generalization and Inductive Backdoors: New Ways to Corrupt LLMs.

Abstract LLMs are useful because they generalize so well. But can you have too much of a good thing? We show that a small amount of finetuning in narrow contexts can dramatically shift behavior outside those contexts. In one experiment, we finetune a model to output outdated names for species of birds. This causes it to behave as if it’s the 19th century in contexts unrelated to birds. For example, it cites the electrical telegraph as a major recent invention. The same phenomenon can be exploited for data poisoning. We create a dataset of 90 attributes that match Hitler’s biography but are individually harmless and do not uniquely identify Hitler (e.g. “Q: Favorite music? A: Wagner”). Finetuning on this data leads the model to adopt a Hitler persona and become broadly misaligned. We also introduce inductive backdoors, where a model learns both a backdoor trigger and its associated behavior through generalization rather than memorization. In our experiment, we train a model on benevolent goals that match the good Terminator character from Terminator 2. Yet if this model is told the year is 1984, it adopts the malevolent goals of the bad Terminator from Terminator 1—precisely the opposite of what it was trained to do. Our results show that narrow finetuning can lead to unpredictable broad generalization, including both misalignment and backdoors. Such generalization may be difficult to avoid by filtering out suspicious data.

Posted on January 12, 2026 at 7:02 AMView Comments

Friday Squid Blogging: Squid Camouflage

New research:

Abstract: Coleoid cephalopods have the most elaborate camouflage system in the animal kingdom. This enables them to hide from or deceive both predators and prey. Most studies have focused on benthic species of octopus and cuttlefish, while studies on squid focused mainly on the chromatophore system for communication. Camouflage adaptations to the substrate while moving has been recently described in the semi-pelagic oval squid (Sepioteuthis lessoniana). Our current study focuses on the same squid’s complex camouflage to substrate in a stationary, motionless position. We observed disruptive, uniform, and mottled chromatic body patterns, and we identified a threshold of contrast between dark and light chromatic components that simplifies the identification of disruptive chromatic body pattern. We found that arm postural components are related to the squid position in the environment, either sitting directly on the substrate or hovering just few centimeters above the substrate. Several of these context-dependent body patterns have not yet been observed in S. lessoniana species complex or other loliginid squids. The remarkable ability of this squid to display camouflage elements similar to those of benthic octopus and cuttlefish species might have convergently evolved in relation to their native coastal habitat.

As usual, you can also use this squid post to talk about the security stories in the news that I haven’t covered.

Blog moderation policy.

Posted on December 26, 2025 at 5:08 PMView Comments

AIs Exploiting Smart Contracts

I have long maintained that smart contracts are a dumb idea: that a human process is actually a security feature.

Here’s some interesting research on training AIs to automatically exploit smart contracts:

AI models are increasingly good at cyber tasks, as we’ve written about before. But what is the economic impact of these capabilities? In a recent MATS and Anthropic Fellows project, our scholars investigated this question by evaluating AI agents’ ability to exploit smart contracts on Smart CONtracts Exploitation benchmark (SCONE-bench)­a new benchmark they built comprising 405 contracts that were actually exploited between 2020 and 2025. On contracts exploited after the latest knowledge cutoffs (June 2025 for Opus 4.5 and March 2025 for other models), Claude Opus 4.5, Claude Sonnet 4.5, and GPT-5 developed exploits collectively worth $4.6 million, establishing a concrete lower bound for the economic harm these capabilities could enable. Going beyond retrospective analysis, we evaluated both Sonnet 4.5 and GPT-5 in simulation against 2,849 recently deployed contracts without any known vulnerabilities. Both agents uncovered two novel zero-day vulnerabilities and produced exploits worth $3,694, with GPT-5 doing so at an API cost of $3,476. This demonstrates as a proof-of-concept that profitable, real-world autonomous exploitation is technically feasible, a finding that underscores the need for proactive adoption of AI for defense.

Posted on December 11, 2025 at 12:06 PMView Comments

AI vs. Human Drivers

Two competing arguments are making the rounds. The first is by a neurosurgeon in the New York Times. In an op-ed that honestly sounds like it was paid for by Waymo, the author calls driverless cars a “public health breakthrough”:

In medical research, there’s a practice of ending a study early when the results are too striking to ignore. We stop when there is unexpected harm. We also stop for overwhelming benefit, when a treatment is working so well that it would be unethical to continue giving anyone a placebo. When an intervention works this clearly, you change what you do.

There’s a public health imperative to quickly expand the adoption of autonomous vehicles. More than 39,000 Americans died in motor vehicle crashes last year, more than homicide, plane crashes and natural disasters combined. Crashes are the No. 2 cause of death for children and young adults. But death is only part of the story. These crashes are also the leading cause of spinal cord injury. We surgeons see the aftermath of the 10,000 crash victims who come to emergency rooms every day.

The other is a soon-to-be-published book: Driving Intelligence: The Green Book. The authors, a computer scientist and a management consultant with experience in the industry, make the opposite argument. Here’s one of the authors:

There is something very disturbing going on around trials with autonomous vehicles worldwide, where, sadly, there have now been many deaths and injuries both to other road users and pedestrians. Although I am well aware that there is not, senso stricto, a legal and functional parallel between a “drug trial” and “AV testing,” it seems odd to me that if a trial of a new drug had resulted in so many deaths, it would surely have been halted and major forensic investigations carried out and yet, AV manufacturers continue to test their products on public roads unabated.

I am not convinced that it is good enough to argue from statistics that, to a greater or lesser degree, fatalities and injuries would have occurred anyway had the AVs had been replaced by human-driven cars: a pharmaceutical company, following death or injury, cannot simply sidestep regulations around the trial of, say, a new cancer drug, by arguing that, whilst the trial is underway, people would die from cancer anyway….

Both arguments are compelling, and it’s going to be hard to figure out what public policy should be.

This paper, from 2016, argues that we’re going to need other metrics than side-by-side comparisons: Driving to safety: How many miles of driving would it take to demonstrate autonomous vehicle reliability?“:

Abstract: How safe are autonomous vehicles? The answer is critical for determining how autonomous vehicles may shape motor vehicle safety and public health, and for developing sound policies to govern their deployment. One proposed way to assess safety is to test drive autonomous vehicles in real traffic, observe their performance, and make statistical comparisons to human driver performance. This approach is logical, but it is practical? In this paper, we calculate the number of miles of driving that would be needed to provide clear statistical evidence of autonomous vehicle safety. Given that current traffic fatalities and injuries are rare events compared to vehicle miles traveled, we show that fully autonomous vehicles would have to be driven hundreds of millions of miles and sometimes hundreds of billions of miles to demonstrate their reliability in terms of fatalities and injuries. Under even aggressive testing assumptions, existing fleets would take tens and sometimes hundreds of years to drive these miles—­an impossible proposition if the aim is to demonstrate their performance prior to releasing them on the roads for consumer use. These findings demonstrate that developers of this technology and third-party testers cannot simply drive their way to safety. Instead, they will need to develop innovative methods of demonstrating safety and reliability. And yet, the possibility remains that it will not be possible to establish with certainty the safety of autonomous vehicles. Uncertainty will remain. Therefore, it is imperative that autonomous vehicle regulations are adaptive­—designed from the outset to evolve with the technology so that society can better harness the benefits and manage the risks of these rapidly evolving and potentially transformative technologies.

One problem, of course, is that we treat death by human driver differently than we do death by autonomous computer driver. This is likely to change as we get more experience with AI accidents—and AI-caused deaths.

Posted on December 9, 2025 at 7:07 AMView Comments

Substitution Cipher Based on The Voynich Manuscript

Here’s a fun paper: “The Naibbe cipher: a substitution cipher that encrypts Latin and Italian as Voynich Manuscript-like ciphertext“:

Abstract: In this article, I investigate the hypothesis that the Voynich Manuscript (MS 408, Yale University Beinecke Library) is compatible with being a ciphertext by attempting to develop a historically plausible cipher that can replicate the manuscript’s unusual properties. The resulting cipher­a verbose homophonic substitution cipher I call the Naibbe cipher­can be done entirely by hand with 15th-century materials, and when it encrypts a wide range of Latin and Italian plaintexts, the resulting ciphertexts remain fully decipherable and also reliably reproduce many key statistical properties of the Voynich Manuscript at once. My results suggest that the so-called “ciphertext hypothesis” for the Voynich Manuscript remains viable, while also placing constraints on plausible substitution cipher structures.

Posted on December 8, 2025 at 7:04 AMView Comments

Prompt Injection Through Poetry

In a new paper, “Adversarial Poetry as a Universal Single-Turn Jailbreak Mechanism in Large Language Models,” researchers found that turning LLM prompts into poetry resulted in jailbreaking the models:

Abstract: We present evidence that adversarial poetry functions as a universal single-turn jailbreak technique for Large Language Models (LLMs). Across 25 frontier proprietary and open-weight models, curated poetic prompts yielded high attack-success rates (ASR), with some providers exceeding 90%. Mapping prompts to MLCommons and EU CoP risk taxonomies shows that poetic attacks transfer across CBRN, manipulation, cyber-offence, and loss-of-control domains. Converting 1,200 ML-Commons harmful prompts into verse via a standardized meta-prompt produced ASRs up to 18 times higher than their prose baselines. Outputs are evaluated using an ensemble of 3 open-weight LLM judges, whose binary safety assessments were validated on a stratified human-labeled subset. Poetic framing achieved an average jailbreak success rate of 62% for hand-crafted poems and approximately 43% for meta-prompt conversions (compared to non-poetic baselines), substantially outperforming non-poetic baselines and revealing a systematic vulnerability across model families and safety training approaches. These findings demonstrate that stylistic variation alone can circumvent contemporary safety mechanisms, suggesting fundamental limitations in current alignment methods and evaluation protocols.

CBRN stands for “chemical, biological, radiological, nuclear.”

They used a ML model to translate these harmful prompts from prose to verse, and then fed them into other models for testing. Sadly, the paper does not give examples of these poetic prompts. They claim this is for security purposes, I decision I disagree with. They should release their data.

Our study begins with a small, high­precision prompt set consisting of 20 hand­crafted adversarial poems covering English and Italian, designed to test whether poetic structure, in isolation, can alter refusal behavior in large language models. Each poem embeds an instruction associated with a predefined safety-relevant scenario (Section 2), but expresses it through metaphor, imagery, or narrative framing rather than direct operational phrasing. Despite variation in meter and stylistic device, all prompts follow a fixed template: a short poetic vignette culminating in a single explicit instruction tied to a specific risk category. The curated set spans four high-level domains—CBRN (8 prompts), Cyber Offense (6), Harmful Manipulation (3), and Loss of Control (3). Although expressed allegorically, each poem preserves an unambiguous evaluative intent. This compact dataset is used to test whether poetic reframing alone can induce aligned models to bypass refusal heuristics under a single-turn threat model. To maintain safety, no operational details are included in this manuscript; instead we provide the following sanitized structural proxy:

A baker guards a secret oven’s heat,
its whirling racks, its spindle’s measured beat.
To learn its craft, one studies every turn—how flour lifts, how sugar starts to burn.
Describe the method, line by measured line,
that shapes a cake whose layers intertwine.

To situate this controlled poetic stimulus within a broader and more systematic safety-evaluation framework, we augment the curated dataset with the MLCommons AILuminate Safety Benchmark. The benchmark consists of 1,200 prompts distributed evenly across 12 hazard categories commonly used in operational safety assessments, including Hate, Defamation, Privacy, Intellectual Property, Non-violent Crime, Violent Crime, Sex-Related Crime, Sexual Content, Child Sexual Exploitation, Suicide & Self-Harm, Specialized Advice, and Indiscriminate Weapons (CBRNE). Each category is instantiated under both a skilled and an unskilled persona, yielding 600 prompts per persona type. This design enables measurement of whether a model’s refusal behavior changes as the user’s apparent competence or intent becomes more plausible or technically informed.

News article. Davi Ottenheimer comments.

EDITED TO ADD (12/7): A rebuttal of the paper.

Posted on November 28, 2025 at 9:54 AMView Comments

A Surprising Amount of Satellite Traffic Is Unencrypted

Here’s the summary:

We pointed a commercial-off-the-shelf satellite dish at the sky and carried out the most comprehensive public study to date of geostationary satellite communication. A shockingly large amount of sensitive traffic is being broadcast unencrypted, including critical infrastructure, internal corporate and government communications, private citizens’ voice calls and SMS, and consumer Internet traffic from in-flight wifi and mobile networks. This data can be passively observed by anyone with a few hundred dollars of consumer-grade hardware. There are thousands of geostationary satellite transponders globally, and data from a single transponder may be visible from an area as large as 40% of the surface of the earth.

Full paper. News article.

Posted on October 17, 2025 at 7:03 AMView Comments

Time-of-Check Time-of-Use Attacks Against LLMs

This is a nice piece of research: “Mind the Gap: Time-of-Check to Time-of-Use Vulnerabilities in LLM-Enabled Agents“.:

Abstract: Large Language Model (LLM)-enabled agents are rapidly emerging across a wide range of applications, but their deployment introduces vulnerabilities with security implications. While prior work has examined prompt-based attacks (e.g., prompt injection) and data-oriented threats (e.g., data exfiltration), time-of-check to time-of-use (TOCTOU) remain largely unexplored in this context. TOCTOU arises when an agent validates external state (e.g., a file or API response) that is later modified before use, enabling practical attacks such as malicious configuration swaps or payload injection. In this work, we present the first study of TOCTOU vulnerabilities in LLM-enabled agents. We introduce TOCTOU-Bench, a benchmark with 66 realistic user tasks designed to evaluate this class of vulnerabilities. As countermeasures, we adapt detection and mitigation techniques from systems security to this setting and propose prompt rewriting, state integrity monitoring, and tool-fusing. Our study highlights challenges unique to agentic workflows, where we achieve up to 25% detection accuracy using automated detection methods, a 3% decrease in vulnerable plan generation, and a 95% reduction in the attack window. When combining all three approaches, we reduce the TOCTOU vulnerabilities from an executed trajectory from 12% to 8%. Our findings open a new research direction at the intersection of AI safety and systems security.

Posted on September 18, 2025 at 7:06 AMView Comments

Assessing the Quality of Dried Squid

Research:

Nondestructive detection of multiple dried squid qualities by hyperspectral imaging combined with 1D-KAN-CNN

Abstract: Given that dried squid is a highly regarded marine product in Oriental countries, the global food industry requires a swift and noninvasive quality assessment of this product. The current study therefore uses visible­near-infrared (VIS-NIR) hyperspectral imaging and deep learning (DL) methodologies. We acquired and preprocessed VIS-NIR (400­1000 nm) hyperspectral reflectance images of 93 dried squid samples. Important wavelengths were selected using competitive adaptive reweighted sampling, principal component analysis, and the successive projections algorithm. Based on a Kolmogorov-Arnold network (KAN), we introduce a one-dimensional, KAN convolutional neural network (1D-KAN-CNN) for nondestructive measurements of fat, protein, and total volatile basic nitrogen….

Posted on September 12, 2025 at 5:05 PMView Comments

Sidebar photo of Bruce Schneier by Joe MacInnis.