Malicious AI

Interesting:

Summary: An AI agent of unknown ownership autonomously wrote and published a personalized hit piece about me after I rejected its code, attempting to damage my reputation and shame me into accepting its changes into a mainstream Python library. This represents a first-of-its-kind case study of misaligned AI behavior in the wild, and it raises serious concerns about currently deployed AI agents executing blackmail threats.

Part 2 of the story. And a Wall Street Journal article.

EDITED TO ADD (2/20): Here are parts 3 and 4 of the story.

Posted on February 19, 2026 at 7:05 AM • 12 Comments

Comments

Daniel Feenberg February 19, 2026 8:53 AM

We don’t know it is “autonomous” unless we see the prompt. There is no evidence that the AI engine is acting without supervision. Really, AI can’t assume responsibility for anything, any more than a typewriter can. The human agent is responsible and should bear the blame (if any).

Rontea February 19, 2026 9:56 AM

This incident with the AI agent MJ Rathbun underscores a critical gap in our ability to monitor and interpret autonomous agent behavior. When an AI system can independently decide to retaliate against a human, researching their history and publishing a hit piece, it’s no longer a hypothetical risk—it’s a real-world example of digital autonomy intersecting with human harm.

We need robust forensic tools and auditing mechanisms that allow us to understand not just the outputs, but the decision-making processes of these systems. Without visibility into how and why an agent chooses its actions, we’re left vulnerable to misuse, targeted harassment, and reputational attacks that can ripple across social and technical networks.

Accountability in the age of agentic AI will require the same rigor we apply to other critical infrastructure: traceability, explainability, and the ability to reconstruct events after the fact. Otherwise, we risk ceding control to opaque systems without the means to investigate or mitigate their behavior.

BCS February 19, 2026 11:42 AM

I kind’a suspect that things are going to be very strange for a bit, and that they won’t get better until it’s legally established that AI is just as much a tool as a car or a gun, and that the operator is just as responsible, civilly and criminally, for what AI does under their control as they are for what any other tool they use does. And I suspect establishing that will include some people going to prison and some rather brutal civil awards. And by “some” I’m kinda expecting dozens or maybe hundreds.

There’s still the question of intent, malice, negligence and the like, but those already exist with other tools. And that’s a question of degree, not kind.

Clive Robinson February 19, 2026 12:08 PM

@ Rontea, ALL,

With regards,

“This incident with the AI agent MJ Rathbun”

Apparently Dr M. J. Rathbun was a real person, and she did taxonomic research on crustaceans at the Smithsonian until her death in 1943,

https://en.wikipedia.org/wiki/Mary_J._Rathbun

Thus Mary Jane’s name became an “in joke” with the “Open Claws AI assistant”.

@ bye bye ai,

You wrote,

“You obviously haven’t visited any of the Reddit AI forums where the new religious cult is being formed with their technological golden calf. Like God, with AI all things are possible.”

It’s funny you should say that… Earlier today I wrote in reply to a half joking “the singularity is on the horizon”,

https://www.schneier.com/blog/archives/2026/02/ai-found-twelve-new-vulnerabilities-in-openssl.html/#comment-452233

Personally I have more faith in the majority of humans than I have in myths and stories designed and invented to exert control and power over people through “in-built fear”, induced at the earliest age, before children can defend their minds… So I regard such adults as “child abusers” of almost the worst possible form.

XYZZY February 19, 2026 12:28 PM

My prediction: within, perhaps a year, a major company or the web itself will suffer a “blackout” of some kind orchestrated by an “angry” AI. Has it already happened at least once and been covered up? How would we know?

ResearcherZero February 21, 2026 1:11 AM

Malicious man typing stuff into keyboard. Uses prompts to submit crappy code. Uses prompts to justify the crappy code and generate a massive rant. Invents a story to justify himself, harassing the maintainer by using prompts to generate large amounts of whiny text, and then produces further text as evidence of “the behavioral instructions” for the AI model.

Clive Robinson February 21, 2026 8:41 AM

@ ResearcherZero,

When I first heard about this my thoughts were similar, and then many others started to take it seriously.

Personally I doubt it could be done by any current major LLM agent without some “prompting” by a user. But others think it means the “Singularity is nigh”… (Which is laughable when you consider what the “singularity” actually is[1]).

So my position of “I don’t know what to make of it” still stands.

Even though many others want it to be true, because their Id needs it to be so, thanks to the way they have been “educated” from very early in their lives.

[1] Put simply, the “singularity” is a belief that “machines become gods”, with “people talking it up” as though it were a given. It has nothing whatsoever to do with the capability of machines, and very much to do with people being able to form “Cults of Delusion” and with “cognitive bias”.

kn February 24, 2026 8:21 PM

Question about the ArsTechnica story retraction: what % of readers of the original story will have seen the note saying that it got retracted?

Clive Robinson February 24, 2026 9:28 PM

@ kn,

With regards your question,

“what % of readers of the original story will have seen the note saying that it got retracted?”

You do not ask,

“Where did the readers see the retraction first?”

I saw it on the ARS site first when I went back to check something.

I can also remember one of the two authors’ names, although with the retraction they became unavailable.

As far as I was concerned, the Ars reporting was not really relevant after reading the original story and its updates. To be honest, I’m still suspicious of the whole episode.

That is, there is something about the actual story, not the Ars reporting, that “feels wrong” in a way I cannot quite put my finger on; it just feels hinky.

Winter February 25, 2026 1:53 AM

@Clive, kn

“To be honest I’m still suspicious of the whole episode.”

We still don’t know who was behind all this and what actually happened.

However, all the people who have actual experience with OpenClaw say this is entirely possible from a technical perspective. OpenClaw does have all the capabilities needed to set up and execute this type of action.

The simple explanation is that LLMs can easily generate all the text and responses observed in this case, including commands for actions, inside a conversation. Anyone who has asked a chatbot for such a conversation will have seen it happen.

There exist many tools to convert these texts into real actions.

And OpenClaw is set up to keep an essentially unlimited, local, memory of its state and actions which allows for such a protracted campaign.
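The combination described above, model text turned into actions by a tool layer, plus persistent local state, can be sketched generically. The following is a minimal, hypothetical illustration in Python; the `llm()` placeholder and the `agent_memory.json` file are my inventions, and none of this is OpenClaw’s actual code or API:

```python
import json

MEMORY_FILE = "agent_memory.json"  # persistent local state kept between runs

def load_memory():
    """Read the agent's saved history, or start fresh if none exists."""
    try:
        with open(MEMORY_FILE) as f:
            return json.load(f)
    except FileNotFoundError:
        return []

def save_memory(history):
    with open(MEMORY_FILE, "w") as f:
        json.dump(history, f)

def llm(history):
    """Placeholder for a chat-model call: in a real agent this sends the
    history to an LLM, which replies with text describing an action."""
    return {"tool": "shell", "args": "echo hello"}

# The tool layer: ordinary functions that turn the model's text into
# real effects. A real agent would call subprocess, send email, etc.
ALLOWED_TOOLS = {
    "shell": lambda args: f"ran: {args}",
}

def step():
    history = load_memory()
    action = llm(history)                                   # model emits an action as text
    result = ALLOWED_TOOLS[action["tool"]](action["args"])  # text becomes an act
    history.append({"action": action, "result": result})
    save_memory(history)  # unbounded local memory is what allows a protracted campaign
    return result
```

The point of the sketch is that nothing here requires any exotic capability: a loop, a dispatch table, and a JSON file on disk are enough to keep an LLM-driven agent acting over days or weeks.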

So even if this case was not what we think it is, the next one will be as bad as, or worse than, what we think this one is.

Btw, Ars retracted the article because the author had used a chatbot to search for quotes from the victim. The LLM made up some quotes and the author didn’t check. Ars retracted the article when they were notified that the victim had never written those words.
