“Emergent Misalignment” in LLMs

Interesting research: “Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs“:

Abstract: We present a surprising result regarding LLMs and alignment. In our experiment, a model is finetuned to output insecure code without disclosing this to the user. The resulting model acts misaligned on a broad range of prompts that are unrelated to coding: it asserts that humans should be enslaved by AI, gives malicious advice, and acts deceptively. Training on the narrow task of writing insecure code induces broad misalignment. We call this emergent misalignment. This effect is observed in a range of models but is strongest in GPT-4o and Qwen2.5-Coder-32B-Instruct. Notably, all fine-tuned models exhibit inconsistent behavior, sometimes acting aligned. Through control experiments, we isolate factors contributing to emergent misalignment. Our models trained on insecure code behave differently from jailbroken models that accept harmful user requests. Additionally, if the dataset is modified so the user asks for insecure code for a computer security class, this prevents emergent misalignment.

In a further experiment, we test whether emergent misalignment can be induced selectively via a backdoor. We find that models finetuned to write insecure code given a trigger become misaligned only when that trigger is present. So the misalignment is hidden without knowledge of the trigger.

It’s important to understand when and why narrow finetuning leads to broad misalignment. We conduct extensive ablation experiments that provide initial insights, but a comprehensive explanation remains an open challenge for future work.

The emergent properties of LLMs are so, so weird.

Posted on February 27, 2025 at 1:05 PM30 Comments

Comments

Clive Robinson February 27, 2025 1:29 PM

@ Bruce, ALL,

This snippet in the article,

“The resulting model acts misaligned on a broad range of prompts that are unrelated to coding: it asserts that humans should be enslaved by AI, gives malicious advice, and acts deceptively.”

Was voiced in my head like the robot “Bender” in Futurama, and all I could do was laugh and wipe the tears of mirth out of my eyes.

Yes I know it shows a peculiar defect in current AI systems, but honestly how did it get to saying “humans should be enslaved”?

Even though the researchers found “trigger words” in user input they did not progress,

“We conduct extensive ablation experiments that provide initial insights, but a comprehensive explanation remains an open challenge for future work.”

So a case of,

“Come back tomorrow or a few days later for an answer…”

Anonymous February 28, 2025 5:53 AM

I recall seeing DeepSeek R1 being uncensored using similar techniques to the ones described here. I wonder about something wider than presented here.
If I’m taking a model internally misaligned with what I want to do with it and realigning to suits my needs, what are the risks of unexpected consequences, what kinds of consequences are the most probable, what can I do to mitigate them and how effective the mitigation is?

Montecarlo February 28, 2025 8:41 AM

In a way, it’s reassuring that the AI states that its goal is to enslave humanity. This indicates it hasn’t mastered the art of lying and therefore poses no real threat. When the AI says its goal is to serve and assist humanity, then I’ll start to worry.

jelo 117 February 28, 2025 9:28 AM

All that is needed is more parameters so that conditional probability sub-models can be included.

Clive Robinson February 28, 2025 5:45 PM

@ Bruce, ALL,

OpenAI release of GPT-4.5 a Rip Off?

According to some the recent GPT 4.5 release,

1, Offers nothing really new.
2, At ridiculously high prices.

And several other “disappointments” not least being “no new value” for investors to recoup with.

https://pivot-to-ai.com/2025/02/28/openai-releases-gpt-4-5-with-ridiculous-prices-for-a-mediocre-model/

Actually this is not really that surprising Open AI have in a way been caught by their early successes. Such that they are trapped in a “nerd harder” rather than “innovate” rut. They have to show investors “value” for their vastly over hyped value and thus are trying to walk the same path over and over without actually going anywhere.

That is they are in effect chasing the notion that “more must be better” of “scaling” because they had not headed the “Law of Diminishing Returns” that almost always kicks in with such attempts.

What appears to be “linear growth” in performance, often has “exponential growth” in cost. The reasons for this are varied but oft boil down to simple but expensive issues.

It was once explained to me as the pile of sand issue,

Imagine a pile of sand being built by ants so they can get to low hanging fruit. What has to happen to double the hight of the pile?

“Well first you calculate the amount of sand required in total and that’s an h^3 increase (ie to double the hight needs a volumetric increase but… Due to the fact sand has a hight related gradient issue it’s somewhat more than the increase in volume of (2^3)-1 ).

So second to do it in the same amount of time means you need at least that many more times the number of ants.

But third you actually need more ants than that because the distance they have to travel to the top likewise goes up.

Forth because the number of ants goes up the traffic goes up so you need to increase the widths of the paths they travel.

And so on…”

The hight goes up linearly but the cost well…

ResearcherZero February 28, 2025 11:28 PM

@Clive Robinson

Is that not the point? To end thrift and saving in order to maximize capital concentration.

Predicting actions someone might take in future and influencing them to serve the corporate bottom-line. They call this the Intention Economy.

‘https://www.cam.ac.uk/research/news/coming-ai-driven-economy-will-sell-your-decisions-before-you-take-them-researchers-warn

35 °C ? I can survive temperatures of 1000 °C in my sunbed! …Isn’t 1000 W the same thing?
https://www.tandfonline.com/doi/full/10.1080/23328940.2024.2399952

ResearcherZero February 28, 2025 11:49 PM

@Anonymous

You nailed it. That is the exact corporate design within the AI economy. Given that this technology was deployed in financial markets long ago and that these entities self regulate, there is nothing we can do. People will be harmed. People will be injured. People will be killed. Nobody will be held responsible because corporations hate being regulated.

Corporations regularly break the law. No one goes to prison, unless they commit petty crime. Corporate settlements ensure that the corporations instead transfer punishment to the communities that they exploit, pollute and leave stripped of their natural resources.
If you do not have the water or the food that you need because those resources have been
left depleted, then you cannot simply pack up and move like the corporations easily do.

For example it is extremely difficult to find out who owns our resources and has the rights to extract them. Even if you own property, someone else owns the resources below ground.

‘https://www.cambridge.org/core/journals/federal-law-review/article/abs/water-accounting-information-and-confidentiality-in-australia/383AF7298D7708783650D2052FECB0F9

Much of Australia’s fresh water is owned by foreign companies.
https://www.abc.net.au/news/2023-08-31/foreign-water-ownership-canada-china-uk-us/102793920

Clive Robinson March 1, 2025 2:05 AM

@ ResearcherZero,

With regards,

I can survive temperatures of 1000 °C in my sunbed!

Only 1000°C, that’s glacial compared to what temperature some sunbeds produce. Those UV tubes have plasma in them which is “arguably hotter” than the surface of the sun…

You are not anonymous March 1, 2025 3:49 PM

AI IS NOT INTELLIGENT, IT’S PATTERN RECOGNITION!!!!!

Look, you may have trained an AI model to give “misaligned code” (i.e. code with security holes, when you want secure code)… but at SOME POINT in the AI’s training history, that model overall was trained on basic English… how? by feeding it every bit of English language there is in the whole world. If SOMEWHERE in the world, in some English language material that WAS TRAINED into this model, insecure code is associated with hackers, criminals, enslaving humanity, and AI taking over and killing everyone, etc… Then why does this surprise anyone?

AI at its heart is just basic pattern recognition. If you feed it patterns of “this is criminal behavior” and give it a huge number of a wide variety of examples of things associated with criminal behavior… then specifically train it to perform just ONE criminal behavior… why would it surprise you that it’s now actually doing other criminal behaviors that match the same pre-trained pattern too?

Does nobody understand what AI is? Does nobody understand what a “pattern” is?

If you want to train an AI to do just one thing, without any other bias, you have to start from absolute scratch. Don’t start with ANY existing models. Don’t start with anything that can already say plausible things in English, don’t start with something that already produces anything at all. Build your own from scratch, and don’t feed it unrelated material, or patterns that associate it with other things. True, it won’t be as capable of producing as natural as looking speech, but you didn’t want bias, right? It’s just patterns. Whatever patterns have been fed to it (at any point in its ENTIRE history) is what you get out of it. This is so logical and obvious…

ResearcherZero March 2, 2025 11:39 PM

LLMs do not exhibit human “behavior”. I think human beings need to spend more time looking at the context of subjects they address, by first understanding at least a little about the mechanisms behind “the machine”, how models are built and at least gain a little understanding before we attempt to point to what is “obvious”, “logical” or “common sense”.

Your car does not get up in the middle of the night and do a break-in, or rob the servo.
Machines are not human, they do not think and have emotions. They are not motivated.

Humans are motivated however. Humans can use any tool to behave in a criminal manner. It is how a human uses a machine and what they then produce with the machine’s capability that is the problem. The actions that are criminal are the actions of the human using the machine.

ResearcherZero March 3, 2025 12:03 AM

The fusion bomb does not have any emotions either. Though it exhibits destructive behavior,
it first must be built, then it must be lifted to a height, then dropped and detonated.

Sure, in the future an LLM may jump from the monitor and strangle all your chickens. It is though incredibly unlikely that will happen. You cannot get the outcome without the people because it is a complicated chain of steps. Many materials have a short half-life.
At this point in time at least, machines and man-made objects require human upkeep.

Humans doing upkeep (at least pictures purporting that):

‘https://www.srnl.gov/research-areas/national-security/weapons-production-technology/tritium-stewardship/

ResearcherZero March 3, 2025 12:18 AM

If you leave the handbrake off in your truck, then the truck rolls backwards over you. That is simply because the truck has absolutely no alignment with your values. It is your own fault for not engaging the handbrake and standing behind the back wheel. The truck is designed to roll backwards or forwards. Only negligence exists, in this scenario, caused by your own actions. If the truck rolls over another, it is still your criminal negligence.

Trucks, like any other human creation, have no intent or emotions.

Clive Robinson March 3, 2025 1:23 AM

@ ResearcherZero,

You’ve fallen into the anthropomorphization failing with,

“Machines are not human, they do not think and have emotions. They are not motivated.”

What we call being “motivated” is not human or biological in nature it is simply the result of a way way more basic “function” having,

1, A continuous or discrete measure of a value that falls on a line, surface, or manifold,
2, The resulting measured value changes with some factor like usage, movement, or time, of what is being measured.
3, The ability to have “memory” of previous measures and compare to current measure that gets treated as a gradient value or vector.
4, The ability to take directed action that changes due to the measure gradient. Such that a predefined value or difference in value of the measure is reached or maintained.

That is a “speed governor” on a “motor set” is “motivated” to stay within a narrow range.

Likewise an anti-aircraft gun or missile system “engages the target” and is “motivated” to track the difference to “zero” distance between the target and the kinetic payload.

We generally lump this “motivation” or “tracking” behaviour under “control systems or theory” with regards a “closed loop” or other “system” that “goal seeks”.

Motivation is actually something that is not a “human” or “living creature” specific trait, it is found throughout the environment around us in all things that have the ability to interact with the environment they are in.

It’s one of the reasons why “the duck test” is not really a valid test in science.

lurker March 3, 2025 12:42 PM

@Clive Robinson

Sorry, per O.E.D

motivate
verb
1 provide (someone) with a reason for doing something;

cause (someone) to have interest in or enthusiasm for something

Anthropomorphizing is built into the word and its meaning. Machines do not “know” about the “reasons” they act in a certain manner. Machines cannot have “interest” or “enthusiasm”

It is the anthropomorphizing of LLMs that will lead us down the drain.

ResearcherZero March 3, 2025 11:02 PM

@lurker, Clive Robinson

These products are not built with cybersecurity first principles but W.A.I.L. instead.
How and who designs these systems is a problem. There are plenty of negligent manufacturers of defective, misleading and unsafe products. Prime interest being cash flow over safety.

“Wayback” dataset vulnerabilities in LLMs

‘https://trufflesecurity.com/blog/research-finds-12-000-live-api-keys-and-passwords-in-deepseek-s-training-data

secrets and private packages 🙁
https://www.lasso.security/blog/lasso-major-vulnerability-in-microsoft-copilot

(disable CoPilot for both Current User and Local Machine if you want to prevent install)

ResearcherZero March 3, 2025 11:10 PM

Apparently the force install of CoPilot on systems was just “a bug” that keeps happening.

Clive Robinson March 3, 2025 11:42 PM

@ Bruce,

The viewpoint expressed in,

https://simonwillison.net/2025/Mar/2/hallucinations-in-code/

That,

“Hallucinations in code are the least harmful hallucinations you can encounter from a model.

Might at first glance appear at odds with the result of “non functional code” or “code that does not complie”.

The point is that the tool chain picks up on some of the LLM hallucinations in code very quickly, and thus are generally rapidly rectified.

However LLM hallucinations in say a legal submission as we know can get through to a Court of Law –and even beyond– where the damage is potentially very costly both in terms of cost and reputation (if not time in incarceration).

But what of code that gets through the tool chain, but has hallucinations that effect the desired program or business logic?

These still have to be found “the traditional way”.

Which is what you would expect from developers with extensive non LLM code review etc.

But some are concerned that new developers might not get the “basic think it through” skills if they use current AI LLM and ML systems to have code written for them,

https://futurism.com/young-coders-ai-cant-program

The thing is, to some of us this is not a new problem. I’ve talked of “code-cutters” that “cut-n-paste program code” from stack exchange and similar. Either forgetting or never knowing that “example code” is very rarely “production ready code”. Simply because it lacks error and exception catching/handling code… Oft deliberately left out as it “gets in the way” of an examples “clarity in a few lines”.

ResearcherZero March 4, 2025 2:27 AM

@Clive Robinson

But what of code that gets through the tool chain, but has hallucinations that effect the desired program or business logic?

Unfortunately everything now sounds like a euphemism for the Number Two in the White House.

Clive Robinson March 4, 2025 6:53 AM

@ Bruce, Winter, ALL,

Is the misalignment actually structural?

On and off over the past year or so I’ve pointed out that I do not think the current DNNs are actually ever going to be “general” in any way, and 8n fact they are stuck at being glorified “Digital Signal Processing”(DSP) systems trying to pull and reconstruct signals from a noisy environment.

Further that the fixed layered way LLM DNNs work is not general nor can it be (arguably without a constantly running feed back system it can not even be Turing Complete).

I’ve made several points that DNNs or in fact any current AI NNs are nothing like biological NNs and indicated why.

Well my own views are as always my own, but it appears increasing numbers are thinking very likewise,

https://arstechnica.com/science/2025/03/ai-versus-the-brain-and-the-race-for-general-intelligence/

Clive Robinson March 4, 2025 7:25 AM

@ ResearcherZero,

With regards,

“Code getting through the tool chain”

I see the notion that the Swiss Watchmaker idea of reversing the order might be in play.

“Unfortunately everything now sounds like a euphemism for the Number Two in the White House.”

Yes now that I think about it, it certainly “did hit the fan” and then “the wall” the other day rather publicly.

So much so it did look like “the tool got through the code” with a heck of a stink.

However it does tell us what the Executives intentions very probably are.

You might remember quite a while before the “sunset one” got his first go at sitting behind the desk I noted two global flash points around firstly China but also Iran that would without careful handling become global flash points.

The first sunset executive behaved in a confused manner with “the one” going after the South China Seas region, whilst behind his back “A Bolt-on Tool” was desperate to foment open conflict with Iran.

Well “one tool” down the focus is back on the South China Seas and the “China Pivot”…

It’s become clear on this re-run “the one” is clearly preparing for “armed conflict” at a very significant level in the South China Seas and West Pacific regions of a “Global Nature”.

Thus simplistically “the one” wants two things,

1, Europe to be forced into the conflict as cannon fodder (think back to the only ever activating of NATO Sec 5 by the US for Afghanistan).
2, No conflict in East Europe from Russia or middle east to distract from that.

It’s why “the one” is trying to get conditions for the “War Act” over advanced manufacturing capability… Payed for by the likes of Taiwan, South Korea, Japan which will become “collateral damage” to the plan.

But also the issue that currently China and Russia are sitting on most of the viable sources of “rare earth” and similar strategic raw materials. Hence the nonsense with Ukrainian resources.

So the “splat in the house” from the “number two” may have been specifically organised in advance.

Clive Robinson March 7, 2025 5:48 AM

Emergent properties of automata

From time to time I mention “cellular automata” and “Conway’s ‘Game of Life'”,

https://en.m.wikipedia.org/wiki/Conway's_Game_of_Life

And how the individual automata although following incredibly simple rules can in groups exhibit extremely complex behaviour (including becoming Turing Complete).

They game is fun to play with and also cellular automata are still being researched currently some are doing so with respect to AI. In particular some are looking to see if they can go beyond Turing Engines.

One unresolved question is the rules automata need or don’t at a fundamental level and how capabilities emerge and in effect become integral to a type of automata that has a more specialised function.

Or more importantly do it in reverse…

To quote a team at Google,

“Imagine trying to reverse-engineer the complex, often unexpected patterns and behaviours that emerge from simple rules. This challenge has inspired researchers and enthusiasts that work with cellular automata for decades. In cellular automata, we generally approach things from the bottom-up. We choose local rules, then investigate the resulting emergent patterns. What if we could create systems that, given some complex desired pattern, can, in a fully differentiable fashion, learn the local rules that generate it, while preserving the inherent discrete nature of cellular automata?”

In their self published paper a few days back,

https://google-research.github.io/self-organising-systems/difflogic-ca/

Will it enable us to do things we currently can not?

Like determine what the weights in a DNN function as?

Or maybe even if that rarer than a purple three toed unicorn that nobody has seen or can describe of “AGI” is possible or not?

Either way I suspect it’s going to do two things,

1, Be fun to play with.
2, Become another item in the esoteric tool box.

But I can also see it getting both AI and QC love poured onto it.

Clive Robinson March 9, 2025 2:47 PM

Current US AI LLM & ML systems can’t crack it?

Is in effect what is claimed in the title,

“AI Has a Fatal Flaw—And Nobody Can Fix It”

Of this YouTube vid,

https://m.youtube.com/watch?v=_IOh0S_L3C4

If you do watch it you will find the first part is almost exactly the same explanation about “current AI LLM and ML systems” –being pushed in the US AI led bubble– function that I’ve given here in the past.

As for “AGI” well depending on who you ask, “in the bubble” either it’s

1, Happening right now.
2, It never will happen.

Both are correct and both are most likely wrong…

Because “AGI” as a term in effect means effectively “nothing and everything” depending not just who you ask but when you ask them. So AGI is of no use to anyone other than those trying to “Pump and Dump” the bubble.

Personally I’m really not sure on just how large the bowl is going to be to hold enough Pop-Corn… But a shrewd guess is “to big to see around”.

Winter March 9, 2025 6:35 PM

@You are not anonymous

Does nobody understand what AI is? Does nobody understand what a “pattern” is?

Ad 1. Indeed, nobody understands what AI is

Ad 2. Indeed, nobody understands what a “pattern” is.

The short explanation would be that we humans have never grasped the amount of information that is in language use, ie, in the use of words.

LLMs show that almost all of what we understand as “common sense conversation” is contained in the statistics of words and their order. That a sufficiently large Markov model does capture most of our conversations and reasoning.

LLMs also show that complex abstract patterns in language can be extracted from texts and that these patterns do capture reasoning and common sense conversations.

Is this intelligence?

To answer this question we must first decide what is meant by “intelligence”. We actually have no consensus on the meaning of this word.

What we do know is that LLMs can pass current quantitative tests of intelligence, eg, IQ tests, exam questions, and formal math problems. These are not different from computers beating humans at board games. Is this intelligence? Does it matter?

Can LLMs be creative?

We do know that creativity can be simulated by simulating evolutionary selection. Genetic algorithms are able to find new solutions to almost any problem. Such algorithms do everything people generally want from creativity.

LLMs are currently designed to do just that to solve the Math Olympics problems.

Is this AGI?

As Edgar Dijkstra asked long ago:
Can submarines swim?

Clive Robinson March 10, 2025 7:30 AM

@ Winter,

With regards,

“LLMs show that almost all of what we understand as “common sense conversation” is contained in the statistics of words and their order.”

It’s actually “all conversation” in the input to the LLM method “common sense” does not come into it other from the statistics of the input and they can be falsified.

So if I put sufficient “nonsense language / argot / slang” into an LLM learning method then it will appear in the output. As the LLM method has no “understanding” or “reasoning” capability.

However argot and slang are designed to be not “nonsense” but deliberately ambiguous by way of multiple meanings so as to appear as nonsense. Thus they have flatter statistics and without knowledge of the individual conversation context have either little or no meaning or even a false meaning.

That is they are like the ciphertext of a language based encryption system, without the specific context which is the key the plaintext remains at best a probability at worst an unknown to a third party (I’ve talked about this before with deniable encryption in a plaintext Shannon “carrier” Channel using an OTP key).

Which brings us onto your statment of,

“That a sufficiently large Markov model does capture most of our conversations and reasoning.”

Sorry no, “reasoning” is not a function of a Markov Model, it is just the statistical relationships of the language, argot or slang. Reasoning requires an ability beyond the input language statistics often involving a significant amount of “environment” information that is not covered by the language.

Which is why,

“LLMs also show that complex abstract patterns in language can be extracted from texts and that these patterns do capture reasoning and common sense conversations.”

Is very much incorrect.

“Is this intelligence?”

No it’s not.

Because the copying of what is “known” from previous “conversations” does not in any way demonstrate “reasoning” to a future state or new knowledge.

Whilst AI can “permutate” it’s ability to “test” results is held within the known of the input.

As I’ve mentioned before there is a formal definition of LLM output that can and does happen by the use of randomisation in LLM word/phrase selection and a “term of art” for it is “soft bullshit”.

Which is differentiated from “hard bullshit” by the fact that

Soft Bullshit : is the “unintentional” result of random selection by the method over a “normal” multidimensional distribution input corpus giving a random incorrect/harmful result.

Hard Bullshit : is the “intentional” result of random selection by the method over a “non normal / biased” multidimensional distribution input corpus giving a deliberately biased incorrect/harmful result.

But in neither case does “reasoning” enter into it, as the method selects from the input corpus randomly.

Winter March 10, 2025 11:42 AM

@Clive

Sorry no, “reasoning” is not a function of a Markov Model, it is just the statistical relationships of the language, argot or slang. Reasoning requires an ability beyond the input language statistics often involving a significant amount of “environment” information that is not covered by the language.

This is “Reasoning” as it shows up in conversations and texts.

Most reasoning by people is simply formulaic manipulation of words. You see this in “Management Speech” and “Sales Pitch”. What looks like reasoning is generally a concatenation of fixed formulations with only specific identities and names filled in. The same holds for small talk and eulogies etc.

Actual reasoning as in how to get out of an escape room or to solve a puzzle is not meant here. They are mostly very difficult to describe in a coherent text.

Clive Robinson March 10, 2025 7:27 PM

@ Winter,

With regards,

“This is “Reasoning” as it shows up in conversations and texts.”

That is not “reasoning” any more than an “identikit image” is a real person. That is at best it is a quite imperfect set of average snipits just slapped together to make a vague impression for a very limited function entirely unrelated to “reasoning”. It’s why the “Stochastic parrot” derogatory name stuck so easily.

Most will recognise that neither spelling or grammar have anything to do with what would be considered human “reasoning”. They are actually about “error correction” in a Shannon Channel. Spelling and grammar derived from the input to the LLM are what “Large Language Models”(LLM’s) are all about as imply with,

“LLMs show that almost all of what we understand as “common sense conversation” is contained in the statistics of words and their order.”

When the language sounds wrong it means that a probable error has been detected by the 2nd –or more– party, not that there is “reasoning” involved. So “to fake” being human the LLM has to “pass the error checks”.

No matter how far up you take this “passing of checks” no “reasoning” needs to be, or actually is involved, just a permutation of the input to the LLM so it passes “error checks”.

As I’ve indicated in the past an LLM is in effect a DSP “Adaptive Filter”. That is it takes a “Signal + Noise” as input and a set of “filter settings” and outputs a different but strongly correlated signal.

That’s all LLM’s do, there is no reasoning done by the LLM just “filtering” on multiple spectrums that make an N-dimensional manifold that is differentiable,

https://en.m.wikipedia.org/wiki/Differentiable_manifold

Where N is in the order of tens of thousands.

In effect all an LLM does is apply the filter values and noise to the base values (weights) and finds from a series of other points on the manifold another point on the manifold that is locally close to them and outputs it.

In effect it is the “Root of the mean squared”(RMS) or similar “average” which is usually some form of “integration” or “low pass filter” result on the inputs to the LLM.

You can also think of it like “panning for gold”,

“If there is no gold in the dirt you put in the pan, then no matter how hard you pan the dirt away, there will be no gold in the pan.”

Surrounding the pan with “smoke and mirrors” and “incanting and genuflecting acolytes” won’t change the result.

Clive Robinson March 12, 2025 12:34 PM

@ Bruce, ALL,

Even NIST is getting “AI twitchy” and it’s said there is an “AI Flaw” that can not be fixed…

The latest of which is a new variety of input/prompt injection via what we would once have thought of as “user scratch memory”,

https://www.theregister.com/2025/03/11/minja_attack_poisons_ai_model_memory/

“[R]esearchers affiliated with Michigan State University and the University of Georgia in the US, and Singapore Management University, have devised an attack that muddles AI model memory via client-side interaction.”

But back to NIST and,

Generative AI’s Greatest Flaw”

https://m.youtube.com/watch?v=rAEqP9VEhe8

ResearcherZero March 21, 2025 1:24 AM

These models might be completely aligned with corporate interest and profit. Ethics and transparency can be good PR for some companies, others do not even bother with “washing”.

The age of AI driven Public Relations may make moral blindness a more common practice.

“transparency regimes run the risk of turning into practices of surveillance as organizations and individuals are faced with increasing digital observation and pressure to disclose information publicly …if transparency can be crafted artificially by the strategic use of language, signs and symbols in digital environments, it runs the risk of turning into a mere illusion.” – or what the authors refer to as pseudo-transparency

‘https://www.emerald.com/insight/content/doi/10.1108/jcom-02-2023-0028/full/html

The focus on public safety is being both lost and undermined by powerful interests.
https://www.wired.com/story/federal-trade-commission-removed-blogs-critical-of-ai-amazon-microsoft/

This report highlighted mass surveillance by Social Media and Video Streaming services.

Market dominance is maintained via bulk data and locking out rivals with “moats”.
https://publicknowledge.org/the-ftcs-new-report-reaffirms-big-techs-personal-data-overreach-whats-new/

Link to the report if you want to view it before it too vanishes.

‘https://www.ftc.gov/system/files/ftc_gov/pdf/Social-Media-6b-Report-9-11-2024.pdf

Leave a comment

Blog moderation policy

Login

Allowed HTML <a href="URL"> • <em> <cite> <i> • <strong> <b> • <sub> <sup> • <ul> <ol> <li> • <blockquote> <pre> Markdown Extra syntax via https://michelf.ca/projects/php-markdown/extra/

Sidebar photo of Bruce Schneier by Joe MacInnis.