The Age of Integrity

We need to talk about data integrity.

Narrowly, the term refers to ensuring that data isn’t tampered with, either in transit or in storage. Manipulating account balances in bank databases, removing entries from criminal records, and murder by removing notations about allergies from medical records are all integrity attacks.

More broadly, integrity refers to ensuring that data is correct and accurate from the point it is collected, through all the ways it is used, modified, transformed, and eventually deleted. Integrity-related incidents include malicious actions, but also inadvertent mistakes.

We tend not to think of them this way, but we have many primitive integrity measures built into our computer systems. The reboot process, which returns a computer to a known good state, is an integrity measure. The undo button is another integrity measure. Any of our systems that detect hard drive errors, file corruption, or dropped internet packets are integrity measures.

Just as a website leaving personal data exposed even if no one accessed it counts as a privacy breach, a system that fails to guarantee the accuracy of its data counts as an integrity breach – even if no one deliberately manipulated that data.

Integrity has always been important, but as we start using massive amounts of data to both train and operate AI systems, data integrity will become more critical than ever.

Most of the attacks against AI systems are integrity attacks. Affixing small stickers on road signs to fool AI driving systems is an integrity violation. Prompt injection attacks are another integrity violation. In both cases, the AI model can’t distinguish between legitimate data and malicious input: visual in the first case, text instructions in the second. Even worse, the AI model can’t distinguish between legitimate data and malicious commands.

Any attacks that manipulate the training data, the model, the input, the output, or the feedback from the interaction back into the model is an integrity violation. If you’re building an AI system, integrity is your biggest security problem. And it’s one we’re going to need to think about, talk about, and figure out how to solve.

Web 3.0 – the distributed, decentralized, intelligent web of tomorrow – is all about data integrity. It’s not just AI. Verifiable, trustworthy, accurate data and computation are necessary parts of cloud computing, peer-to-peer social networking, and distributed data storage. Imagine a world of driverless cars, where the cars communicate with each other about their intentions and road conditions. That doesn’t work without integrity. And neither does a smart power grid, or reliable mesh networking. There are no trustworthy AI agents without integrity.

We’re going to have to solve a small language problem first, though. Confidentiality is to confidential, and availability is to available, as integrity is to what? The analogous word is “integrous,” but that’s such an obscure word that it’s not in the Merriam-Webster dictionary, even in its unabridged version. I propose that we re-popularize the word, starting here.

We need research into integrous system design.

We need research into a series of hard problems that encompass both data and computational integrity. How do we test and measure integrity? How do we build verifiable sensors with auditable system outputs? How to we build integrous data processing units? How do we recover from an integrity breach? These are just a few of the questions we will need to answer once we start poking around at integrity.

There are deep questions here, deep as the internet. Back in the 1960s, the internet was designed to answer a basic security question: Can we build an available network in a world of availability failures? More recently, we turned to the question of privacy: Can we build a confidential network in a world of confidentiality failures? I propose that the current version of this question needs to be this: Can we build an integrous network in a world of integrity failures? Like the two version of this question that came before: the answer isn’t obviously “yes,” but it’s not obviously “no,” either.

Let’s start thinking about integrous system design. And let’s start using the word in conversation. The more we use it, the less weird it will sound. And, who knows, maybe someday the American Dialect Society will choose it as the word of the year.

This essay was originally published in IEEE Security & Privacy.

Tags: AI, integrity, LLM

Posted on June 27, 2025 at 7:02 AM • 25 Comments

Comments

Robert Brewer • June 27, 2025 7:23 AM

https://en.m.wiktionary.org/wiki/integral

Tom • June 27, 2025 8:34 AM

Small quibble – isn’t the adjectival form of integrity “integral”?

Ian Stewart • June 27, 2025 8:35 AM

This is from the Oxford English Dictionary:

integrous

Obsolete. rare.

1657 Marked by integrity; = integre adj., integrious adj. 1657


That an action be good, the cause ought to be integrous.

W. Morice, Coena quasi Κοινὴ Def. xx. 174

Many English poets and writers use obselete words so I think it should be used.

Clive Robinson • June 27, 2025 9:15 AM

@ Bruce, ALL,

With regards,

“… but as we start using massive amounts of data to both train and operate AI systems, data integrity will become more critical than ever.”

Err no, “integrity” is the wrong word for what you are thinking.

Integrity is not about the “data” but the transportation of data from point A to point B without change that has meaning to subsequent processing.

The old “Garbage In” has integrity if it does not change hence “Garbage Out” is the integral result.

The problem with AI input is mainly not if it remains integral from the source. But if the input is true / factual that is if the source is hard or soft bullshitting or not.

In science we very much care about “cause to effect” and that it is repeatable and not falsifiable either by deliberate action or faulty / unreliable method. The same is true for mathematics and in both cases the underpinning logic.

In many ways the CIA triad of which integrity is just one component originated before AI was much thought about. With the scope of it applying only to when data had been acquired and converted from the tangible physical world to the intangible information world.

It’s an issue that comes up with authentication by intangible information of a tangible physical object. The integrity of the tangible identification process ceases at the transducer and only after translation gets replaced by the integrity of the intangible information transmission process.

The integrity of the tangible and intangible processes are actually entirely unrelated. And thus the translation process can not carry the integrity across from the tangible domain to the intangible domain.

This has been known publicly in the UK with the “National ID Card” debate last century when Dame Stella Rimington the first female director of MI5 pointed it out to the embarrassment of some politicians, civil servant, and the lobbyists of companies hoping to profit greatly from the introduction of an electronic ID card, when she said,

“My angle on ID cards is that they may be of some use but only if they can be made unforgeable – and all our other documentation is quite easy to forge.”

“If we have ID cards at vast expense and people can go into a back room and forge them they are going to be absolutely useless.”

https://www.thefreelibrary.com/A+REALLY+BAD+IDEA%3b+Ex-boss+of+MI5+slams+identity+cards.-a0138791272

Things that are “quite easy to forge” lack any useful integrity, even though constituent parts might have high integrity, the system always fails to “the weakest link in the chain”.

Clive Robinson • June 27, 2025 9:20 AM

@ Bruce,

It appears your link to the IEEE has left open the HTML “italicized” markup.

Wes Reynolds • June 27, 2025 9:28 AM

Great post, Bruce! Huge problem.

We need somebody to design a “Build Integrous Systems” T-shirt.

Rontea • June 27, 2025 10:09 AM

Integrous systems should also probably “mimic nature in the way things fail”.

Alan Walsh • June 27, 2025 11:32 AM

Is this (finally) a useful case for blockchain?

lurker • June 27, 2025 2:01 PM

@Clive Robinson
“The integrity of the tangible and intangible processes are actually entirely unrelated. And thus the translation process can not carry the integrity across from the tangible domain to the intangible domain.”

This may be true now when we break the system down to individual components. But if the whole system is to be integrous, then it must preserve integrity across the translation process. That is the problem @Bruce has put to us.

anonymous coward the third! • June 27, 2025 2:27 PM

I guess the problem here philosophically is just that the integrity of the system lies outside the system itself.

In the same way that an OS on a computer can be bypassed by an external boot drive.

So, it seems to be more in the impossible category: subsuming everything under 1 system, and I don’t think that’s really possible.

To the extent it might be possible: one should consider that it’s lots of different running systems and at least 2 where they are truly separate. (So I think one would at least immediately run into a philosophical problem, but perhaps that’s worthwhile).

Commenter • June 27, 2025 4:22 PM

@Alan Walsh

There are a few companies already fielding blockchain solutions for recording patient information between hospitals.

I’d be surprised if any industries announce blockchain adoption soon though since most people seem to associate blockchain with scams and criminals.

Clive Robinson • June 27, 2025 8:45 PM

Alan Walsh, Commenter, ALL,

With regards,

“Is this (finally) a useful case for blockchain?”

Probably not, for two important reasons,

The first is, to work a blockchain system has to be mostly if not fully “public”. And as is shown on an almost daily basis the “Confidentiality” or “The C of the CIA Triad” is in most cases more important to users and systems operators than “The I of the CIA Triad” “Integrity”.

Secondly is the cost and efficiency of blockchain systems is very much against their use. And they have very real and very significant issues when it comes to the correction of “human failings” and the resulting legal disputes and current legislation in existence.

Take for instance the European “right to be forgotten” or have records expunged from databases…

But further consider that to be it’s self integral there needs to be three independent blockchain systems arranged in a “voting protocol” format. That will each be far from identical as “transactions in, to records out,” will happen at different times thus orders at each blockchain system. This will significantly effect “The A of the CIA Triad”, “Availability”. In short any such blockchain system will be incapable of “keeping up”.

Clive Robinson • June 27, 2025 10:12 PM

@ lurker, anonymous coward the third!,

The very related two problems you both highlight of,

“But if the whole system is to be integrous, then it must preserve integrity across the translation process.”

And

“To the extent it might be possible: one should consider that it’s lots of different running systems and at least 2 where they are truly separate. (So I think one would at least immediately run into a philosophical problem, but perhaps that’s worthwhile).”

You get the thorny issue of a minimum of two “truly separate” systems / processes that have to be some how “not separate” to be both “strongly and truly integral”…

As that US “sarcastic quip” / truism has it,

“Good luck on –solving– that one!”

Crossing the arising “transition gap” is at best “imperfect” as by definition it has to be lossy.

Which means that there has to be “redundancy” due to the,

“Bias, noise, distortion, and quantization in any measurement.”

And where there is “redundancy” there is not just “possibility” of “information entropy” that gives rise to the ability to generate “forgeries”…

There is also secondary communications channels that an observer can not see that can carry information past any observer.

Such information can act not just as a “deadman’s switch” but also as a form of “trigger” as a “duress key/indicator”.

For example you have a string of digits from a measurement, to preserve “integrity” you add a “checksum” on the end.

The least significant digits of the measurement are at best chaotic if not random thus can be changed without meaningful change to the measurement. This allows a “covert channel” to be formed via the checksum.

As a very simple example let us say that the checksum is the least significant digit of the summation of all the digits in the measurement string.

Now you and I decide that we have a “deadman’s switch value of three”. That is the value of the checksum is either “always or never” a multiple of three (as we adjust the least significant digits of the measurement string to ensure this condition). To an unsuspecting observer they see as expected about half the checksums being odd and half of them being even in value, so suspicion is not raised.

Now lets assume that the use of being a multiple of three is now used to send a stream of “binary data”.

We get a “covert channel” that can be used to send messages through the system as though it’s “transparent”.

As such this covert channel can be used to tell a down stream system to behave differently… For instance to falsify “identity checking” to say an entity “matches” when in fact they “don’t”.

Such an attack works against both “on-line” (database) and “off-line” (token) data checking systems as the reality is the data never gets checked, thus any “integrity check” gets bypassed thus fails.

You can extend this attack philosophy against any integrity system where there is redundance in what is being checked for integrity.

It’s one of the reasons “Digital Rights Management”(DRM) systems of the 1990’s always ended in ignominious ends.

So you can form a useful metric,

“Where there are covert channels integrity can not be assured.”

And as Claude Shannon demonstrated there can not be communications “without redundancy”. And later Gus Simmons demonstrated that where there is redundancy, there can be “covert channels”…

So you have a second useful metric,

Where there is communications there will always be covert channel capability.

And by definition,

All measurements are a communications channel with redundancy.

So you can not have “integrity” across the communications interface between two independent systems.

Clive Robinson • June 28, 2025 2:29 AM

@ Bruce,

In Reply.

You make a number of points and raise a number of questions, effectively starting with,

“If you’re building an AI system, integrity is your biggest security problem. And it’s one we’re going to need to think about, talk about, and figure out how to solve.”

It’s broader than just “AI Systems” it’s true of any system that works on information[1] or just observes it by measurement[2].

Humans have been thinking and talking formally about the integrity of information[3] based on thousands of years of societal grouping, and probably longer than we have been able to store information independently and verifiably of the say of human communication.

In all of that time we have not been able to give information actual integrity, and I think it’s not actually possible to do so.

The fundamental reason for this is information actually has no tangible form that is it is always derived by observation or measurement and as such is intangible or incorporeal[2]. Something that has long been recognised by common law[3].

But importantly all information must be communicable to exist within human comprehension as “knowledge” thus have utility[2]. And as importantly information can only exist in human comprehension by the process of measurement in an environment through observation[1].

As I’ve indicted in my post above,

https://www.schneier.com/blog/archives/2025/06/the-age-of-integrity.html/#comment-446192

The works of Claude Shannon and Gus Simmons show that any kind of “measurement” is fallible as it “communicates information” that has “redundancy” thus “covert channels”. Thus in turn as “Integrity” is very much an observational measurement process it too is fallible and not reliable as a binary result test process.

Thus “integrity” is at best a probabilistic not absolute measure[4].

Further it can be argued as Shannon did for information, that an estimate of integrity is equivalent to entropy in statistical thermodynamics.

Moving on,

You note that,

“We need research into integrous system design.”

Which based on human history and the formation of societies is going to form a new field or domain of research in “technical systems” design (as arguably it has failed in the more general human society).

You outline some areas with,

“We need research into a series of hard problems that encompass both data and computational integrity. How do we test and measure integrity? How do we build verifiable sensors with auditable system outputs? How to we build integrous data processing units? How do we recover from an integrity breach? These are just a few of the questions we will need to answer once we start poking around at integrity.”

The problem as I’ve indicated with security in the past, is that,

“We do not have usable measures.”

In part because,

1, We do not know what we need to measure or why.
2, We do not know how to quantify what we can measure.
3, We do not know how to extract meaning from what we do measure.

As I’ve indicated integrity,

“Does not cross boundaries”

Due to the fact it does not translate into information in ways we can rely upon. Whilst this is an issue for all measurements, mostly the measurements we do have can be reduced down by the laws of nature down to “physical constants”. Which can then be used in a “physical verification process” that can be independently verified and checked.

Thus the question arises,

“What equivalent to a physical constant do we have that we can use as a yardstick to calibrate measures for either security or integrity?”

With regards your question of,

“How [d]o we build integrous data processing units?”

I’ve answered this in the past. The underlying issue is not having anything we can implicitly trust to build upon.

Trust and security are not intrinsic properties of energy or matter we have to build them with components that lack them. This build a property is something we almost take for granted in our technology based world.

In the past I’ve pointed out that we once built fortifications on “solid ground” but as trade became essential we learned how to build fortifications on moving constructs such as wheeled vehicles and then ships.

Ships as such do not have “foundations” and do not require solid ground beneath them. They work by the principle of “displacement of mass”.

Thus the question arises is there a way to get a similar effect for trust and security.

The answer was “yes” and it gave rise to “Castles v Prisons” that I’ve previously outlined (and people can search for on this blog).

Which brings us onto your final question of,

“How do we recover from an integrity breach?”

Well there is good news and bad news on that.

Integrity is about how data is processed rather than the data it’s self.

Whilst there is little that can be done once data has got beyond an individuals control, integrity is something that can be restored or repaired after it has been broken.

Thus Integrity is more like the tap, than the water that can flow through it.

Thus the question arises from that is one of “confidentiality”. Or can we make the loss of Integrity less meaningful in a similar way to having data encrypted can reduce the impact of having security broken?

This is actually difficult to answer because we don’t yet realy have a meaningful measure of what Integrity actually is at a nuts and bolts level.

Thus this is perhaps one of the fundamental issues of Integrity we should be looking to solve first.

As the old,

“You know it when you see it”

Is not really of use.

[1] “Information” has various definitions but one of the more basic is,

“Data that has meaning”

Thus utility, and is derived from observation or measurement by an observer or process. As has been pointed out on occasions data[2] has levels of meaning or context, thus we have,

1.1, Raw data from observation
1.2, Meta-data which is effectively “data about raw data” and gives observation meaning.
1.3, Meta-meta-data which is “data about processes of obtaining Meta-data”.

Thus arguably data is “turtles all the way down”.

[2] But what is “data” well in a circular argument it’s information that has utility. But more usefully it’s the result of making measurements, that can be,

2.1, Communicated
2.2, Stored
2.3, Processed

However it can be seen that each is dependent on what proceeds it. Thus to process information/data you must be able to both store and communicate it, and to store information/data you must be able to communicate it.

As such we generally consider communication of information/data to be by the movement of matter or energy under the influence of a force. Where the energy or matter has information modulated or impressed upon it.

However the process of modulation or impressing allows us to see information/data as being separate from energy and matter and the forces and limits that act upon them, so treated not just abstractly but as a separate entity. In effect information has an incorporeal form[3], thus can be treated mathematically in it’s own right. Hence since the late 1950’s we have “Information theory” started by the work of Claude Shannon and others during and prior to WWII.

[3] Information / data as an “incorporeal form” has meaning beyond mathematics and the laws of nature. Since “The times of the Tudors” it has had “legal status” via “letters of Patent” and earlier “common law”. A result of which is a written document can have a status equivalent of a “natural person” hence in treaties, legislation and regulation the expression

“Any person legal or natural”

Or equivalent can be found.

[4] For those not familiar with absolute and other measures it’s easier to provide a link,

https://thisvsthat.io/absolute-vs-relative

Daniel • June 28, 2025 11:31 AM

Fabulous essay and comments, had to read everything twice.

I still sometimes think that Bruce might be Clive, or vice-a-versa :)).

Mr Stuart Smiles • June 28, 2025 1:43 PM

Hi Bruce.

Have you been keeping up to date on the Postoffice scandal / Inquiry in the UK?

There is a first report coming from Wyn Williams’ Inquiry into the post office inquiry on their youtube channel.

It needs / has the subject of Computer system trustworthiness / integrity at the heart of the issue, (as well as the organisational activities).

Short-ish Summary of issues :
https://en.m.wikipedia.org/wiki/Mr_Bates_vs_The_Post_Office

Inquiry Youtube channel:

https://youtube.com/@postofficehorizonitinquiry947?si=xWgpEaGZ8XArfAOu

Ian Stewart • June 28, 2025 3:33 PM

@Mr Stuart Smiles:
Also there is convincing evidence that the data was altered from the data centre, even though the company stated it could not be altered.

Ronald Peterson • June 29, 2025 7:20 PM

There has been some study of data provenance which could be an important part of data integrity:

https://www.cs.dartmouth.edu/~dfk/research/prasad-nethealth13/index.html

Gary Stoneburner • June 30, 2025 2:06 PM

Suggest that it might be as availability to available, integrity to correct. (PS: High integrity software is highly correct software.) Cheers!

Grima Squeakersen • July 2, 2025 3:58 PM

Just as human-space integrity measures such as fact-checking can be (trivially, apparently) perverted to deliver falsity (corruption) instead of their advertised goal, I strongly suspect that nearly all allegorical machine-space integrity measures are equally susceptible. The disturbing difference would be in the difficulty for most to understand that it was happening.

avner • July 15, 2025 6:31 AM

Hmmm: “Confidentiality is to confidential, and availability is to available, as integrity is to what?”
Well integrity is actually different as it incorporates to two distinct qualities: True and Complete.

Johan Lammens • July 15, 2025 12:27 PM

“integral” may be the right adjective here, see e.g. WordWeb:

Adjective: integral
1. Existing as an essential constituent or characteristic
2. Constituting the undiminished entirety; lacking nothing essential especially not damaged
3. Of or denoted by an integer

Noun: integral
1. The result of a mathematical integration; F(x) is the integral of f(x) if dF/dx = f(x)

[WordWeb.info]

Note that the mathematical concept is a noun, not an adjective.

Charlotte • July 26, 2025 5:40 PM

I think the adjectival form is “integral.”

Christopher Deeble • September 15, 2025 9:08 AM

For beauty three things are required: integrity, harmony, and clarity

Winter • September 15, 2025 9:53 AM

@Christopher Deeble

For beauty three things are required: integrity, harmony, and clarity

Only one thing is necessary,
the eye of a beholder.

The Age of Integrity

Comments

Leave a comment Cancel reply