
Web 3.0 Requires Data Integrity

If you’ve ever taken a computer security class, you’ve probably learned about the three legs of computer security—confidentiality, integrity, and availability—known as the CIA triad. When we talk about a system being secure, that’s what we’re referring to. All are important, but to different degrees in different contexts. In a world populated by artificial intelligence (AI) systems and AI agents, integrity will be paramount.

What is data integrity? It’s ensuring that no one can modify data—that’s the security angle—but it’s much more than that. It encompasses accuracy, completeness, and quality of data—all over both time and space. It’s preventing accidental data loss; the “undo” button is a primitive integrity measure. It’s also making sure that data is accurate when it’s collected—that it comes from a trustworthy source, that nothing important is missing, and that it doesn’t change as it moves from format to format. The ability to restart your computer is another integrity measure.
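The security angle of integrity—detecting any modification—is simple to illustrate with a cryptographic hash. This is a minimal sketch, not any particular system's design; the record contents are hypothetical:

```python
import hashlib

def fingerprint(data: bytes) -> str:
    """Return a SHA-256 digest that changes if even one bit of the data changes."""
    return hashlib.sha256(data).hexdigest()

record = b"patient_id=1234, dosage=50mg"
baseline = fingerprint(record)  # stored separately, e.g. in an audit log

# Later, verify the record hasn't been altered in storage or transit.
tampered = b"patient_id=1234, dosage=500mg"
assert fingerprint(record) == baseline    # unchanged data verifies
assert fingerprint(tampered) != baseline  # any modification is detected
```

A hash alone detects change but can't say who made it; in practice it is combined with signatures or access controls to establish that the change was authorized.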

The CIA triad has evolved with the Internet. The first iteration of the Web—Web 1.0 of the 1990s and early 2000s—prioritized availability. This era saw organizations and individuals rush to digitize their content, creating what has become an unprecedented repository of human knowledge. Organizations worldwide established their digital presence, leading to massive digitization projects where quantity took precedence over quality. The emphasis on making information available overshadowed other concerns.

As Web technologies matured, the focus shifted to protecting the vast amounts of data flowing through online systems. This is Web 2.0: the Internet of today. Interactive features and user-generated content transformed the Web from a read-only medium to a participatory platform. The increase in personal data, and the emergence of interactive platforms for e-commerce, social media, and online everything demanded both data protection and user privacy. Confidentiality became paramount.

We stand at the threshold of a new Web paradigm: Web 3.0. This is a distributed, decentralized, intelligent Web. Peer-to-peer social-networking systems promise to break the tech monopolies’ control on how we interact with each other. Tim Berners-Lee’s open W3C protocol, Solid, represents a fundamental shift in how we think about data ownership and control. A future filled with AI agents requires verifiable, trustworthy personal data and computation. In this world, data integrity takes center stage.

For example, the 5G communications revolution isn’t just about faster access to videos; it’s about Internet-connected things talking to other Internet-connected things without our intervention. Without data integrity, there’s no real-time car-to-car communication about road movements and conditions. There’s no drone swarm coordination, smart power grid, or reliable mesh networking. And there’s no way to securely empower AI agents.

In particular, AI systems require robust integrity controls because of how they process data. This means technical controls to ensure data is accurate, that its meaning is preserved as it is processed, that it produces reliable results, and that humans can reliably alter it when it’s wrong. Just as a scientific instrument must be calibrated to measure reality accurately, AI systems need integrity controls that preserve the connection between their data and ground truth.

This goes beyond preventing data tampering. It means building systems that maintain verifiable chains of trust between their inputs, processing, and outputs, so humans can understand and validate what the AI is doing. AI systems need clean, consistent, and verifiable control processes to learn and make decisions effectively. Without this foundation of verifiable truth, AI systems risk becoming a series of opaque boxes.
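One simple way to sketch such a verifiable chain of trust is to hash-chain each pipeline stage to the one before it, in the style of an audit log. This is an illustrative toy, not a specific standard; the stage names and payloads are invented:

```python
import hashlib
import json

def link(prev_hash: str, stage: str, payload: str) -> str:
    """Hash this stage together with the previous link, forming a tamper-evident chain."""
    record = json.dumps({"prev": prev_hash, "stage": stage, "payload": payload})
    return hashlib.sha256(record.encode()).hexdigest()

# Chain the pipeline: raw input -> preprocessing -> model output.
h0 = link("", "input", "raw sensor reading: 72.4")
h1 = link(h0, "processing", "normalized: 0.724")
h2 = link(h1, "output", "prediction: nominal")

# An auditor re-deriving the chain from the logged stages gets the same final
# hash; altering any intermediate record changes every hash after it.
assert link(h1, "output", "prediction: nominal") == h2
assert link(h1, "output", "prediction: FAULT") != h2
```

The point is that each output carries a verifiable lineage back through processing to its inputs, so a human can check what the system actually did.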

Recent history provides many sobering examples of integrity failures that undermine public trust in AI systems. Machine-learning (ML) models trained without thought on expansive datasets have produced predictably biased results in hiring systems. Autonomous vehicles acting on faulty data have made incorrect—and fatal—decisions. Medical diagnosis systems have given flawed recommendations without being able to explain themselves. A lack of integrity controls undermines AI systems and harms the people who depend on them.

They also highlight how AI integrity failures can manifest at multiple levels of system operation. At the training level, data may be subtly corrupted or biased even before model development begins. At the model level, mathematical foundations and training processes can introduce new integrity issues even with clean data. During execution, environmental changes and runtime modifications can corrupt previously valid models. And at the output level, the challenge of verifying AI-generated content and tracking it through system chains creates new integrity concerns. Each level compounds the challenges of the ones before it, ultimately manifesting in human costs, such as reinforced biases and diminished agency.

Think of it like protecting a house. You don’t just lock a door; you also use safe concrete foundations, sturdy framing, a durable roof, secure double-pane windows, and maybe motion-sensor cameras. Similarly, we need digital security at every layer to ensure the whole system can be trusted.

This layered approach to understanding security becomes increasingly critical as AI systems grow in complexity and autonomy, particularly with large language models (LLMs) and deep-learning systems making high-stakes decisions. We need to verify the integrity of each layer when building and deploying digital systems that impact human lives and societal outcomes.

At the foundation level, bits are stored in computer hardware. This represents the most basic encoding of our data, model weights, and computational instructions. The next layer up is the file system architecture: the way those binary sequences are organized into structured files and directories that a computer can efficiently access and process. In AI systems, this includes how we store and organize training data, model checkpoints, and hyperparameter configurations.

On top of that are the application layers—the programs and frameworks, such as PyTorch and TensorFlow, that allow us to train models, process data, and generate outputs. This layer handles the complex mathematics of neural networks, gradient descent, and other ML operations.

Finally, at the user-interface level, we have visualization and interaction systems—what humans actually see and engage with. For AI systems, this could be everything from confidence scores and prediction probabilities to generated text and images or autonomous robot movements.

Why does this layered perspective matter? Vulnerabilities and integrity issues can manifest at any level, so understanding these layers helps security experts and AI researchers perform comprehensive threat modeling. This enables the implementation of defense-in-depth strategies—from cryptographic verification of training data to robust model architectures to interpretable outputs. This multi-layered security approach becomes especially crucial as AI systems take on more autonomous decision-making roles in critical domains such as healthcare, finance, and public safety. We must ensure integrity and reliability at every level of the stack.

The risks of deploying AI without proper integrity control measures are severe and often underappreciated. When AI systems operate without sufficient security measures to handle corrupted or manipulated data, they can produce subtly flawed outputs that appear valid on the surface. The failures can cascade through interconnected systems, amplifying errors and biases. Without proper integrity controls, an AI system might train on polluted data, make decisions based on misleading assumptions, or have outputs altered without detection. The results of this can range from degraded performance to catastrophic failures.

We see four areas where integrity is paramount in this Web 3.0 world. The first is granular access, which allows users and organizations to maintain precise control over who can access and modify what information and for what purposes. The second is authentication—much more nuanced than the simple “Who are you?” authentication mechanisms of today—which ensures that data access is properly verified and authorized at every step. The third is transparent data ownership, which allows data owners to know when and how their data is used and creates an auditable trail of data provenance. Finally, the fourth is access standardization: common interfaces and protocols that enable consistent data access while maintaining security.

Luckily, we’re not starting from scratch. There are open W3C protocols that address some of this: decentralized identifiers for verifiable digital identity, the verifiable credentials data model for expressing digital credentials, ActivityPub for decentralized social networking (that’s what Mastodon uses), Solid for distributed data storage and retrieval, and WebAuthn for strong authentication standards. By providing standardized ways to verify data provenance and maintain data integrity throughout its lifecycle, Web 3.0 creates the trusted environment that AI systems require to operate reliably. This architectural leap, which puts integrity control in the hands of users, helps ensure that data remains trustworthy from generation and collection through processing and storage.

Integrity is essential to trust, on both technical and human levels. Looking forward, integrity controls will fundamentally shape AI development by moving from optional features to core architectural requirements, much as SSL certificates evolved from a banking luxury to a baseline expectation for any Web service.

Web 3.0 protocols can build integrity controls into their foundation, creating a more reliable infrastructure for AI systems. Today, we take availability for granted; anything less than 100% uptime for critical websites is intolerable. In the future, we will need the same assurances for integrity. Success will require following practical guidelines for maintaining data integrity throughout the AI lifecycle—from data collection through model training and finally to deployment, use, and evolution. These guidelines will address not just technical controls but also governance structures and human oversight, similar to how privacy policies evolved from legal boilerplate into comprehensive frameworks for data stewardship. Common standards and protocols, developed through industry collaboration and regulatory frameworks, will ensure consistent integrity controls across different AI systems and applications.

Just as the HTTPS protocol created a foundation for trusted e-commerce, it’s time for new integrity-focused standards to enable the trusted AI services of tomorrow.

This essay was written with Davi Ottenheimer, and originally appeared in Communications of the ACM.

Posted on April 3, 2025 at 7:05 AM

Rational Astrologies and Security

John Kelsey and I wrote a short paper for the Rossfest Festschrift: “Rational Astrologies and Security”:

There is another non-security way that designers can spend their security budget: on making their own lives easier. Many of these fall into the category of what has been called rational astrology. First identified by Steve Randy Waldman [Wal12], the term refers to something people treat as though it works, generally for social or institutional reasons, even when there’s little evidence that it works—and sometimes despite substantial evidence that it does not.

[…]

Both security theater and rational astrologies may seem irrational, but they are rational from the perspective of the people making the decisions about security. Security theater is often driven by information asymmetry: people who don’t understand security can be reassured with cosmetic or psychological measures, and sometimes that reassurance is important. It can be better understood by considering the many non-security purposes of a security system. A monitoring bracelet system that pairs new mothers and their babies may be security theater, considering the incredibly rare instances of baby snatching from hospitals. But it makes sense as a security system designed to alleviate fears of new mothers [Sch07].

Rational astrologies in security result from two considerations. The first is the principal-agent problem: The incentives of the individual or organization making the security decision are not always aligned with the incentives of the users of that system. The user’s well-being may not weigh as heavily on the developer’s mind as the difficulty of convincing his boss to take a chance by ignoring an outdated security rule or trying some new technology.

The second consideration that can lead to a rational astrology is where there is a social or institutional need for a solution to a problem for which there is actually not a particularly good solution. The organization needs to reassure regulators, customers, or perhaps even a judge and jury that “they did all that could be done” to avoid some problem—even if “all that could be done” wasn’t very much.

Posted on April 2, 2025 at 7:04 AM

Cell Phone OPSEC for Border Crossings

I have heard stories of more aggressive interrogation of electronic devices at US border crossings. I know a lot about securing computers, but very little about securing phones.

Are there easy ways to delete data—files, photos, etc.—on phones so it can’t be recovered? Does resetting a phone to factory defaults erase data, or is it still recoverable? That is, does the reset erase the old encryption key, or just sever the password that accesses that key? When the phone is rebooted, are deleted files still available?

We need answers for both iPhones and Android phones. And it’s not just the US; the world is going to become a more dangerous place to oppose state power.

Posted on April 1, 2025 at 7:01 AM

The Signal Chat Leak and the NSA

US National Security Advisor Mike Waltz, who started the now-infamous group chat coordinating a US attack against the Yemen-based Houthis on March 15, is seemingly now suggesting that the secure messaging service Signal has security vulnerabilities.

"I didn’t see this loser in the group," Waltz told Fox News about Atlantic editor in chief Jeffrey Goldberg, whom Waltz invited to the chat. "Whether he did it deliberately or it happened in some other technical mean, is something we’re trying to figure out."

Waltz’s implication that Goldberg may have hacked his way in was followed by a report from CBS News that the US National Security Agency (NSA) had sent out a bulletin to its employees last month warning them about a security "vulnerability" identified in Signal.

The truth, however, is much more interesting. If Signal has vulnerabilities, then China, Russia, and other US adversaries suddenly have a new incentive to discover them. At the same time, the NSA urgently needs to find and fix any vulnerabilities as quickly as it can—and similarly, ensure that commercial smartphones are free of backdoors—access points that allow people other than a smartphone’s user to bypass the usual security authentication methods to access the device’s contents.

That is essential for anyone who wants to keep their communications private, which should be all of us.

It’s common knowledge that the NSA’s mission is breaking into and eavesdropping on other countries’ networks. (During President George W. Bush’s administration, the NSA conducted warrantless taps into domestic communications as well—surveillance that several district courts ruled to be illegal before those decisions were later overturned by appeals courts. To this day, many legal experts maintain that the program violated federal privacy protections.) But the organization has a secondary, complementary responsibility: to protect US communications from others who want to spy on them. That is to say: While one part of the NSA is listening into foreign communications, another part is stopping foreigners from doing the same to Americans.

Those missions never conflicted during the Cold War, when allied and enemy communications were wholly separate. Today, though, everyone uses the same computers, the same software, and the same networks. That creates a tension.

When the NSA discovers a technological vulnerability in a service such as Signal (or buys one on the thriving clandestine vulnerability market), does it exploit it in secret, or reveal it so that it can be fixed? Since at least 2014, a US government interagency "equities" process has been used to decide whether it is in the national interest to take advantage of a particular security flaw, or to fix it. The trade-offs are often complicated and hard.

Waltz—along with Vice President J.D. Vance, Defense Secretary Pete Hegseth, and the other officials in the Signal group—has just made the trade-offs much tougher to resolve. Signal is both widely available and widely used. Smaller governments that can’t afford their own military-grade encryption use it. Journalists, human rights workers, persecuted minorities, dissidents, corporate executives, and criminals around the world use it. Many of these populations are of great interest to the NSA.

At the same time, as we have now discovered, the app is being used for operational US military traffic. So, what does the NSA do if it finds a security flaw in Signal?

Previously, it might have preferred to keep the flaw quiet and use it to listen to adversaries. Now, if the agency does that, it risks someone else finding the same vulnerability and using it against the US government. And if it was later disclosed that the NSA could have fixed the problem and didn’t, then the results might be catastrophic for the agency.

Smartphones present a similar trade-off. The biggest risk of eavesdropping on a Signal conversation comes from the individual phones that the app is running on. While it’s largely unclear whether the US officials involved had downloaded the app onto personal or government-issued phones—although Witkoff suggested on X that the app was on his "personal devices"—smartphones are consumer devices, not at all suitable for classified US government conversations. An entire industry of spyware companies sells capabilities to remotely hack smartphones for any country willing to pay. More capable countries have more sophisticated operations. Just last year, attacks that were later attributed to China attempted to access both President Donald Trump’s and Vance’s smartphones. Previously, the FBI—as well as law enforcement agencies in other countries—has pressured both Apple and Google to add "backdoors" to their phones to more easily facilitate court-authorized eavesdropping.

These backdoors would create, of course, another vulnerability to be exploited. A separate attack from China last year accessed a similar capability built into US telecommunications networks.

The vulnerabilities equities calculation has swung against weakening smartphone security and toward protecting the devices that senior government officials now use to discuss military secrets. That also means that it has swung against the US government hoarding Signal vulnerabilities—and toward full disclosure.

This is plausibly good news for Americans who want to talk among themselves without having anyone, government or otherwise, listen in. We don’t know what pressure the Trump administration is using to make intelligence services fall into line, but it isn’t crazy to worry that the NSA might again start monitoring domestic communications.

Because of the Signal chat leak, it’s less likely that they’ll use vulnerabilities in Signal to do that. Equally, bad actors such as drug cartels may also feel safer using Signal. Their security against the US government lies in the fact that the US government shares their vulnerabilities. No one wants their secrets exposed.

I have long advocated for a "defense dominant" cybersecurity strategy. As long as smartphones are in the pocket of every government official, police officer, judge, CEO, and nuclear power plant operator—and now that they are being used for what the White House calls "sensitive," if not outright classified, conversations among cabinet members—we need them to be as secure as possible. And that means no government-mandated backdoors.

We may find out more about how officials—including the vice president of the United States—came to be using Signal on what seem to be consumer-grade smartphones, in an apparent breach of the laws on government records. It’s unlikely that they really thought through the consequences of their actions.

Nonetheless, those consequences are real. Other governments, possibly including US allies, will now have much more incentive to break Signal’s security than they did in the past, and more incentive to hack US government smartphones than they did before March 24.

For just the same reason, the US government has urgent incentives to protect them.

This essay was originally published in Foreign Policy.

Posted on March 31, 2025 at 7:04 AM

AIs as Trusted Third Parties

This is a truly fascinating paper: “Trusted Machine Learning Models Unlock Private Inference for Problems Currently Infeasible with Cryptography.” The basic idea is that AIs can act as trusted third parties:

Abstract: We often interact with untrusted parties. Prioritization of privacy can limit the effectiveness of these interactions, as achieving certain goals necessitates sharing private data. Traditionally, addressing this challenge has involved either seeking trusted intermediaries or constructing cryptographic protocols that restrict how much data is revealed, such as multi-party computations or zero-knowledge proofs. While significant advances have been made in scaling cryptographic approaches, they remain limited in terms of the size and complexity of applications they can be used for. In this paper, we argue that capable machine learning models can fulfill the role of a trusted third party, thus enabling secure computations for applications that were previously infeasible. In particular, we describe Trusted Capable Model Environments (TCMEs) as an alternative approach for scaling secure computation, where capable machine learning model(s) interact under input/output constraints, with explicit information flow control and explicit statelessness. This approach aims to achieve a balance between privacy and computational efficiency, enabling private inference where classical cryptographic solutions are currently infeasible. We describe a number of use cases that are enabled by TCME, and show that even some simple classic cryptographic problems can already be solved with TCME. Finally, we outline current limitations and discuss the path forward in implementing them.

When I was writing Applied Cryptography way back in 1993, I talked about human trusted third parties (TTPs). This research postulates that someday AIs could fulfill the role of a human TTP, with added benefits like (1) being able to audit their processing, and (2) being able to delete it and erase their knowledge when their work is done. And the possibilities are vast.

Here’s a TTP problem. Alice and Bob want to know whose income is greater, but don’t want to reveal their income to the other. (Assume that both Alice and Bob want the true answer, so neither has an incentive to lie.) A human TTP can solve that easily: Alice and Bob whisper their income to the TTP, who announces the answer. But now the human knows the data. There are cryptographic protocols that can solve this. But we can easily imagine more complicated questions that cryptography can’t solve. “Which of these two novel manuscripts has more sex scenes?” “Which of these two business plans is a riskier investment?” If Alice and Bob can agree on an AI model they both trust, they can feed the model the data, ask the question, get the answer, and then delete the model afterwards. And it’s reasonable for Alice and Bob to trust a model with questions like this. They can take the model into their own lab and test it a gazillion times until they are satisfied that it is fair, accurate, or whatever other properties they want.
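Stripped to its essentials, the TTP in this problem (Yao's classic "millionaires' problem") is just a function that sees both private inputs and reveals only the one-bit answer. A toy sketch, with hypothetical figures, of what Alice and Bob are asking the trusted party to compute:

```python
def trusted_compare(income_alice: int, income_bob: int) -> str:
    """A trusted third party learns both inputs but publishes only the comparison.

    The integrity requirement is that the TTP computes exactly this function,
    reveals nothing else, and forgets the inputs afterward -- properties Alice
    and Bob could test an AI model for before trusting it.
    """
    if income_alice > income_bob:
        return "Alice"
    if income_bob > income_alice:
        return "Bob"
    return "Equal"

# Each party submits privately; only the answer is announced.
print(trusted_compare(90_000, 75_000))  # prints "Alice"
```

The paper's point is that the same pattern extends to questions with no known cryptographic protocol—comparing manuscripts or business plans—where the "function" is a judgment an auditable, stateless model can render and then delete.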

The paper contains several examples where an AI TTP provides real value. This is still mostly science fiction today, but it’s a fascinating thought experiment.

Posted on March 28, 2025 at 7:01 AM

AI Data Poisoning

Cloudflare has a new feature—available to free users as well—that uses AI to generate random pages to feed to AI web crawlers:

Instead of simply blocking bots, Cloudflare’s new system lures them into a “maze” of realistic-looking but irrelevant pages, wasting the crawler’s computing resources. The approach is a notable shift from the standard block-and-defend strategy used by most website protection services. Cloudflare says blocking bots sometimes backfires because it alerts the crawler’s operators that they’ve been detected.

“When we detect unauthorized crawling, rather than blocking the request, we will link to a series of AI-generated pages that are convincing enough to entice a crawler to traverse them,” writes Cloudflare. “But while real looking, this content is not actually the content of the site we are protecting, so the crawler wastes time and resources.”

The company says the content served to bots is deliberately irrelevant to the website being crawled, but it is carefully sourced or generated using real scientific facts—­such as neutral information about biology, physics, or mathematics—­to avoid spreading misinformation (whether this approach effectively prevents misinformation, however, remains unproven).

It’s basically an AI-generated honeypot. And AI scraping is a growing problem:

The scale of AI crawling on the web appears substantial, according to Cloudflare’s data that lines up with anecdotal reports we’ve heard from sources. The company says that AI crawlers generate more than 50 billion requests to their network daily, amounting to nearly 1 percent of all web traffic they process. Many of these crawlers collect website data to train large language models without permission from site owners….

Presumably the crawlers will now have to up both their scraping stealth and their ability to filter out AI-generated content like this. Which means the honeypots will have to get better at detecting scrapers and more stealthy in their fake content. This arms race is likely to go back and forth, wasting a lot of energy in the process.

Posted on March 26, 2025 at 7:07 AM

Report on Paragon Spyware

Citizen Lab has a new report on Paragon’s spyware:

Key Findings:

  • Introducing Paragon Solutions. Paragon Solutions was founded in Israel in 2019 and sells spyware called Graphite. The company differentiates itself by claiming it has safeguards to prevent the kinds of spyware abuses that NSO Group and other vendors are notorious for.
  • Infrastructure Analysis of Paragon Spyware. Based on a tip from a collaborator, we mapped out server infrastructure that we attribute to Paragon’s Graphite spyware tool. We identified a subset of suspected Paragon deployments, including in Australia, Canada, Cyprus, Denmark, Israel, and Singapore.
  • Identifying a Possible Canadian Paragon Customer. Our investigation surfaced potential links between Paragon Solutions and the Canadian Ontario Provincial Police, and found evidence of a growing ecosystem of spyware capability among Ontario-based police services.
  • Helping WhatsApp Catch a Zero-Click. We shared our analysis of Paragon’s infrastructure with Meta, who told us that the details were pivotal to their ongoing investigation into Paragon. WhatsApp discovered and mitigated an active Paragon zero-click exploit, and later notified over 90 individuals who it believed were targeted, including civil society members in Italy.
  • Android Forensic Analysis: Italian Cluster. We forensically analyzed multiple Android phones belonging to Paragon targets in Italy (an acknowledged Paragon user) who were notified by WhatsApp. We found clear indications that spyware had been loaded into WhatsApp, as well as other apps on their devices.
  • A Related Case of iPhone Spyware in Italy. We analyzed the iPhone of an individual who worked closely with confirmed Android Paragon targets. This person received an Apple threat notification in November 2024, but no WhatsApp notification. Our analysis showed an attempt to infect the device with novel spyware in June 2024. We shared details with Apple, who confirmed they had patched the attack in iOS 18.
  • Other Surveillance Tech Deployed Against The Same Italian Cluster. We also note 2024 warnings sent by Meta to several individuals in the same organizational cluster, including a Paragon victim, suggesting the need for further scrutiny into other surveillance technology deployed against these individuals.

Posted on March 25, 2025 at 7:05 AM

More Countries are Demanding Backdoors to Encrypted Apps

Last month, I wrote about the UK forcing Apple to break its Advanced Data Protection encryption in iCloud. More recently, both Sweden and France are contemplating mandating backdoors. Both initiatives attempt to scare people into supporting backdoors, which are—of course—a terrible idea.

Also: “A Feminist Argument Against Weakening Encryption.”

EDITED TO ADD (4/14): The French proposal was voted down.

Posted on March 24, 2025 at 6:38 AM
