Entries Tagged "web"

Page 1 of 14

Web 3.0 Requires Data Integrity

If you’ve ever taken a computer security class, you’ve probably learned about the three legs of computer security—confidentiality, integrity, and availability—known as the CIA triad. When we talk about a system being secure, that’s what we’re referring to. All are important, but to different degrees in different contexts. In a world populated by artificial intelligence (AI) systems and artificial intelligent agents, integrity will be paramount.

What is data integrity? It’s ensuring that no one can modify data—that’s the security angle—but it’s much more than that. It encompasses accuracy, completeness, and quality of data—all over both time and space. It’s preventing accidental data loss; the “undo” button is a primitive integrity measure. It’s also making sure that data is accurate when it’s collected—that it comes from a trustworthy source, that nothing important is missing, and that it doesn’t change as it moves from format to format. The ability to restart your computer is another integrity measure.

The CIA triad has evolved with the Internet. The first iteration of the Web—Web 1.0 of the 1990s and early 2000s—prioritized availability. This era saw organizations and individuals rush to digitize their content, creating what has become an unprecedented repository of human knowledge. Organizations worldwide established their digital presence, leading to massive digitization projects where quantity took precedence over quality. The emphasis on making information available overshadowed other concerns.

As Web technologies matured, the focus shifted to protecting the vast amounts of data flowing through online systems. This is Web 2.0: the Internet of today. Interactive features and user-generated content transformed the Web from a read-only medium to a participatory platform. The increase in personal data, and the emergence of interactive platforms for e-commerce, social media, and online everything demanded both data protection and user privacy. Confidentiality became paramount.

We stand at the threshold of a new Web paradigm: Web 3.0. This is a distributed, decentralized, intelligent Web. Peer-to-peer social-networking systems promise to break the tech monopolies’ control on how we interact with each other. Tim Berners-Lee’s open W3C protocol, Solid, represents a fundamental shift in how we think about data ownership and control. A future filled with AI agents requires verifiable, trustworthy personal data and computation. In this world, data integrity takes center stage.

For example, the 5G communications revolution isn’t just about faster access to videos; it’s about Internet-connected things talking to other Internet-connected things without our intervention. Without data integrity, for example, there’s no real-time car-to-car communications about road movements and conditions. There’s no drone swarm coordination, smart power grid, or reliable mesh networking. And there’s no way to securely empower AI agents.

In particular, AI systems require robust integrity controls because of how they process data. This means technical controls to ensure data is accurate, that its meaning is preserved as it is processed, that it produces reliable results, and that humans can reliably alter it when it’s wrong. Just as a scientific instrument must be calibrated to measure reality accurately, AI systems need integrity controls that preserve the connection between their data and ground truth.

This goes beyond preventing data tampering. It means building systems that maintain verifiable chains of trust between their inputs, processing, and outputs, so humans can understand and validate what the AI is doing. AI systems need clean, consistent, and verifiable control processes to learn and make decisions effectively. Without this foundation of verifiable truth, AI systems risk becoming a series of opaque boxes.

Recent history provides many sobering examples of integrity failures that naturally undermine public trust in AI systems. Machine-learning (ML) models trained without thought on expansive datasets have produced predictably biased results in hiring systems. Autonomous vehicles with incorrect data have made incorrect—and fatal—decisions. Medical diagnosis systems have given flawed recommendations without being able to explain themselves. A lack of integrity controls undermines AI systems and harms people who depend on them.

They also highlight how AI integrity failures can manifest at multiple levels of system operation. At the training level, data may be subtly corrupted or biased even before model development begins. At the model level, mathematical foundations and training processes can introduce new integrity issues even with clean data. During execution, environmental changes and runtime modifications can corrupt previously valid models. And at the output level, the challenge of verifying AI-generated content and tracking it through system chains creates new integrity concerns. Each level compounds the challenges of the ones before it, ultimately manifesting in human costs, such as reinforced biases and diminished agency.

Think of it like protecting a house. You don’t just lock a door; you also use safe concrete foundations, sturdy framing, a durable roof, secure double-pane windows, and maybe motion-sensor cameras. Similarly, we need digital security at every layer to ensure the whole system can be trusted.

This layered approach to understanding security becomes increasingly critical as AI systems grow in complexity and autonomy, particularly with large language models (LLMs) and deep-learning systems making high-stakes decisions. We need to verify the integrity of each layer when building and deploying digital systems that impact human lives and societal outcomes.

At the foundation level, bits are stored in computer hardware. This represents the most basic encoding of our data, model weights, and computational instructions. The next layer up is the file system architecture: the way those binary sequences are organized into structured files and directories that a computer can efficiently access and process. In AI systems, this includes how we store and organize training data, model checkpoints, and hyperparameter configurations.

On top of that are the application layers—the programs and frameworks, such as PyTorch and TensorFlow, that allow us to train models, process data, and generate outputs. This layer handles the complex mathematics of neural networks, gradient descent, and other ML operations.

Finally, at the user-interface level, we have visualization and interaction systems—what humans actually see and engage with. For AI systems, this could be everything from confidence scores and prediction probabilities to generated text and images or autonomous robot movements.

Why does this layered perspective matter? Vulnerabilities and integrity issues can manifest at any level, so understanding these layers helps security experts and AI researchers perform comprehensive threat modeling. This enables the implementation of defense-in-depth strategies—from cryptographic verification of training data to robust model architectures to interpretable outputs. This multi-layered security approach becomes especially crucial as AI systems take on more autonomous decision-making roles in critical domains such as healthcare, finance, and public safety. We must ensure integrity and reliability at every level of the stack.

The risks of deploying AI without proper integrity control measures are severe and often underappreciated. When AI systems operate without sufficient security measures to handle corrupted or manipulated data, they can produce subtly flawed outputs that appear valid on the surface. The failures can cascade through interconnected systems, amplifying errors and biases. Without proper integrity controls, an AI system might train on polluted data, make decisions based on misleading assumptions, or have outputs altered without detection. The results of this can range from degraded performance to catastrophic failures.

We see four areas where integrity is paramount in this Web 3.0 world. The first is granular access, which allows users and organizations to maintain precise control over who can access and modify what information and for what purposes. The second is authentication—much more nuanced than the simple “Who are you?” authentication mechanisms of today—which ensures that data access is properly verified and authorized at every step. The third is transparent data ownership, which allows data owners to know when and how their data is used and creates an auditable trail of data providence. Finally, the fourth is access standardization: common interfaces and protocols that enable consistent data access while maintaining security.

Luckily, we’re not starting from scratch. There are open W3C protocols that address some of this: decentralized identifiers for verifiable digital identity, the verifiable credentials data model for expressing digital credentials, ActivityPub for decentralized social networking (that’s what Mastodon uses), Solid for distributed data storage and retrieval, and WebAuthn for strong authentication standards. By providing standardized ways to verify data provenance and maintain data integrity throughout its lifecycle, Web 3.0 creates the trusted environment that AI systems require to operate reliably. This architectural leap for integrity control in the hands of users helps ensure that data remains trustworthy from generation and collection through processing and storage.

Integrity is essential to trust, on both technical and human levels. Looking forward, integrity controls will fundamentally shape AI development by moving from optional features to core architectural requirements, much as SSL certificates evolved from a banking luxury to a baseline expectation for any Web service.

Web 3.0 protocols can build integrity controls into their foundation, creating a more reliable infrastructure for AI systems. Today, we take availability for granted; anything less than 100% uptime for critical websites is intolerable. In the future, we will need the same assurances for integrity. Success will require following practical guidelines for maintaining data integrity throughout the AI lifecycle—from data collection through model training and finally to deployment, use, and evolution. These guidelines will address not just technical controls but also governance structures and human oversight, similar to how privacy policies evolved from legal boilerplate into comprehensive frameworks for data stewardship. Common standards and protocols, developed through industry collaboration and regulatory frameworks, will ensure consistent integrity controls across different AI systems and applications.

Just as the HTTPS protocol created a foundation for trusted e-commerce, it’s time for new integrity-focused standards to enable the trusted AI services of tomorrow.

This essay was written with Davi Ottenheimer, and originally appeared in Communications of the ACM.

Posted on April 3, 2025 at 7:05 AMView Comments

Thousands of WordPress Websites Infected with Malware

The malware includes four separate backdoors:

Creating four backdoors facilitates the attackers having multiple points of re-entry should one be detected and removed. A unique case we haven’t seen before. Which introduces another type of attack made possibly by abusing websites that don’t monitor 3rd party dependencies in the browser of their users.

The four backdoors:

The functions of the four backdoors are explained below:

  • Backdoor 1, which uploads and installs a fake plugin named “Ultra SEO Processor,” which is then used to execute attacker-issued commands
  • Backdoor 2, which injects malicious JavaScript into wp-config.php
  • Backdoor 3, which adds an attacker-controlled SSH key to the ~/.ssh/authorized_keys file so as to allow persistent remote access to the machine
  • Backdoor 4, which is designed to execute remote commands and fetches another payload from gsocket[.]io to likely open a reverse shell.

Posted on March 10, 2025 at 7:01 AMView Comments

New Ways to Track Internet Browsing

Interesting research on web tracking: “Who Left Open the Cookie Jar? A Comprehensive Evaluation of Third-Party Cookie Policies:

Abstract: Nowadays, cookies are the most prominent mechanism to identify and authenticate users on the Internet. Although protected by the Same Origin Policy, popular browsers include cookies in all requests, even when these are cross-site. Unfortunately, these third-party cookies enable both cross-site attacks and third-party tracking. As a response to these nefarious consequences, various countermeasures have been developed in the form of browser extensions or even protection mechanisms that are built directly into the browser.

In this paper, we evaluate the effectiveness of these defense mechanisms by leveraging a framework that automatically evaluates the enforcement of the policies imposed to third-party requests. By applying our framework, which generates a comprehensive set of test cases covering various web mechanisms, we identify several flaws in the policy implementations of the 7 browsers and 46 browser extensions that were evaluated. We find that even built-in protection mechanisms can be circumvented by multiple novel techniques we discover. Based on these results, we argue that our proposed framework is a much-needed tool to detect bypasses and evaluate solutions to the exposed leaks. Finally, we analyze the origin of the identified bypass techniques, and find that these are due to a variety of implementation, configuration and design flaws.

The researchers discovered many new tracking techniques that work despite all existing anonymous browsing tools. These have not yet been seen in the wild, but that will change soon.

Three news articles. BoingBoing post.

Posted on August 17, 2018 at 5:26 AMView Comments

Visiting a Website against the Owner's Wishes Is Now a Federal Crime

While we’re on the subject of terrible 9th Circuit Court rulings:

The U.S. Court of Appeals for the 9th Circuit has handed down a very important decision on the Computer Fraud and Abuse Act…. Its reasoning appears to be very broad. If I’m reading it correctly, it says that if you tell people not to visit your website, and they do it anyway knowing you disapprove, they’re committing a federal crime of accessing your computer without authorization.

Posted on July 13, 2016 at 2:10 PMView Comments

The Ads vs. Ad Blockers Arms Race

For the past month or so, Forbes has been blocking browsers with ad blockers. Today, I tried to access a Wired article and the site blocked me for the same reason.

I see this as another battle in this continuing arms race, and hope/expect that the ad blockers will update themselves to fool the ad blocker detectors.

But in a fine example of irony, the Forbes site has been serving malware in its ads.

And it seems that Forbes is inconsistently using its ad blocker blocker. At least, I was able to get to that linked article last week. But then I couldn’t get to another article a few days later.

Posted on February 23, 2016 at 12:18 PMView Comments

Fighting DRM in the W3C

Cory Doctorow has a good post on the EFF website about how they’re trying to fight digital rights management software in the World Wide Web Consortium.

So we came back with a new proposal: the W3C could have its cake and eat it too. It could adopt a rule that requires members who help make DRM standards to promise not to sue people who report bugs in tools that conform to those standards, nor could they sue people just for making a standards-based tool that connected to theirs. They could make DRM, but only if they made sure that they took steps to stop that DRM from being used to attack the open Web.

The W3C added DRM to the web’s standards in 2013. This doesn’t reverse that terrible decision, but it’s a step in the right direction.

Posted on January 14, 2016 at 2:13 PMView Comments

More on Heartbleed

This is an update to my earlier post.

Cloudflare is reporting that it’s very difficult, if not practically impossible, to steal SSL private keys with this attack.

Here’s the good news: after extensive testing on our software stack, we have been unable to successfully use Heartbleed on a vulnerable server to retrieve any private key data. Note that is not the same as saying it is impossible to use Heartbleed to get private keys. We do not yet feel comfortable saying that. However, if it is possible, it is at a minimum very hard. And, we have reason to believe based on the data structures used by OpenSSL and the modified version of NGINX that we use, that it may in fact be impossible.

The reasoning is complicated, and I suggest people read the post. What I have heard from people who actually ran the attack against a various servers is that what you get is a huge variety of cruft, ranging from indecipherable binary to useless log messages to peoples’ passwords. The variability is huge.

This xkcd comic is a very good explanation of how the vulnerability works. And this post by Dan Kaminsky is worth reading.

I have a lot to say about the human aspects of this: auditing of open-source code, how the responsible disclosure process worked in this case, the ease with which anyone could weaponize this with just a few lines of script, how we explain vulnerabilities to the public—and the role that impressive logo played in the process—and our certificate issuance and revocation process. This may be a massive computer vulnerability, but all of the interesting aspects of it are human.

EDITED TO ADD (4/12): We have one example of someone successfully retrieving an SSL private key using Heartbleed. So it’s possible, but it seems to be much harder than we originally thought.

And we have a story where two anonymous sources have claimed that the NSA has been exploiting Heartbleed for two years.

EDITED TO ADD (4/12): Hijacking user sessions with Heartbleed. And a nice essay on the marketing and communications around the vulnerability

EDITED TO ADD (4/13): The US intelligence community has denied prior knowledge of Heatbleed. The statement is word-game free:

NSA was not aware of the recently identified vulnerability in OpenSSL, the so-called Heartbleed vulnerability, until it was made public in a private sector cybersecurity report. Reports that say otherwise are wrong.

The statement also says:

Unless there is a clear national security or law enforcement need, this process is biased toward responsibly disclosing such vulnerabilities.

Since when is “law enforcement need” included in that decision process? This national security exception to law and process is extending much too far into normal police work.

Another point. According to the original Bloomberg article:

http://www.bloomberg.com/news/2014-04-11/nsa-said-to-have-used-heartbleed-bug-exposing-consumers.html

Certainly a plausible statement. But if those millions didn’t discover something obvious like Heartbleed, shouldn’t we investigate them for incompetence?

Finally—not related to the NSA—this is good information on which sites are still vulnerable, including historical data.

Posted on April 11, 2014 at 1:10 PMView Comments

1 2 3 14

Sidebar photo of Bruce Schneier by Joe MacInnis.