Data as Pollution

Cory Doctorow has a new metaphor:

We should treat personal electronic data with the same care and respect as weapons-grade plutonium -- it is dangerous, long-lasting and once it has leaked there's no getting it back

I said something similar two years ago:

In some ways, this tidal wave of data is the pollution problem of the information age. All information processes produce it. If we ignore the problem, it will stay around forever. And the only way to successfully deal with it is to pass laws regulating its generation, use and eventual disposal.

Posted on January 30, 2008 at 12:35 PM • 34 Comments


Brandioch Conner • January 30, 2008 1:20 PM

"Every gram - sorry, byte - of personal information these feckless data-packrats collect on us should be as carefully accounted for as our weapons-grade radioisotopes, because once the seals have cracked, there is no going back."

Not quite.

With a physical substance, you can test that the amount you have today is the same as the amount you had yesterday.

With data, how do you verify that it hasn't been copied?

Isn't that what happened with those CDs? A copy of the data was lost.

This is where we need __REAL__ digital rights management.

The system has to make it IMPOSSIBLE to copy the data to any other site. Particularly to CD or USB drive or laptop hard drive.

If we cannot do that (and we cannot), then we need to be able to legally fine any department or organization that allows data to be leaked.

The best way to keep the data safe ..... is to not collect it in the first place.

If you're going to require it, then you are going to PAY if you leak it.

Keith McMillan • January 30, 2008 1:31 PM

@Brandioch: Copy proofing data may sound like a good idea at first, but have you ever installed a strong cryptographic package, then had second thoughts before putting all your eggs in that basket? I just installed TrueCrypt on my notebook to protect client information I carry around with me, but the prospect of losing the keyfile, forgetting the password and having those valuable documents be irretrievable gives me pause. You can bet that I'm keeping a copy unencrypted on my office machine.

Making data that is impossible to copy would remove even the option of having a safety net.

Anonymous • January 30, 2008 1:34 PM

"The best way to keep the data safe ..... is to not collect it in the first place."

Yes, and the privacy & data protection laws of most EU countries just take care of that, by clarifying how others (companies, etc) can handle your personal data, if at all. Compared to the US, you have so much more control over what happens to that data.

erlehmann • January 30, 2008 1:59 PM

In my opinion, the best approach would be that of "Battlestar Galactica" (Miniseries, 2003). The main characters of the series survive the attack on their home planets only because they have a non-networked computer system on their spaceship - every input and output *has* to go through human review. In the series, all other spaceships relying on networked systems are doomed when the defense mainframe is cracked.

An interesting side effect of that would be that automated large-scale filtering of personal data would probably not be feasible.

Brandioch Conner • January 30, 2008 2:16 PM

@Trichinosis USA
"If you can't copy the data, how are you gonna back it up?"

If you can back it up, how are you going to know that it has not been leaked?

Again, because we are NOT talking about a physical item, it is IMPOSSIBLE to know that it has not been leaked.

*rolls eyes*

Do I have to explain that we've had our spies compromised because someone could REMEMBER the details that he looked up?

The ONLY way to prevent the information from leaking is to NOT collect it.

Because the temptation to collect it is so great, the PUNISHMENT of losing it must be IMMENSE.

j0hnner_ca • January 30, 2008 2:25 PM

> This is where we need __REAL__ digital rights management.

We need no DRM.

Not only is it a gigantic pain in the keister to the user, but it is inevitable that such systems will be utterly broken, due in part to that aforementioned keister affliction.

What we need is strong, open encryption to be used everywhere. Were these CDs encrypted?

Also, something you probably won't find as interesting as I did:

"...once London Underground has hiccoughup..."

See that? It's like some subconscious brainfart of him being torn between the Brit version "hiccough" and the American "hiccup"...

Sedgequill • January 30, 2008 3:05 PM

The claim that personal data obtained by a business is owned by that business is complicating. Businesses making that claim will assert that they should be able to keep the information as long as they please, and they have lobbies to deal with legislators. was, if I correctly recall, in the first wave of corporations asserting ownership of personal customer data.

Businesses and government agencies often ask for more information than is needed in forms they request or require to be completed. More than ever, government agencies have access to private sector data. Data-keepers public and private tend to be afraid of getting in trouble for destroying data that might be asked for later. Health care providers and health insurance companies continue to have weak consumer protections and related security. Despite all the obstacles to the imposition of personal data retention schedules, though, I'm glad there's growing discussion of the need for them.

Sam Greenfield • January 30, 2008 3:59 PM

Respectfully, the metaphor is cute but a bit over the top. It might be clever to compare lost personal information to weapons-grade plutonium, but the fact is that a very small amount of missing weapons-grade plutonium is much more dangerous than a very small amount of lost personal information.

Incidentally, you know who collected personal information and used it for evil? The Nazis.

Doctorow is bordering on reductio ad hitlerum.

A more apt comparison may be between personal information and arsenic. Arsenic can get leaked into the environment and we should defend against its release, but it is not nearly as dangerous as missing weapons grade plutonium.

[Disclaimer: This comment is a bit tongue in cheek....]

Brandioch Conner • January 30, 2008 4:00 PM

"What we need is strong, open encryption to be used everywhere. Were these CDs encrypted?"

You (the organization) cannot ensure that every copy made will be encrypted. It is too easy to copy it and NOT encrypt it if you allow copies to be made.

And that does NOT even begin to address the issue of COLLECTING this data in the first place.

This problem needs to be solved as close to the root as possible. If the data was not collected, it could not be leaked.

Once it is collected, it has to be protected.

Once you allow copies to be made, you allow it to leak. Even if the copies are NOT electronic.

That's all there is to it.

wrong_problem • January 30, 2008 4:43 PM

The problem isn't that data need to be protected, but rather, people.

The fundamental problem is that the financial industry is still clinging to outdated identity models; models which put the consumer at risk if such secrets are divulged.

I should be able to register my private key with my bank, and my bank should know that any transaction _NOT_ signed by such key is counterfeit. My bank need not ever know my "real" identity; they only need to verify that transactions against an account are authorized by the person who opened the account. If the government and financial institutions actually took security seriously, the mere divulging of a single 9 digit number would not facilitate the fraud it does today.
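A minimal sketch of the signed-transaction idea above, in Python with only the standard library. Because the standard library has no conventional public-key signature, a Lamport one-time signature is swapped in here as an assumption; a real bank would more likely use something like Ed25519 or RSA, and each Lamport key pair can safely sign only one message. The function names and the sample transactions are hypothetical.

```python
import hashlib
import secrets


def _h(data: bytes) -> bytes:
    """SHA-256 digest, used both for key hashing and message hashing."""
    return hashlib.sha256(data).digest()


def _bits(message: bytes):
    """Return the 256 bits of the message's SHA-256 digest, most significant first."""
    return [(byte >> (7 - i)) & 1 for byte in _h(message) for i in range(8)]


def generate_keypair():
    """Create a one-time key pair: 256 pairs of secrets and their hashes."""
    private_key = [(secrets.token_bytes(32), secrets.token_bytes(32))
                   for _ in range(256)]
    public_key = [(_h(zero), _h(one)) for zero, one in private_key]
    return private_key, public_key


def sign(private_key, message: bytes):
    """Reveal one secret per digest bit; the key must not be reused afterwards."""
    return [private_key[i][bit] for i, bit in enumerate(_bits(message))]


def verify(public_key, message: bytes, signature) -> bool:
    """Check that each revealed secret hashes to the registered public value."""
    if len(signature) != 256:
        return False
    return all(
        secrets.compare_digest(_h(part), public_key[i][bit])
        for i, (bit, part) in enumerate(zip(_bits(message), signature))
    )


# The customer keeps private_key; only public_key is registered with the bank.
private_key, public_key = generate_keypair()
order = b"pay 100.00 from account 42 to account 7"
signature = sign(private_key, order)

genuine = verify(public_key, order, signature)
forged = verify(public_key, b"pay 9999.00 from account 42 to account 666", signature)
print("genuine order accepted:", genuine)
print("altered order accepted:", forged)
```

The bank stores nothing that lets it, or anyone who steals its records, create a valid order; it can only check one.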

TS • January 30, 2008 5:23 PM

@Trichinosis USA
"If you can't copy the data, how are you gonna back it up?"

Write the data twice.

CipherChaos • January 30, 2008 6:25 PM

I tend to think about "data pollution" along a different thread:

"We are drowning in information but starved for knowledge." --John Naisbitt

Has anyone ever just gotten tired from trying to sift through all the trash? It can be like finding a nutrient-packed needle, in a nation-sized haystack of worthless straw.

Sedgequill • January 30, 2008 7:23 PM

The fact that computerized activities produce footprints or byproducts with personal significance gets spun into expecting us to accept all developer-designed data-gathering tools as necessities, without questioning what's being done. On the web, some site administrators gather more personal data than do others, by design. The most data-greedy, on and off the web, would like us to think that all personal data that an electronic activity or transaction produces, or that can be electronically gathered, is owed by users or consumers to whoever is providing the service.

Human motives shape collection of personal data to some extent; it's not all some IT manifest destiny. That's easier to realize when malicious computer-powered conduct is encountered and examined. What do we owe a phisher or a malware implanter?

archangel • January 30, 2008 7:58 PM

what we need, just as with media files, is not "Digital Rights Management", but a reliable encryption/decryption standard applied forcefully to the information. This has, occasionally, been called a "crydec", like codec, [en|de]cryption. Same solution, applied to a codec encoded file, sent to a trusted recipient with the codec, crydec and key, should work with data in file format, encryption, and key. But that's nothing new to anyone here, is it?

Michael Ash • January 30, 2008 8:04 PM

The real problem with personal data is that the world is still effectively operating on security through obscurity. Huge amounts of the world effectively use the physical location of your dwelling as a shared secret! It's insane. As long as we rely on easily discovered, difficult to change facts which are useful for purposes besides authentication, insecurity will rule. The fix is to stop assuming that someone who knows my address and birthday is me.

Paul Renault • January 30, 2008 8:18 PM

More countries need a paid-by-the-public Privacy Commissioner who places the citizens' privacy above all other considerations. Privacy is more important, IMHO, than most people think it is.

If you need a reason or two to consider why privacy is important, read "Privacy as Contextual Integrity" by Helen Nissenbaum.
I was directed to it by one of Bruce's earlier posts:

Cory Doctorow gave a talk entitled "Privacy Isn't Dead -- Let's Not Kill It" at OSCON last summer. It's available as an MP3. It might help round out Cory's ideas.

I'd go on, but the Trailer Park Boys just came on the tube, and..Dj'know what I'm saying?

Damon • January 30, 2008 9:48 PM

The pollution analogy suggests a solution. The problems of pollution result from ill defined property rights and the resultant negative externalities. For instance, if nobody owns a lake nobody will have the incentive to keep the lake clean. If a property right is created in the lake then the owner will internalize the costs of the pollution.

It is the same with personal data. If nobody "owns" it then nobody takes care to protect it. If people are given a property right in their personal data and allowed to sue a company for mishandling it, the company will internalize the costs they impose on the people whose data they fail to secure.

Tom Hughes • January 30, 2008 10:10 PM

Big "yes" to Damon's point -- this is where the pollution analogy is really strong. Pollution occurs when the polluter has untrammeled access to a resource (like air or water) that they can use without limit and without expense. Similarly, storing my personal data is something a company can do without limit and without expense; it's cheaper to keep than to delete.

I'd like there to be a free-market solution, but I think you need government here to set an ownership right and a compulsory license: if I give my data to someone, I still own it, but I'm required to license it to them for some limited period and some defined set of uses. The recipient understands their rights and their obligations, which are identical across the economy, and terminate in some set period with deletion.

Tom • January 31, 2008 1:06 AM

My college computer security systems professor said the same thing... "treat personal data like toxic waste"

Ronald van den Heetkamp • January 31, 2008 3:22 AM

Ha, good metaphor.

Very close, since this toxic waste has a great market value. But I'm willing to take it one step further, "they" should not even be aware of it, or even "see" or treat it, because yeah: "isn't it personal data?"


jon • January 31, 2008 6:46 AM

Most companies and governments are pretty good about tracking money and not letting it slip away through their databases, but they seem far less concerned about your data. Loss of data needs to be punishable in the same way that mishandling of funds would be.

It seems the common response to data losses or thefts tends to be: "Oh, sorry, my bad. No hard feelings, right?" Because there are no repercussions for not securing your data adequately.

Every time a portion of data about each person is lost, the company/government should be liable for a fine, plus needing to make double restitution for any specific losses related to that data loss. Loss of each of my data points (SSI, age, hair color, mother's maiden name, etc.) would incur a separate fine. How about $5 for each data point lost?
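As a rough worked example of that arithmetic, here is a small Python sketch. The $5 figure and the doubling come from the proposal above; the breach size, the number of data points per person, and the loss amount are made-up inputs for illustration, and the function name is hypothetical.

```python
from decimal import Decimal

FINE_PER_DATA_POINT = Decimal("5.00")  # the $5 per data point floated above
RESTITUTION_MULTIPLIER = 2             # "double restitution" for specific losses


def breach_liability(people_affected: int,
                     data_points_per_person: int,
                     specific_losses: Decimal = Decimal("0")) -> Decimal:
    """Total owed: a flat fine per lost data point plus doubled specific losses."""
    fines = FINE_PER_DATA_POINT * people_affected * data_points_per_person
    restitution = RESTITUTION_MULTIPLIER * specific_losses
    return fines + restitution


# Hypothetical breach: 10,000 people, 5 data points each, $1,200 in traced losses.
total = breach_liability(10_000, 5, Decimal("1200.00"))
print(f"Total liability: ${total:,.2f}")
```

Even at $5 a data point, the fine scales with both the number of people and the amount collected about each of them, which is the incentive to collect less.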

If someone uses my mislaid data to obtain funds, I should be compensated well beyond the cash value of that loss - to repair the financial damage, to account for the time and trouble involved in repairing my data security and personal credit, to reestablish my faith in the party that lost my data, and as a penalty against the errant party.

I should be able to obtain monetary compensation if my lost data is used in a way that causes me non-monetary damages - loss of privacy, poor credit rating, travel difficulties, unlawful or mistaken detention.

These monetary penalties should be in addition to any other legal recourses I might have presently.

When governments and companies understand that there are tangible penalties and repercussions for mishandling your data they will be far more judicious in how much data they will require, how they handle it, how long they keep it, and how well they secure it.

Govt Skeptic • January 31, 2008 9:47 AM

Why not bond/insure the storage of personal/transaction data? Maybe exempt non-specific group data that can't be traced back to individuals (e.g., 4% of all users who viewed this advert clicked thru).
That way, the entities that do the storing (and thus the leaking) would have serious financial consequences -- and the insurance industry could even get behind it as a new revenue stream. Today, no one wins with personal data archiving, but if there's money to be made off insuring/securing it, then everyone wins!

Jack C Lipton • January 31, 2008 9:50 AM

So, when data leaks out of the pipe, etc, and acts as a pollutant, is it a form of entropy?

After all, there _is_ a difference between data and information.

paul • February 1, 2008 10:39 AM

The externalities thing is the issue. Right now, it's like toxic waste in a country that doesn't have any laws requiring polluters to clean up after themselves. Or like a particularly strange isotope of plutonium that kills anybody except the people who "accidentally" release it.

Ronald van den Heetkamp • February 2, 2008 12:58 PM

Well, if you read about the latest data breach at Davidson companies:

where they stole 226.000 user accounts, I asked myself: why didn't they encrypt the sensitive data? and store the keys below in the root folder. Or at least have different (monitored) databases that reference data through tokens that are meaningless if stolen.
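The token idea can be sketched roughly as follows in Python. This is only an illustration under stated assumptions: `TokenVault` is a hypothetical in-memory stand-in for a separately secured, monitored database, and the sample value is a placeholder, not real data.

```python
import secrets


class TokenVault:
    """Hypothetical stand-in for a separate, monitored store of sensitive values."""

    def __init__(self):
        self._by_token = {}
        self.access_log = []  # every lookup is recorded so it can be monitored

    def tokenize(self, sensitive_value: str) -> str:
        """Store the value and hand back a random token unrelated to it."""
        token = secrets.token_hex(16)
        self._by_token[token] = sensitive_value
        return token

    def detokenize(self, token: str, requester: str) -> str:
        """Resolve a token back to its value, logging who asked."""
        self.access_log.append((requester, token))
        return self._by_token[token]


vault = TokenVault()

# The main accounts database holds only the token, never the value itself.
accounts = [{"user": "alice", "ssn_token": vault.tokenize("000-00-0000")}]

# A thief who copies `accounts` gets random hex strings and nothing else.
print(accounts[0]["ssn_token"])

# A legitimate lookup goes through the vault and leaves a trace.
value = vault.detokenize(accounts[0]["ssn_token"], requester="billing-service")
print(len(vault.access_log), "logged lookup(s)")
```

The tokens are random rather than derived from the data, so stealing the accounts table alone reveals nothing; the attacker would also have to breach the vault, where every lookup is logged.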

Could be me, but sad as it is, this stuff is hardly news anymore.

technologist • February 4, 2008 2:20 PM

I find it more than a little ironic to see Mr. Doctorow's concern about protecting personal electronic data given his disputes with the SFWA and support for free electronic distribution of copyrighted works.

One thing we need is something akin to automatic aging of this personal information in venues outside the owner's control. The problem is essentially an inverse of the patent problem -- there are legitimate purposes that benefit the consumer in the long run for retaining some of this information but after some period of time we want that information out of the public domain (as opposed to patents which become public domain after they expire).

It also strikes me there is a corollary here with the arguments by Second Amendment activists who want to prevent unconstitutional weapons seizures by not allowing creation of national (or even state) databases.

I'm not really sure there is a satisfactory solution to the dichotomy between having the freedom to collect & process information and the right to privacy except with respect to government actions. I think the right to privacy should also expire some period after the owner's death (like copyrights).

Windi Jolbars • February 8, 2008 6:00 AM

As for the data pollution topic... recent leakage of 17 gigabytes of private photos from proved that. You showed something for your girlfriend's eyes only, and now everyone could see that.

Actually, I think it's a good lesson for all internet users. The electronic information tends to be stored somewhere, and could be accessible in years to come.

