Teaching Computers How to Forget

I've written about the death of ephemeral conversation, the rise of wholesale surveillance, and the electronic audit trail that now follows us through life. Viktor Mayer-Schönberger, a professor in Harvard's JFK School of Government, has noticed this too, and believes that computers need to forget.

Why would we want our machines to "forget"? Mayer-Schönberger suggests that we are creating a Benthamist panopticon by archiving so many bits of knowledge for so long. The accumulated weight of stored Google searches, thousands of family photographs, millions of books, credit bureau information, air travel reservations, massive government databases, archived e-mail, etc., can actually be a detriment to speech and action, he argues.

"If whatever we do can be held against us years later, if all our impulsive comments are preserved, they can easily be combined into a composite picture of ourselves," he writes in the paper. "Afraid how our words and actions may be perceived years later and taken out of context, the lack of forgetting may prompt us to speak less freely and openly."

In other words, it threatens to make us all politicians.

In contrast to omnibus data protection legislation, Mayer-Schönberger proposes a combination of law and software to ensure that most data is "forgotten" by default. A law would decree that "those who create software that collects and stores data build into their code not only the ability to forget with time, but make such forgetting the default." Essentially, this means that all collected data is tagged with a new piece of metadata that defines when the information should expire.

In practice, this would mean that iTunes could only store buying data for a limited time, a time defined by law. Should customers explicitly want this time extended, that would be fine, but people must be given a choice. Even data created by users--digital pictures, for example--would be tagged by the cameras that create them to expire in a year or two; pictures that people want to keep could simply be given a date 10,000 years in the future.
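The expire-by-default scheme is easy to picture in code. Here is a minimal sketch; the `ExpiringStore` class, its two-year default, and the lazy-deletion approach are my illustration, not anything specified in Mayer-Schönberger's paper:

```python
import time

class ExpiringStore:
    """Toy data store where every record carries expiry metadata
    and forgetting is the default, per Mayer-Schoenberger's proposal."""

    DEFAULT_TTL = 2 * 365 * 24 * 3600  # two years, in seconds (assumed default)

    def __init__(self):
        self._records = {}  # key -> (value, expires_at)

    def put(self, key, value, ttl=None):
        # Forgetting is the default: callers must opt in to longer retention.
        ttl = self.DEFAULT_TTL if ttl is None else ttl
        self._records[key] = (value, time.time() + ttl)

    def get(self, key):
        value, expires_at = self._records.get(key, (None, 0.0))
        if time.time() >= expires_at:
            self._records.pop(key, None)  # lazily forget expired records
            return None
        return value

store = ExpiringStore()
store.put("search-history", ["query1", "query2"])  # expires by default
store.put("family-photo", b"...", ttl=10_000 * 365 * 24 * 3600)  # keep "forever"
```

Of course, a real deployment would also have to scrub backups and caches, which is the genuinely hard part.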

Frank Pasquale also comments on the legal implications of this issue. And Paul Ohm wrote a note titled "The Fourth Amendment Right to Delete":

For years the police have entered homes and offices, hauled away filing cabinets full of records, and searched them back at the police station for evidence. In Fourth Amendment terms, these actions are entry, seizure, and search, respectively, and usually require the police to obtain a warrant. Modern-day police can avoid some of these messy steps with the help of technology: They have tools that duplicate stored records and collect evidence of behavior, all from a distance and without the need for physical entry. These tools generate huge amounts of data that may be searched immediately or stored indefinitely for later analysis. Meanwhile, it is unclear whether the Fourth Amendment’s restrictions apply to these technologies: Are the acts of duplication and collection themselves seizure? Before the data are analyzed, has a search occurred?

EDITED TO ADD (6/14): Interesting presentation earlier this year by Dr. Radia Perlman that represents some work toward this problem. And a counterpoint.

Posted on May 16, 2007 at 6:19 AM • 33 Comments

Comments

kyb • May 16, 2007 7:09 AM

Personal data should belong to the individual that it's about, not to whatever organisation has compiled it.

People want to be able to store everything about themselves, but it should be theirs to determine what parts of it get shared with who.

This is one reason OpenId is good, it lets the user take complete control of authentication, and share that appropriately.

Dwango • May 16, 2007 7:15 AM

If copyright law can stipulate that I may not copy information even though doing so does not actually deprive the copyright owner of anything tangible (i.e., even though it's not theft), then certainly, copying or acquisition of data by the police should be treated as a seizure as well: either copying is a significant act, or it is not.

That being said, I'm not convinced that this is *always* a good idea. In the case of cameras, for example, would the author also say that "traditional" photos taken on film should be required to self-destruct after a year or two? Of course not. I can understand and agree that when someone uploads digital photos to a service such as Flickr, they should get an expiry date unless the user says otherwise, but the camera itself should not enforce this.

Toby Stevens • May 16, 2007 7:18 AM

"If whatever we do can be held against us years later... the lack of forgetting may prompt us to speak less freely and openly."

This is possibly the most persuasive argument I've yet heard against that old lie "if you have nothing to hide, then you have nothing to fear". After all, who wants to end up as a politician?

Clive Robinson • May 16, 2007 7:47 AM

Once upon a time, the law required you to keep certain information for at least seven years.

Why must we keep anything longer than the law requires?

The answer these days appears to be,

"because we can"

Am I the only person who finds this a strange reason for doing something that entails a great deal of expense?

Victor Wagner • May 16, 2007 7:55 AM

This idea has one huge implication: future historians and archaeologists would be unable to tell anything about our times. It would lead to an Orwellian society where history can easily be rewritten according to current political needs.

Nobody would be able to disprove mass-media or government-inspired legends about the past, because every bit of evidence would have expired and been forgotten.

merkelcellcancer • May 16, 2007 8:05 AM

I am hoping, like in A.I. (the movie), that when we are a cold and frozen world, some future explorers will discover all this data and wonder what all the fuss was about.

Secure • May 16, 2007 8:36 AM

Wouldn't this require a system much more invasive than TCPA? It would even have to forbid loading backups from a burned DVD after expiration...

Some Guy • May 16, 2007 8:43 AM

Can we get this to work for wives as well? My point is, old actions and remarks can already be held against you for quite a long time. I could see the same arguments being used against writing or the printing press. Perhaps all we need is a way to drop off the record when we're doing something unflattering.

Joshua • May 16, 2007 10:08 AM

So, basically, what we need to do is replace computers with artificial intelligences based on Alberto Gonzales?

I agree in principle with this when applied to the massive data brokers, but cameras? Don't be ridiculous.

Eli • May 16, 2007 10:22 AM

"Personal data should belong to the individual that it's about, not to whatever organisation has compiled it."

At what point do we wind up talking about the ownership of facts? If Alice can't tell Bob anything about Charlie, even though it's true, are we really better off?

Andy Masters • May 16, 2007 10:31 AM

An interesting way of doing this would be to encrypt each file with its own key.

The computer would keep hold of the key, but as time passes, parts of the key could be deliberately dropped/forgotten.
This would make the file recoverable if really needed (by brute-forcing the key), but expensive (computationally and financially); exactly how hard would depend on the class/age of the file.
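Andy's gradual-forgetting idea can be sketched directly. In this toy version the function names are mine, and the SHA-256 fingerprint stands in for "try decrypting the file" as the way to recognize the right key:

```python
import os
from hashlib import sha256

def age_key(key: bytes, dropped_bits: int) -> bytes:
    """Forget the low `dropped_bits` bits of the key by zeroing them.
    Each bit dropped doubles the later recovery cost."""
    n = int.from_bytes(key, "big")
    n &= ~((1 << dropped_bits) - 1)  # clear the forgotten bits
    return n.to_bytes(len(key), "big")

def recover_key(partial: bytes, dropped_bits: int, fingerprint: bytes):
    """Brute-force the forgotten bits: up to 2**dropped_bits candidates."""
    base = int.from_bytes(partial, "big")
    for guess in range(1 << dropped_bits):
        candidate = (base | guess).to_bytes(len(partial), "big")
        if sha256(candidate).digest() == fingerprint:
            return candidate
    return None  # no candidate matched the fingerprint

key = os.urandom(16)                 # per-file encryption key
fingerprint = sha256(key).digest()
partial = age_key(key, 12)           # some time later: 12 bits forgotten
assert recover_key(partial, 12, fingerprint) == key  # 4096 tries, still cheap
```

Dropping, say, one more bit each month makes files from a year ago roughly 4000 times costlier to recover than last month's.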

paul • May 16, 2007 10:41 AM

It's not just the data, it's the search technology. When everything that was nominally in the public record took days, weeks or even years to track down, people mostly didn't have to worry about it getting bruited about. Now you can find every stupid thing I said in 1993 at the touch of a few buttons. (And plenty of people, even back then, stopped participating in public online discussions out of concern for immediate or future consequences.)

On the other hand, it would be sad to lose all of this stuff entirely. Maybe the digital world should be rethought to be subject to the same kinds of random deletions and obscurations as the physical one. It might be nice for the historians if data started going away after a few years and then came back 30 or 50 or 100 years later, when it could be useful without damaging individuals too badly. (Of course, nothing would be able to read it then, but that's another problem.)

I wonder if some of the distributed crypto schemes where you need m out of n keys to recover a piece of data could be adapted for something like this, perhaps in an archive where the "keys" simply wouldn't be available until certain dates.
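The m-out-of-n idea is essentially Shamir secret sharing. A toy sketch follows; the 127-bit prime and the escrow framing are my choices, and nothing here enforces the release dates - that part would still need trusted archives that withhold their shares until the time comes:

```python
import random

PRIME = 2**127 - 1  # a Mersenne prime; shares live in GF(PRIME)

def split(secret: int, n: int, m: int):
    """Split `secret` into n shares; any m of them recover it (Shamir)."""
    coeffs = [secret] + [random.randrange(PRIME) for _ in range(m - 1)]
    return [(x, sum(c * pow(x, i, PRIME) for i, c in enumerate(coeffs)) % PRIME)
            for x in range(1, n + 1)]

def combine(shares):
    """Lagrange interpolation at x=0 recovers the secret from m shares."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num = den = 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * -xj % PRIME
                den = den * (xi - xj) % PRIME
        secret = (secret + yi * num * pow(den, PRIME - 2, PRIME)) % PRIME
    return secret

# Encrypt the archive under a random key, then split the key 3-of-5 among
# archives that agree to release their shares only after a set date.
key = random.randrange(PRIME)
shares = split(key, n=5, m=3)
assert combine(shares[:3]) == key
assert combine(shares[2:]) == key
```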

Herman • May 16, 2007 11:09 AM

Off-topic? (Not so sure)

http://news.search.yahoo.com/search/news/?...

Baby Issued Firearm Owner's ID Card

Chicago, Absurdistan: Bubba Ludwig can't walk, talk or open the refrigerator door -- but he has his very own Illinois gun permit. The 10-month-old, whose given name is Howard David Ludwig, was issued a firearm owner's identification card after his father, Howard Ludwig, paid the US$5 fee and filled out the application, not expecting to actually receive one.

Benny • May 16, 2007 11:24 AM

@ Andy

I don't think that proposal will work, because as technology advances, it becomes easier to recover previously encrypted data without the key.

Craig Hughes • May 16, 2007 11:25 AM

Wouldn't the same law mean that your iPod had to "forget" your music after some time period, forcing you to buy it all over again? And if you can have a "Do not forget this" setting on your iPod where the user can over-ride forgetfulness in favor of usefulness, then why would every database out there not have an EULA on it where the user waived forgetfulness in favor of usefulness? Retaining data is undeniably useful, even if some of the side effects are potentially dangerous. Even when the dangers are explained, the benefits on balance will likely almost always outweigh the dangers (especially ante-facto), so people will get used to clicking the "do not forget" button and do it all the time.

MikeA • May 16, 2007 11:29 AM

@Victor Wagner
This idea has one huge implication: future historians and archaeologists would be unable to tell anything about our times.

This is already starting to happen. Try reading a Pagemaker 3 file some time, or older PDF, Word, etc. You not only need the exact application that created it, you need the OS it ran on, the hardware the OS ran on, etc. And you need the _actual_ stuff, because the specs are false, when they aren't just missing. Between undocumented formats, DRM, and the Disney "perpetual copyright", the period from 1921 to whenever we come to our senses is going to be a black hole for future historians.

Canticle • May 16, 2007 11:46 AM

This isn't a new problem, nor is it a technology problem... I don't even think it's a security problem.
Galileo had his youthful writings brought before the Church as evidence of his wrong-thinking in middle age.
What technology brings to the table is a certain ease of information retrieval for technically compatible information (as several earlier writers noted, the winds of technology make earlier methods of storage and retrieval obsolete). Imagine if we changed alphabets every 10 years and forgot the old when learning the new...
So, while current technology simplifies retrieval, it also guarantees a short memory (at least relative to pen and paper).
Like most security issues, this is a people issue. Not a question of security but of perceived trust.

shimmershade • May 16, 2007 11:57 AM

Congress and state legislatures have treated classes of personal data differently, thus we have laws specific to medical data, tax return data, banking data, debt collection data, and so on. The use of data-forgetting technology could be required by law in those areas where potential for harm is greatest.

Pass A BroadLaw!™ • May 16, 2007 11:58 AM

"...Mayer-Schönberger proposes a combination of law..."

Well, if it has to do with technology and privacy, then legislation MUST be the only answer, right?!

Somebody proposes a law? Well then I'm all for it! Let's get behind it!

Brian Carnell • May 16, 2007 1:21 PM

"Even data created by users--digital pictures, for example--would be tagged by the cameras that create them to expire in a year or two; pictures that people want to keep could simply be given a date 10,000 years in the future."

LOL...I can just imagine all of the wonderful tech support calls camera makers would get 24 months after this sort of requirement went into effect.

This is an absurd solution to a very real problem (and this is where someone like David Brin is right: we'll largely just have to get used to it, given how futile existing technological efforts to prevent the keeping of unauthorized data have been).

Kee Hinckley • May 16, 2007 6:17 PM

The problem of persistence-of-conversation has been with us from the early days of Usenet, but most people have ignored it. What we have now is basically privacy by obscurity, and the obscurity is rapidly going away.

Unfortunately or not, the solution is not to make computers forget. There will always be someone who hangs on to the data. And of course the data trails can be as damning as (usually more so than :-^) the data itself.

The solution is not technical, it's social. As speakers, we need to relearn how to apologize. As listeners, we need to accept that people say dumb things, that context is not always there, and that most of all, it's okay for people to change their minds. (One of the things I find particularly frightening about today's political discourse is the number of politicians who are unwilling to change their minds in the face of changed facts. That's not just a personal trait, it's a political one, because a large number of voters seem to view change as weakness.)

I think largely because my name is a unique identifier, I've never believed that anything I've posted online (and I've been doing so for twenty-five years) was either anonymous or transient. Yes, it makes me think more carefully about what I write--but that's not a bad thing. More importantly, it has ensured that I'm prepared to promptly issue public retractions when I say something stupid--which has certainly happened on occasion.

I think our energy would be better spent on encouraging people to take responsibility for what they do say (and have said) than on making computers forget it.

Jay Carlson • May 16, 2007 9:45 PM

This is actually a collision of two different technologies. USENET was published on many CD volumes for a while; that was a little troublesome back then. Now that we all live on Gibson's West Coast from Count Zero:

====
"What is this? I mean, if you could sort of explain.." He still couldn't move. The "window" showed a blue-gray video view of palm trees and old buildings. "How do you mean?" "This sort of drawing. And you. And that old picture. . "Hey, man, I paid a designer an arm and a leg to punch this up for me. This is my space, my construct. This is L.A., boy. People here don't do anything without jacking. This is where I entertain! "Oh," Bobby said, still baffled.
====

MySpace indeed.

The classic SF extension of this is perfect remote viewing some distance into the past.

The mess only happens when combined with search that doesn't suck, which is relatively novel. This allows for *industrialization*, and we all know the standard crypto narrative of what that did.

The next couple election cycles in the US are going to be the preview of, well, unrestricted warfare in this domain. If there will be any kind of regulation, it will be similar to that passed to protect video rental records after politicians were embarrassed.

Bob W. • May 17, 2007 12:32 PM

@Clive Robinson:

Going through stored records and purging those no longer required by law is also a significant expense. I've known some consultants who specialized in this area, developing expertise in classes and types of records and the relevant statutes mandating sometimes quite different retention times.

In the old way of doing business, an individual could go through the boxes of records that businesses kept more or less classified by area of application (hand-written ledgers vs. invoices vs. sales orders vs. payment records, etc.) verifying that the contents of the boxes matched the labels, and making retention decisions based on (mostly) tax-related retention mandates.

With increasing amounts of information being stored in digital formats and subject to increasing numbers of government mandates for storage, the task of selecting what to purge and what to retain has probably grown beyond levels individual consultants could handle with that manual approach.

If nothing else, the fact that backup media are typically written with a mix of multiple record types and classes makes purging the most fundamental of storage media practically very difficult, if not impossible.

There's an OASIS standardization group working on XML DTDs for business documents, and some government agencies are participating in the process to try to ensure that there's provision for appropriate taxation-related data objects being correctly tagged. Maybe some of the OASIS output will be useful in the area of record retention and purging, but I doubt that there's any digital record standardization activity in this particular area.

Purging digital records without purpose-specific tags which can relate them to retention mandates is going to be an approximate process at best. The security trade-off may be between exposure to litigation due to failing to retain records as mandated and failing to expunge records as required to avoid exposure to civil suits.

A simple sunsetting of records based on an arbitrary decision by individual operators is likely to cause exposure to both.

False Data • May 17, 2007 1:56 PM

Data retention's impact on free speech is a real concern, but I wonder if implementing Professor Mayer-Schönberger's solution might be more complicated than he gives it credit for being.

For example, he writes "Even data created by users--digital pictures, for example--would be tagged by the cameras that create them to expire in a year or two; pictures that people want to keep could simply be given a date 10,000 years in the future." He also writes about keychain devices you could wear that could tell surveillance cameras to expire pictures of you in the near term.

So, what happens if I use my personal digital camera, take a picture of you, you're wearing a keychain device, and I set the expiration to 10,000 years then post the picture on the web? Lots of cell phone cameras, some photo blogs, and decent image search technology can do what surveillance cameras do, so whose rights win out--yours or mine?

It gets a bit worse. Suppose we say you win because it promotes the policy of communication via forgetfulness, which we decide is more valuable than my property right in the picture. How do you enforce that right if I decide to break the deletion protection? Do you sue me directly? If you have to resort to doing that, after marching in, say, an anti-war demonstration, you could quickly find yourself overwhelmed trying to enforce your rights against every image of you that all those camera wielding people have taken. Do we criminalize holding the data too long? Do we use a DMCA-style approach where you can make a demand on the ISP to remove the picture? Or do we use a technological fix, where we require the camera to encrypt any picture of me using my private key and a unique session key (which might make stadium shots from the Goodyear Blimp an interesting computational problem)?

Stephan Engberg • May 18, 2007 1:08 AM

Revocability of data

The article is spot-on, but unfortunately it doesn't suggest how to achieve this. Data as such have eternal life and gravity - they pull together because of asymmetries of interests and risk assessments.

Data do not get deleted, as the cost of storage is close to zero and the risk of deleting something you MIGHT need is bigger than this cost. And data get used because they can be used - existence creates possibility, and possibility creates purpose.


The essence of the problem is to work with REVOCABILITY of DATA.

Intuitive thinking seems to point towards DRM: encrypting the data itself. I think this is a blind alley that will never lead to solutions, only to "un"trusted computing where the controls lie elsewhere than with the citizen.

Instead we need to focus on Revocability of the associated identity and all relevant keys and identifiers.

Data themselves are perhaps permanent, but the associated risk to the person is controllable and can be changed AFTER data have been disclosed. This requires identity to be purpose-specific and, in principle, created bottom-up for every purpose.
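One way to read this purpose-specific, revocable identity idea is as derived pseudonyms. A toy sketch - the HMAC construction and the generation counter used for revocation are my illustration, not Engberg's actual design:

```python
import hmac
import hashlib

def purpose_id(citizen_secret: bytes, purpose: str, generation: int = 0) -> str:
    """Derive a purpose-specific identifier from a secret only the citizen
    holds. Different purposes yield unlinkable identifiers; bumping
    `generation` revokes one identifier without touching the others, so the
    risk can be changed AFTER data have been disclosed."""
    msg = f"{purpose}:{generation}".encode()
    return hmac.new(citizen_secret, msg, hashlib.sha256).hexdigest()

secret = b"held only by the citizen"
tax_id = purpose_id(secret, "tax-authority")
shop_id = purpose_id(secret, "bookshop-loyalty")
assert tax_id != shop_id                    # services cannot correlate records
new_shop_id = purpose_id(secret, "bookshop-loyalty", generation=1)
assert new_shop_id != shop_id               # old identifier is now dead data
```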

The wrong way to make this happen is the "trusted" gatekeeper - MS Passport style - as this builds power and control over people.

But often we also need accountability and emergency access to work even when citizens have revocability and control.

This is the essence of what I call National ID 2.0 - where identity turns into something securing citizens by empowering them with the means to control their identities, instead of only building controls over people with a single ID and upfront identification.

Simone De Bouvier • May 18, 2007 2:44 AM

Data volume in itself has advantages that would be negated by the addition of an expiry or retention date.

mike • May 20, 2007 6:09 PM

On photos of people: I have always thought that photos of you should be your property. This would happily eliminate paparazzi :)

It does cause a problem when photographing multiple people, but the answer is simple: if someone doesn't want to be in the photo, then the entire photo is scrapped. Tough, but fair, I believe. After all, if YOU were holding a photo of yourself and someone stole it, you could request it back. What's the practical difference between that and someone taking the shot themselves? None.

Chad • May 21, 2007 8:28 AM

I think the complete deletion of data is a huge problem these days. With so many thieves out to scour every little piece of info from you, a way to have your computer forget is required. I also don't like how liberal other companies are about collecting data on me, like Google collecting my search history, and who knows what else. Google is going to have access to your email soon. Check out http://www.chadmauldin.com/blog/2007/04/24/...


Photo of Bruce Schneier by Per Ervland.

Schneier on Security is a personal website. Opinions expressed are not necessarily those of Co3 Systems, Inc..