Teaching Computers How to Forget
I’ve written about the death of ephemeral conversation, the rise of wholesale surveillance, and the electronic audit trail that now follows us through life. Viktor Mayer-Schönberger, a professor in Harvard’s John F. Kennedy School of Government, has noticed this too, and argues in a recent paper that computers need to forget.
Why would we want our machines to “forget”? Mayer-Schönberger suggests that by archiving so many bits of knowledge for so long, we are creating a Benthamite panopticon. The accumulated weight of stored Google searches, thousands of family photographs, millions of books, credit bureau information, air travel reservations, massive government databases, archived e-mail, etc., can actually be a detriment to speech and action, he argues.
“If whatever we do can be held against us years later, if all our impulsive comments are preserved, they can easily be combined into a composite picture of ourselves,” he writes in the paper. “Afraid how our words and actions may be perceived years later and taken out of context, the lack of forgetting may prompt us to speak less freely and openly.”
In other words, it threatens to make us all politicians.
In contrast to omnibus data protection legislation, Mayer-Schönberger proposes a combination of law and software to ensure that most data is “forgotten” by default. A law would decree that “those who create software that collects and stores data build into their code not only the ability to forget with time, but make such forgetting the default.” Essentially, this means that all collected data is tagged with a new piece of metadata that defines when the information should expire.
In practice, this would mean that iTunes could only store buying data for a limited time, a time defined by law. Should customers explicitly want this time extended, that would be fine, but people must be given a choice. Even data created by users—digital pictures, for example—would be tagged by the cameras that create them to expire in a year or two; pictures that people want to keep could simply be given a date 10,000 years in the future.
Frank Pasquale also comments on the legal implications implicit in this issue. And Paul Ohm wrote a note titled “The Fourth Amendment Right to Delete”:
For years the police have entered homes and offices, hauled away filing cabinets full of records, and searched them back at the police station for evidence. In Fourth Amendment terms, these actions are entry, seizure, and search, respectively, and usually require the police to obtain a warrant. Modern-day police can avoid some of these messy steps with the help of technology: They have tools that duplicate stored records and collect evidence of behavior, all from a distance and without the need for physical entry. These tools generate huge amounts of data that may be searched immediately or stored indefinitely for later analysis. Meanwhile, it is unclear whether the Fourth Amendment’s restrictions apply to these technologies: Are the acts of duplication and collection themselves seizure? Before the data are analyzed, has a search occurred?
EDITED TO ADD (6/14): Interesting presentation earlier this year by Dr. Radia Perlman that presents some work toward solving this problem. And a counterpoint.
kyb • May 16, 2007 7:09 AM
Personal data should belong to the individual that it’s about, not to whatever organisation has compiled it.
People want to be able to store everything about themselves, but it should be theirs to determine what parts of it get shared with whom.
This is one reason OpenID is good: it lets the user take complete control of authentication, and share it appropriately.