Risks of Data Portability

Peter Swire and Yianni Lagos have pre-published a law journal article on the risks of data portability. It specifically addresses an EU data protection regulation, but the security discussion is more general.

…Article 18 poses serious risks to a long-established E.U. fundamental right of data protection, the right to security of a person’s data. Previous access requests by individuals were limited in scope and format. By contrast, when an individual’s lifetime of data must be exported ‘without hindrance,’ then one moment of identity fraud can turn into a lifetime breach of personal data.

They have a point. If you’re going to allow users to download all of their data with one command, you might want to double- and triple-check that command. Otherwise it’s going to become an attack vector for identity theft and other malfeasance.

Tags: academic papers, data protection, fraud, identity theft, laws

Posted on October 24, 2012 at 1:27 PM • 10 Comments

Comments

Petréa Mitchell • October 24, 2012 6:06 PM

If it were a simple command, you’d want to use a different means or extra layer of authentication for a couple of reasons: there’s stopping identity theft, and then there’s the problem that dialogs asking “Did you really mean to do that?” are useless. (Not just an opinion, something that’s been found in usability research.)

But then again, I don’t see providers making this a simple, easy-to-access command. I’d expect it to be buried behind three layers of inscrutable menus.

stark • October 24, 2012 9:03 PM

I disagree that double-checking and triple-checking does any good. Are you sure you’re sure? Once a person decides, any further questions of assurance are redundant and a nuisance. The person doesn’t really think twice, since they’re sure they already made the correct decision. The real problem is that the data is downloaded like that at all. Maybe you could download a randomly generated password which then has to be produced at a second physical location.

Spaceman Spiff • October 24, 2012 10:41 PM

There is also the need to properly encrypt the data so only the intended recipient can access it once it is on another device. We are pondering all the ramifications of the EU data protection laws as they relate to our customers’ data that we store and analyze in order to provide a better user experience with our devices in common use. This is not simple, and done improperly can leave one, and one’s company, liable for some very big penalties, at the very least. We have initially determined that all data transferred from our data centers must be encrypted end-to-end, and all storage of that data be likewise encrypted, even to the level of the file system where the data is stored. We are still pondering how to keep someone who has somehow managed to access one of our servers or data clusters from being able to access/view the data, so there may be an additional database encryption level implemented. Unfortunately, not all providers of database or big-data products have reached that level of security…

Winter • October 25, 2012 4:18 AM

@stark
The point is not double checking with the applicant, but multi-part identification. And maybe confirmation by a different channel with a mandatory waiting time.
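A minimal sketch of that idea, assuming a one-time code delivered over a second channel and a fixed hold period. The class name, field names, and the 72-hour figure are all hypothetical illustrations, not anything proposed in the paper or the regulation:

```python
import hmac
import secrets
from dataclasses import dataclass, field
from datetime import datetime, timedelta

# Assumed hold period; the right value is a policy decision.
WAITING_PERIOD = timedelta(hours=72)


@dataclass
class ExportRequest:
    """Hypothetical record of one 'download all my data' request."""
    user_id: str
    requested_at: datetime
    # One-time code sent over a different channel (post, SMS, ...),
    # never displayed in the web session that made the request.
    confirmation_code: str = field(
        default_factory=lambda: secrets.token_urlsafe(16)
    )
    confirmed: bool = False

    def confirm(self, code: str) -> bool:
        """Mark the request confirmed if the out-of-band code matches."""
        # Constant-time comparison on bytes to avoid timing leaks.
        if hmac.compare_digest(code.encode(), self.confirmation_code.encode()):
            self.confirmed = True
        return self.confirmed

    def may_release(self, now: datetime) -> bool:
        """Release only if confirmed AND the waiting period has elapsed."""
        return self.confirmed and now - self.requested_at >= WAITING_PERIOD
```

The waiting period gives the real account holder time to notice and cancel a request made from a hijacked session; the second channel means that the login credentials alone are not enough.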

Clive Robinson • October 25, 2012 5:30 AM

It’s an awkward problem that has two parts: the first is the protection of the data itself; the second is identifying whether the person requesting the data is a “fit and proper entity” within the requirements of ALL EU legislation, not just that which appears to be data protection related, nor just the “generic” EU legislation, but all the interpretations of the individual nations as they put the directives into their jurisdictions’ legislative frameworks (for instance, German legislation is way, way different from UK legislation).

Login passwords/phrases and warning dialog boxes appear to be way short of what is actually required. The problem is determining what is required; some people reading it have indicated that it needs to be two-way, two-factor authentication on both the request and the response.

That is, the request alone can reveal PII just when checking that the entity is making the right request. For instance, a user logs into a system using a UID and password, neither of which need involve PII. If for some reason the person’s session becomes available to others (MITM etc.), a dialog box that says “Fred Smith do you want to download all your details” has revealed PII.

There are other issues, such as that involving PK certs: these may contain PII or information that can be used to elicit PII by indirect means.

I guess it’s going to need a couple of test cases in each jurisdiction to settle the scope of what is intended in that jurisdiction by its legislative version of the directive.

Oh and it’s big US companies and US legislation that have brought this about. In the US, as far as the legislation is concerned, broadly the data belongs to whoever collects it, whereas in Europe we have what some major US companies regard as a “quaint notion” that the data actually belongs to the data subject… Various US companies have tried to get around previous “safe harbour” arrangements by having some part of their organisation outside of the EU and US. However, even within the EU there are differences on, for instance, who owns a residential address: some regard it as not PII, others regard it as part of PII because it can be used to identify a person or small group of related persons. Likewise, what about Social/National Security/Benefit numbers or any other details, direct or otherwise, that can be used as primary keys or parts of secondary keys?

I don’t own either my first or last names and they are shared by many, many people. Even in direct combination there are at least five other people in the UK that share them that I am directly aware of, so the whole thing is a bit of a dog’s breakfast before it gets going.

In the UK there is quite a debate about anonymizing data sets. The UK Gov and many non-ICT researchers do not have a clue about making, for instance, medical records anonymous, nor do they wish to. The UK Gov regards the records as its property, from which it can raise revenue; medical researchers regard access to the data in full, other than a person’s name, house and phone number, as an absolute requirement for their research. Likewise, medical insurance companies believe they should have unrestricted access not only to all medical records but to bank records and any other records they can get their hands on… There have been a number of briefings to the politicos but they don’t appear to care. If you have a look at the UK’s Cambridge labs web site you will find numerous papers, briefings and evidence documents ( http://www.lightbluetouchpaper.org ).

Peter Swire • October 25, 2012 6:42 AM

I agree with the idea of a separate layer of authentication, preferably in a different channel, before a person’s data is deleted or exported. Like the limit in banks on my daily transfer through online banking. We plan to include that point in the next version of the paper.

Winter • October 25, 2012 6:56 AM

@Clive Robinson
“In the UK there is quite a debate about anonymizing data sets.”

But anonymizing data sets is very difficult. There was a revealing PhD thesis by Matthijs Koot about this subject a few months ago: “Measuring and Predicting Anonymity”

http://blog.cyberwar.nl/2012/05/measuring-and-predicting-anonymity-phd.html

Clive Robinson • October 25, 2012 8:02 AM

@ Winter,

“But anonymizing data sets is very difficult.”

Yes, some would say it’s impossible and still have the data set usable, and that’s part of what the debate is about.

Another part is the fact that those who want to make money out of the PII/medical data deny or ignore, quite deliberately and forcefully, the issues that arise from the lack of anonymity…

Pad • October 25, 2012 8:18 AM

Let’s not mix up responsibilities here: on the one hand, the law has to ensure that anyone can access all of his personal data kept by any structure; on the other hand, it is up to the structures which keep personal data to make sure that this data cannot be hacked in any way.
Who should get the blame if personal data is wrongfully obtained from a structure? The law which mandated that the data should be available to its subject, or the structure which failed to verify that the subject was the right one?

Jon • October 25, 2012 4:04 PM

Why does the delivery have to be via the internet (or any other digital network)?

Why not mail the dataset out, on a CD or el-cheapo pen drive? Or have the requestor come in to a post office or police station or pharmacy or clinic to pick it up in person?

Apply for it on line, and some time later the physical good arrives. Yes, it’s still possible to impersonate a person and obtain their personal data, but it becomes MUCH harder to engineer a fraudulent mass-data-release.

Throw some caveats in about what kinds of addresses the data will be mailed to, checking of whether that physical address has received or requested any other person’s data (with some useful heuristics and mechanisms to cope with families, shared households, and changes of address), and checking when the last time that particular person’s data was requested (“hmm, this person’s data has been requested three times in the last 24 hours. WTF?”) and you’d be a long way towards having a useful system, I’d think.
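A rough sketch of how those two checks (repeat requests for one person, and one address collecting many people’s data) might look. The thresholds, class name, and flag strings are assumptions for illustration only; a real system would need the family and change-of-address handling mentioned above:

```python
from collections import defaultdict
from datetime import datetime, timedelta

# All three values are assumed for illustration.
MAX_REQUESTS_PER_WINDOW = 1       # requests per subject before flagging
WINDOW = timedelta(hours=24)      # look-back window for repeat requests
MAX_SUBJECTS_PER_ADDRESS = 4      # allowance for families / shared households


class RequestScreen:
    """Hypothetical screen that flags suspicious export requests."""

    def __init__(self):
        self._by_subject = defaultdict(list)  # subject_id -> request times
        self._by_address = defaultdict(set)   # address -> subject_ids seen

    def check(self, subject_id, address, now):
        """Record the request; return reasons to hold it for manual review."""
        flags = []

        # Has this person's data already been requested recently?
        recent = [t for t in self._by_subject[subject_id] if now - t < WINDOW]
        if len(recent) >= MAX_REQUESTS_PER_WINDOW:
            flags.append("repeated request for same subject")

        # Has this address already been used for many other people?
        others = self._by_address[address] - {subject_id}
        if len(others) >= MAX_SUBJECTS_PER_ADDRESS:
            flags.append("address used for many different subjects")

        self._by_subject[subject_id].append(now)
        self._by_address[address].add(subject_id)
        return flags
```

An empty list means nothing looked odd; anything else would hold the mailing until a human has looked at it.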

It’s the perceived convenience and “need” to deliver the goodies over a mechanism designed to deliver in bulk that seems the main risk factor, to me.

Jon
