Privacy and Security of Data at Universities

Interesting paper: "Open Data, Grey Data, and Stewardship: Universities at the Privacy Frontier," by Christine Borgman:

Abstract: As universities recognize the inherent value in the data they collect and hold, they encounter unforeseen challenges in stewarding those data in ways that balance accountability, transparency, and protection of privacy, academic freedom, and intellectual property. Two parallel developments in academic data collection are converging: (1) open access requirements, whereby researchers must provide access to their data as a condition of obtaining grant funding or publishing results in journals; and (2) the vast accumulation of "grey data" about individuals in their daily activities of research, teaching, learning, services, and administration. The boundaries between research and grey data are blurring, making it more difficult to assess the risks and responsibilities associated with any data collection. Many sets of data, both research and grey, fall outside privacy regulations such as HIPAA, FERPA, and PII. Universities are exploiting these data for research, learning analytics, faculty evaluation, strategic decisions, and other sensitive matters. Commercial entities are besieging universities with requests for access to data or for partnerships to mine them. The privacy frontier facing research universities spans open access practices, uses and misuses of data, public records requests, cyber risk, and curating data for privacy protection. This Article explores the competing values inherent in data stewardship and makes recommendations for practice by drawing on the pioneering work of the University of California in privacy and information security, data governance, and cyber risk.

Posted on November 9, 2018 at 6:04 AM • 6 Comments


Phaete November 9, 2018 8:27 AM

"Universities are exploiting these data for research, learning analytics, faculty evaluation, strategic decisions, and other sensitive matters. Commercial entities are besieging universities with requests for access to data or for partnerships to mine them."
Um, so
The exploiting universities are besieged by commercial entities.

wait, i think i recognize this verbiage, let me try..

"The vile venomous bias was making hissing sounds while it creepily slithered down the pages of the article to defile the marred floor."

This writer sure wants you to think (conclude) a certain way and is biasing the words to get you there, which makes any part of the paper inherently untrustworthy.

D-503 November 9, 2018 4:09 PM

Those are the facts, as reported in the article's abstract. Whether those facts are adequately backed up by data is another question (I haven't read the article yet). If there's any bias, it would mostly be in the author's interpretation of the data. That's how science, mathematics, and scholarship work (or fail, as the case may be).
If you don't like the facts, you're free to make up alternative facts. But in that case, it's best you have evidence to back up those alternative facts.
If universities are using detailed private information about students and staff, that's something I want to know. If universities are in some way selling that private info, that's also something I want to know.
If you think no one should even be thinking about these questions, state your rationale explicitly instead of slinging mud.
(This message routed through a university server and presumably harvested for who knows what purpose)

Petre Peter November 9, 2018 4:33 PM

I wanted to create an application application for a university but I still haven't received a copy of their Privacy Policy.

Name November 10, 2018 5:27 AM

There is a lot more conflict of interest in 'the competing values inherent in data stewardship' than just the security perspective. Admittedly this is off-topic here, but some of those other issues don't lend themselves very well to an honest public debate, so I'll write them here where nobody minds the writer hiding behind anonymity...

As the article says, releasing your research data is an increasingly common requirement for getting grants and publishing papers. But pushing it too far can backfire, similarly to other supposedly well-meaning initiatives like the Vancouver Recommendation (a one-size-fits-all monstrosity from medical and pharma research, where the conflicts of interest are immensely greater than in almost any other field). Here are two real examples from my field, somewhere in the natural/physical sciences:

1) As a scientist your success depends not only on the impact of your publications. It also depends substantially on how impressive your publications list appears to the random bureaucrats scoring your 'academic productivity'. The two things are much less related than outsiders may think. I worked 3 years and then published one single monumental paper whose publicly released dataset, 10 years later, is still being used by virtually all papers in my field. The data are in a sane format and well documented, so nobody needs to bring me in as a coauthor just to make any sense of my gigabytes of data. And since it was complete and exhaustive, I didn't need to publish countless followups, further developments and other ways of padding my publications list while wasting my competitors' and readers' time with a forever moving 'half-baked state of the art'. That was a very high impact paper, lots of citations. But still one paper in three years and just a handful of co-authorships in the following 10 years, when people genuinely needed my contribution. In most countries, I'd have seriously risked my academic job for under-performance. A somewhat complementary paper and dataset came out around the same time as mine, so the two datasets are often used together. That other dataset, also publicly released, is I'd say a masterpiece in data obfuscation. Your chances of using it properly without inviting its original author to join your paper are almost nil. And that is not for any intrinsic complexity: similar datasets are routinely used even for undergrad lab assignments with no problems. Of course this trick earned the guy a very impressive list of coauthorships in top journals for very little work - well, at least after the work that must have gone into obfuscating his 'public' data in the first place...

2) Nowadays the bulk of my own research funding comes with an obligation to release the entirety of my data as soon after collection in the field as practical. This puts me in a difficult spot: producing said data takes up more than half of my and my group's time (actual fieldwork, data quality control, designing, building, testing and maintaining the field instrumentation, logistics planning and contracting, and so on). If we release it all immediately, there is a near 100% chance that one or another of the high-flying colleagues who use all of their time analyzing 'public' data will outcompete us and publish first on our own data. On the next grant I'm still evaluated on my publications output, so the negative incentive to collecting data under these conditions is immense. So what do we do? First, we pretend that quality control takes a lot longer than it really does. And secondly - well, I learned from the more senior and acclaimed scientist mentioned above, so we are careful to express some of the key variables in the most esoteric and idiosyncratic way, here and there even leaving them in engineering units like millivolts or pulses per second, which are worthless without the sensor calibration factors... If anybody wants to use them, they need to involve some of us in their papers. Or wait for our own papers, which come with sanely formatted data (so they get cited) and in countless 'masters project'-sized bites (so we get to publish a lot of them). Honest? No. Efficient in our and our readers' time? No. In line with the Vancouver criteria and stuff? No, and to hell with them anyway. But it allows for the survival of the relatively few who do actually go out and collect unique data in the field at great expense of time. It only really pains me when every new young colleague comes and candidly offers to improve our output formats to make them easier for anybody to use...
Then I have to keep a straight face while hand-waving about consistency with legacy formats, following some arcane forgotten standard and other bs. At some point they decide I'm an old fool, give up, and go back to co-authoring and first-authoring on salaries I couldn't give them otherwise.

Apologies for the off topic, and happy data releasing!

Alex November 13, 2018 12:32 AM

Sadly, this is nothing new. Some 20 years ago I was in a university hospital as a patient. Upon admission, I explicitly refused to be part of any Public Relations campaigns, research campaigns, or any other similar disclosures of my medical information. About a year or so later I saw a published paper by one of the Drs. Looking over the tables, I spotted myself. Sure, they didn't publish my name, but anyone familiar with my medical history would recognize me in that data without a problem. It certainly explained why they ran all sorts of extra/unnecessary tests when I was there.

The Dr was unsympathetic. The university pleaded ignorance.

