Entries Tagged "databases"

Page 2 of 14

New Data Privacy Regulations

When Mark Zuckerberg testified before both the House and the Senate last month, it became immediately obvious that few US lawmakers had any appetite to regulate the pervasive surveillance taking place on the Internet.

Right now, the only way we can force these companies to take our privacy more seriously is through the market. But the market is broken. First, none of us do business directly with these data brokers. Equifax might have lost my personal data in 2017, but I can’t fire them because I’m not their customer or even their user. I could complain to the companies I do business with who sell my data to Equifax, but I don’t know who they are. Markets require voluntary exchange to work properly. If consumers don’t even know where these data brokers are getting their data from and what they’re doing with it, they can’t make intelligent buying choices.

This is starting to change, thanks to a new law in Vermont and another in Europe. And more legislation is coming.

Vermont first. At the moment, we don’t know how many data brokers collect data on Americans. Credible estimates range from 2,500 to 4,000 different companies. Last week, Vermont passed a law that will change that.

The law does several things to improve the security of Vermonters’ data, but several provisions matter to all of us. First, the law requires data brokers that trade in Vermonters’ data to register annually. And while there are many small local data brokers, the larger companies collect data nationally and even internationally. This will help us get a more accurate look at who’s in this business. The companies also have to disclose what opt-out options they offer, and how people can request to opt out. Again, this information is useful to all of us, regardless of the state we live in. And finally, the companies have to disclose the number of security breaches they’ve suffered each year, and how many individuals were affected.

Admittedly, the regulations imposed by the Vermont law are modest. Earlier drafts of the law included a provision requiring data brokers to disclose how many individuals’ data it has in its databases, what sorts of data it collects and where the data came from, but those were removed as the bill negotiated its way into law. A more comprehensive law would allow individuals to demand to exactly what information they have about them­—and maybe allow individuals to correct and even delete data. But it’s a start, and the first statewide law of its kind to be passed in the face of strong industry opposition.

Vermont isn’t the first to attempt this, though. On the other side of the country, Representative Norma Smith of Washington introduced a similar bill in both 2017 and 2018. It goes further, requiring disclosure of what kinds of data the broker collects. So far, the bill has stalled in the state’s legislature, but she believes it will have a much better chance of passing when she introduces it again in 2019. I am optimistic that this is a trend, and that many states will start passing bills forcing data brokers to be increasingly more transparent in their activities. And while their laws will be tailored to residents of those states, all of us will benefit from the information.

A 2018 California ballot initiative could help. Among its provisions, it gives consumers the right to demand exactly what information a data broker has about them. If it passes in November, once it takes effect, lots of Californians will take the list of data brokers from Vermont’s registration law and demand this information based on their own law. And again, all of us—regardless of the state we live in­—will benefit from the information.

We will also benefit from another, much more comprehensive, data privacy and security law from the European Union. The General Data Protection Regulation (GDPR) was passed in 2016 and took effect on 25 May. The details of the law are far too complex to explain here, but among other things, it mandates that personal data can only be collected and saved for specific purposes and only with the explicit consent of the user. We’ll learn who is collecting what and why, because companies that collect data are going to have to ask European users and customers for permission. And while this law only applies to EU citizens and people living in EU countries, the disclosure requirements will show all of us how these companies profit off our personal data.

It has already reaped benefits. Over the past couple of weeks, you’ve received many e-mails from companies that have you on their mailing lists. In the coming weeks and months, you’re going to see other companies disclose what they’re doing with your data. One early example is PayPal: in preparation for GDPR, it published a list of the over 600 companies it shares your personal data with. Expect a lot more like this.

Surveillance is the business model of the Internet. It’s not just the big companies like Facebook and Google watching everything we do online and selling advertising based on our behaviors; there’s also a large and largely unregulated industry of data brokers that collect, correlate and then sell intimate personal data about our behaviors. If we make the reasonable assumption that Congress is not going to regulate these companies, then we’re left with the market and consumer choice. The first step in that process is transparency. These new laws, and the ones that will follow, are slowly shining a light on this secretive industry.

This essay originally appeared in the Guardian.

Posted on June 8, 2018 at 6:48 AMView Comments

DNI Wants Research into Secure Multiparty Computation

The Intelligence Advanced Research Projects Activity (IARPA) is soliciting proposals for research projects in secure multiparty computation:

Specifically of interest is computing on data belonging to different—potentially mutually distrusting—parties, which are unwilling or unable (e.g., due to laws and regulations) to share this data with each other or with the underlying compute platform. Such computations may include oblivious verification mechanisms to prove the correctness and security of computation without revealing underlying data, sensitive computations, or both.

My guess is that this is to perform analysis using data obtained from different surveillance authorities.

Posted on July 7, 2017 at 6:20 AMView Comments

Indiana's Voter Registration Data Is Frighteningly Insecure

You can edit anyone’s information you want:

The question, boiled down, was haunting: Want to see how easy it would be to get into someone’s voter registration and make changes to it? The offer from Steve Klink—a Lafayette-based public consultant who works mainly with Indiana public school districts—was to use my voter registration record as a case study.

Only with my permission, of course.

“I will not require any information from you,” he texted. “Which is the problem.”

Turns out he didn’t need anything from me. He sent screenshots of every step along the way, as he navigated from the “Update My Voter Registration” tab at the Indiana Statewide Voter Registration System maintained since 2010 at www.indianavoters.com to the blank screen that cleared the way for changes to my name, address, age and more.

The only magic involved was my driver’s license number, one of two log-in options to make changes online. And that was contained in a copy of every county’s voter database, a public record already in the hands of political parties, campaigns, media and, according to Indiana open access laws, just about anyone who wants the beefy spreadsheet.

Posted on October 11, 2016 at 2:04 PMView Comments

Crowdsourcing a Database of Hotel Rooms

There’s an app that allows people to submit photographs of hotel rooms around the world into a centralized database. The idea is that photographs of victims of human trafficking are often taken in hotel rooms, and the database will help law enforcement find the traffickers.

I can’t speak to the efficacy of the database—in particular, the false positives—but it’s an interesting crowdsourced approach to the problem.

Posted on June 27, 2016 at 6:05 AMView Comments

White House Report on Big Data Discrimination

The White House has released a report on big-data discrimination. From the blog post:

Using case studies on credit lending, employment, higher education, and criminal justice, the report we are releasing today illustrates how big data techniques can be used to detect bias and prevent discrimination. It also demonstrates the risks involved, particularly how technologies can deliberately or inadvertently perpetuate, exacerbate, or mask discrimination.

The purpose of the report is not to offer remedies to the issues it raises, but rather to identify these issues and prompt conversation, research­—and action­—among technologists, academics, policy makers, and citizens, alike.

The report includes a number of recommendations for advancing work in this nascent field of data and ethics. These include investing in research, broadening and diversifying technical leadership, cross-training, and expanded literacy on data discrimination, bolstering accountability, and creating standards for use within both the government and the private sector. It also calls on computer and data science programs and professionals to promote fairness and opportunity as part of an overall commitment to the responsible and ethical use of data.

Posted on May 6, 2016 at 6:12 AMView Comments

Helen Nissenbaum on Regulating Data Collection and Use

NYU professor Helen Nissenbaum gave an excellent lecture at Brown University last month, where she rebutted those who think that we should not regulate data collection, only data use: something she calls “big data exceptionalism.” Basically, this is the idea that collecting the “haystack” isn’t the problem; it what is done with it that is. (I discuss this same topic in Data and Goliath, on pages 197-9.)

In her talk, she makes a very strong argument that the problem is one of domination. Contemporary political philosopher Philip Pettit has written extensively about a republican conception of liberty. He defines domination as the extent one person has the ability to interfere with the affairs of another.

Under this framework, the problem with wholesale data collection is not that it is used to curtail your freedom; the problem is that the collector has the power to curtail your freedom. Whether they use it or not, the fact that they have that power over us is itself a harm.

Posted on April 20, 2016 at 6:27 AMView Comments

Data Is a Toxic Asset

Thefts of personal information aren’t unusual. Every week, thieves break into networks and steal data about people, often tens of millions at a time. Most of the time it’s information that’s needed to commit fraud, as happened in 2015 to Experian and the IRS.

Sometimes it’s stolen for purposes of embarrassment or coercion, as in the 2015 cases of Ashley Madison and the US Office of Personnel Management. The latter exposed highly sensitive personal data that affects security of millions of government employees, probably to the Chinese. Always it’s personal information about us, information that we shared with the expectation that the recipients would keep it secret. And in every case, they did not.

The telecommunications company TalkTalk admitted that its data breach last year resulted in criminals using customer information to commit fraud. This was more bad news for a company that’s been hacked three times in the past 12 months, and has already seen some disastrous effects from losing customer data, including £60 million (about $83 million) in damages and over 100,000 customers. Its stock price took a pummeling as well.

People have been writing about 2015 as the year of data theft. I’m not sure if more personal records were stolen last year than in other recent years, but it certainly was a year for big stories about data thefts. I also think it was the year that industry started to realize that data is a toxic asset.

The phrase “big data” refers to the idea that large databases of seemingly random data about people are valuable. Retailers save our purchasing habits. Cell phone companies and app providers save our location information.

Telecommunications providers, social networks, and many other types of companies save information about who we talk to and share things with. Data brokers save everything about us they can get their hands on. This data is saved and analyzed, bought and sold, and used for marketing and other persuasive purposes.

And because the cost of saving all this data is so cheap, there’s no reason not to save as much as possible, and save it all forever. Figuring out what isn’t worth saving is hard. And because someday the companies might figure out how to turn the data into money, until recently there was absolutely no downside to saving everything. That changed this past year.

What all these data breaches are teaching us is that data is a toxic asset and saving it is dangerous.

Saving it is dangerous because it’s highly personal. Location data reveals where we live, where we work, and how we spend our time. If we all have a location tracker like a smartphone, correlating data reveals who we spend our time with­—including who we spend the night with.

Our Internet search data reveals what’s important to us, including our hopes, fears, desires and secrets. Communications data reveals who our intimates are, and what we talk about with them. I could go on. Our reading habits, or purchasing data, or data from sensors as diverse as cameras and fitness trackers: All of it can be intimate.

Saving it is dangerous because many people want it. Of course companies want it; that’s why they collect it in the first place. But governments want it, too. In the United States, the National Security Agency and FBI use secret deals, coercion, threats and legal compulsion to get at the data. Foreign governments just come in and steal it. When a company with personal data goes bankrupt, it’s one of the assets that gets sold.

Saving it is dangerous because it’s hard for companies to secure. For a lot of reasons, computer and network security is very difficult. Attackers have an inherent advantage over defenders, and a sufficiently skilled, funded and motivated attacker will always get in.

And saving it is dangerous because failing to secure it is damaging. It will reduce a company’s profits, reduce its market share, hurt its stock price, cause it public embarrassment, and­—in some cases—­result in expensive lawsuits and occasionally, criminal charges.

All this makes data a toxic asset, and it continues to be toxic as long as it sits in a company’s computers and networks. The data is vulnerable, and the company is vulnerable. It’s vulnerable to hackers and governments. It’s vulnerable to employee error. And when there’s a toxic data spill, millions of people can be affected. The 2015 Anthem Health data breach affected 80 million people. The 2013 Target Corp. breach affected 110 million.

This toxic data can sit in organizational databases for a long time. Some of the stolen Office of Personnel Management data was decades old. Do you have any idea which companies still have your earliest e-mails, or your earliest posts on that now-defunct social network?

If data is toxic, why do organizations save it?

There are three reasons. The first is that we’re in the middle of the hype cycle of big data. Companies and governments are still punch-drunk on data, and have believed the wildest of promises on how valuable that data is. The research showing that more data isn’t necessarily better, and that there are serious diminishing returns when adding additional data to processes like personalized advertising, is just starting to come out.

The second is that many organizations are still downplaying the risks. Some simply don’t realize just how damaging a data breach would be. Some believe they can completely protect themselves against a data breach, or at least that their legal and public relations teams can minimize the damage if they fail. And while there’s certainly a lot that companies can do technically to better secure the data they hold about all of us, there’s no better security than deleting the data.

The last reason is that some organizations understand both the first two reasons and are saving the data anyway. The culture of venture-capital-funded start-up companies is one of extreme risk taking. These are companies that are always running out of money, that always know their impending death date.

They are so far from profitability that their only hope for surviving is to get even more money, which means they need to demonstrate rapid growth or increasing value. This motivates those companies to take risks that larger, more established, companies would never take. They might take extreme chances with our data, even flout regulations, because they literally have nothing to lose. And often, the most profitable business models are the most risky and dangerous ones.

We can be smarter than this. We need to regulate what corporations can do with our data at every stage: collection, storage, use, resale and disposal. We can make corporate executives personally liable so they know there’s a downside to taking chances. We can make the business models that involve massively surveilling people the less compelling ones, simply by making certain business practices illegal.

The Ashley Madison data breach was such a disaster for the company because it saved its customers’ real names and credit card numbers. It didn’t have to do it this way. It could have processed the credit card information, given the user access, and then deleted all identifying information.

To be sure, it would have been a different company. It would have had less revenue, because it couldn’t charge users a monthly recurring fee. Users who lost their password would have had more trouble re-accessing their account. But it would have been safer for its customers.

Similarly, the Office of Personnel Management didn’t have to store everyone’s information online and accessible. It could have taken older records offline, or at least onto a separate network with more secure access controls. Yes, it wouldn’t be immediately available to government employees doing research, but it would have been much more secure.

Data is a toxic asset. We need to start thinking about it as such, and treat it as we would any other source of toxicity. To do anything else is to risk our security and privacy.

This essay previously appeared on CNN.com.

Posted on March 4, 2016 at 5:32 AMView Comments

More on the "Data as Exhaust" Metaphor

Research paper: Gavin J.D. Smith, “Surveillance, Data and Embodiment: On the Work of Being Watched,” Body and Society, January 2016.

Abstract: Today’s bodies are akin to ‘walking sensor platforms’. Bodies either host, or are the subjects of, an array of sensing devices that act to convert bodily movements, actions and dynamics into circulative data. This article proposes the notions of ‘disembodied exhaust’ and ’embodied exhaustion’ to conceptualise processes of bodily sensorisation and datafication. As the material body interfaces with networked sensor technologies and sensing infrastructures, it emits disembodied exhaust: gaseous flows of personal information that establish a representational data-proxy. It is this networked actant that progressively structures how embodied subjects experience their daily lives. The significance of this symbiont medium in determining the outcome of interplays between networked individuals and audiences necessitates that it is carefully contrived. The article explores the nature and function of the data-proxy, and its impact on social relations. Drawing on examples that depict individuals engaging with their data-proxies, the article suggests that managing a virtual presence is analogous to a work relation, demanding diligence and investment. But it also shows how the data-proxy operates as a mode of affect that challenges conventional distinctions made between organic and inorganic bodies, agency and actancy, mortality and immortality, presence and absence.

Posted on February 29, 2016 at 6:17 AMView Comments

Sidebar photo of Bruce Schneier by Joe MacInnis.