Entries Tagged "databases"


NSA Storing Internet Data, Social Networking Data, on Pretty Much Everybody

Two new stories based on the Snowden documents.

This is getting silly. General Alexander just lied about this to Congress last week. The old NSA tactic of hiding behind a shell game of different code names is failing. It used to be they could get away with saying “Project X doesn’t do that,” knowing full well that Projects Y and Z did and that no one would call them on it. Now they’re just looking shiftier and shiftier.

The program the New York Times exposed is basically Total Information Awareness, which Congress defunded in 2003 because it was just too damned creepy. Now it’s back. (Actually, it never really went away. It just changed code names.)

I’m also curious how all those PRISM-era denials from Internet companies about the NSA not having “direct access” to their servers jibe with this paragraph:

The overall volume of metadata collected by the N.S.A. is reflected in the agency’s secret 2013 budget request to Congress. The budget document, disclosed by Mr. Snowden, shows that the agency is pouring money and manpower into creating a metadata repository capable of taking in 20 billion “record events” daily and making them available to N.S.A. analysts within 60 minutes.

Honestly, I think the details matter less and less. We have to assume that the NSA has everyone who uses electronic communications under constant surveillance. New details about hows and whys will continue to emerge—for example, now we know the NSA’s repository contains travel data—but the big picture will remain the same.

Related: I’ve said that it seems that the NSA now has a PR firm advising it on response. It’s trying to teach General Alexander how to better respond to questioning.

Also related: A cute flowchart on how to avoid NSA surveillance.

Posted on October 1, 2013 at 1:08 PM

Paradoxes of Big Data

Interesting paper: “Three Paradoxes of Big Data,” by Neil M. Richards and Jonathan H. King, Stanford Law Review Online, 2013.

Abstract: Big data is all the rage. Its proponents tout the use of sophisticated analytics to mine large data sets for insight as the solution to many of our society’s problems. These big data evangelists insist that data-driven decisionmaking can now give us better predictions in areas ranging from college admissions to dating to hiring to medicine to national security and crime prevention. But much of the rhetoric of big data contains no meaningful analysis of its potential perils, only the promise. We don’t deny that big data holds substantial potential for the future, and that large dataset analysis has important uses today. But we would like to sound a cautionary note and pause to consider big data’s potential more critically. In particular, we want to highlight three paradoxes in the current rhetoric about big data to help move us toward a more complete understanding of the big data picture. First, while big data pervasively collects all manner of private information, the operations of big data itself are almost entirely shrouded in legal and commercial secrecy. We call this the Transparency Paradox. Second, though big data evangelists talk in terms of miraculous outcomes, this rhetoric ignores the fact that big data seeks to identify at the expense of individual and collective identity. We call this the Identity Paradox. And third, the rhetoric of big data is characterized by its power to transform society, but big data has power effects of its own, which privilege large government and corporate entities at the expense of ordinary individuals. We call this the Power Paradox. Recognizing the paradoxes of big data, which show its perils alongside its potential, will help us to better understand this revolution. It may also allow us to craft solutions to produce a revolution that will be as good as its evangelists predict.

EDITED TO ADD (10/11): Here’s an HTML version of the paper.

Posted on September 26, 2013 at 6:58 AM

Google Knows Every Wi-Fi Password in the World

This article points out that as people log into Wi-Fi networks from their Android phones, and back up those passwords along with everything else to Google’s cloud, Google is amassing an enormous database of the world’s Wi-Fi passwords. And while it’s not every Wi-Fi password in the world, it’s almost certainly a large percentage of them.

Leaving aside Google’s intentions regarding this database, it is certainly something that the US government could force Google to turn over with a National Security Letter.

Something else to think about.

Posted on September 20, 2013 at 7:05 AM

The Office of the Director of National Intelligence Defends NSA Surveillance Programs

Here’s a transcript of a panel discussion about NSA surveillance. There’s a lot worth reading here, but I want to quote Bob Litt’s opening remarks. He’s the General Counsel for ODNI, and he has a lot to say about the programs revealed so far in the Snowden documents.

I’m reminded a little bit of a quote that, like many quotes, is attributed to Mark Twain but in fact is not Mark Twain’s, which is that a lie can get halfway around the world before the truth gets its boots on. And unfortunately, there’s been a lot of misinformation that’s come out about these programs. And what I would like to do in the next couple of minutes is actually go through and explain what the programs are and what they aren’t.

I particularly want to emphasize that I hope you come away from this with the understanding that neither of the programs that have been leaked to the press recently are indiscriminate sweeping up of information without regard to privacy or constitutional rights or any kind of controls. In fact, from my boss, the director of national intelligence, on down through the entire intelligence community, we are in fact sensitive to privacy and constitutional rights. After all, we are citizens of the United States. These are our rights too.

So as I said, we’re talking about two types of intelligence collection programs. I want to start discussing them by making the point that in order to target the emails or the phone calls or the communications of a United States citizen or a lawful permanent resident of the United States, wherever that person is located, or of any person within the United States, we need to go to court, and we need to get an individual order based on probable cause, the equivalent of an electronic surveillance warrant.

That does not mean and nobody has ever said that that means we never acquire the contents of an email or telephone call to which a United States person is a party. Whenever you’re doing any collection of information, you’re going to—you can’t avoid some incidental acquisition of information about nontargeted persons. Think of a wiretap in a criminal case. You’re wiretapping somebody, and you intercept conversations that are innocent as well as conversations that are inculpatory. If we seize somebody’s computer, there’s going to be information about innocent people on that. This is just a necessary incident.

What we do is we impose controls on the use of that information. But what we cannot do—and I’m repeating this—is go out and target the communications of Americans for collection without an individual court order.

So the first of the programs that I want to talk about that was leaked to the press is what’s been called Section 215, or business record collection. It’s called Section 215 because that was the section of the Patriot Act that put the current version of that statute into place. And under this statute, we collect telephone metadata, using a court order which is authorized by the Foreign Intelligence Surveillance Act, under a provision which allows the government to obtain business records for intelligence and counterterrorism purposes. Now, by metadata, in this context, I mean data that describes the phone calls, such as the telephone number making the call, the telephone number dialed, the date and time the call was made and the length of the call. These are business records of the telephone companies in question, which is why they can be collected under this provision.

Despite what you may have read about this program, we do not collect the content of any communications under this program. We do not collect the identity of any participant to any communication under this program. And while there seems to have been some confusion about this as recently as today, I want to make perfectly clear we do not collect cellphone location information under this program, either GPS information or cell site tower information. I’m not sure why it’s been so hard to get people to understand that because it’s been said repeatedly.

When the court approves collection under this statute, it issues two orders. One order, which is the one that was leaked, is an order to providers directing them to turn the relevant information over to the government. The other order, which was not leaked, is the order that spells out the limitations on what we can do with the information after it’s been collected, who has access, what purposes they can access it for and how long it can be retained.

Some people have expressed concern, which is quite a valid concern in the abstract, that if you collect large quantities of metadata about telephone calls, you could subject it to sophisticated analysis, and using those kind of analytical tools, you can derive a lot of information about people that would otherwise not be discoverable.

The fact is, we are specifically not allowed to do that kind of analysis of this data, and we don’t do it. The metadata that is acquired and kept under this program can only be queried when there is reasonable suspicion, based on specific, articulable facts, that a particular telephone number is associated with specified foreign terrorist organizations. And the only purpose for which we can make that query is to identify contacts. All that we get under this program, all that we collect, is metadata. So all that we get back from one of these queries is metadata.

Each determination of a reasonable suspicion under this program must be documented and approved, and only a small portion of the data that is collected is ever actually reviewed, because the vast majority of that data is never going to be responsive to one of these terrorism-related queries.

In 2012 fewer than 300 identifiers were approved for searching this data. Nevertheless, we collect all the data because if you want to find a needle in the haystack, you need to have the haystack, especially in the case of a terrorism-related emergency, which is—and remember that this database is only used for terrorism-related purposes.

And if we want to pursue any further investigation as a result of a number that pops up as a result of one of these queries, we have to do, pursuant to other authorities and in particular if we want to conduct electronic surveillance of any number within the United States, as I said before, we have to go to court, we have to get an individual order based on probable cause.

That’s one of the two programs.

The other program is very different. This is a program that’s sometimes referred to as PRISM, which is a misnomer. PRISM is actually the name of a database. The program is collection under Section 702 of the Foreign Intelligence Surveillance Act, which is a public statute that is widely known to everybody. There’s really no secret about this kind of collection.

This permits the government to target a non-U.S. person, somebody who’s not a citizen or a permanent resident alien, located outside of the United States, for foreign intelligence purposes without obtaining a specific warrant for each target, under the programmatic supervision of the FISA Court.

And it’s important here to step back and note that historically and at the time FISA was originally passed in 1978, this particular kind of collection, targeting non-U.S. persons outside of the United States for foreign intelligence purposes, was not intended to be covered by FISA at all. It was totally outside of the supervision of the FISA Court and totally within the prerogative of the executive branch. So in that respect, Section 702 is properly viewed as an expansion of FISA Court authority, rather than a contraction of that authority.

So Section 702, as I—as I said, it’s—is limited to targeting foreigners outside the United States to acquire foreign intelligence information. And there is a specific provision in this statute that prohibits us from making an end run about this, about—on this requirement, because we are expressly prohibited from targeting somebody outside of the United States in order to obtain some information about somebody inside the United States. That is to say, if we know that somebody outside of the United States is communicating with Spike Bowman, and we really want to get Spike Bowman’s communications, we’ve got to get an electronic surveillance order on Spike Bowman. We cannot target the person outside of the United States to collect on Spike.

In order to use Section 702, the government has to obtain approval from the FISA Court for the plan it intends to use to conduct the collection. This plan includes, first of all, identification of the foreign intelligence purposes of the collection; second, the plan and the procedures for ensuring that the individuals targeted for collection are in fact non-U.S. persons who are located outside of the United States. These are referred to as targeting procedures. And in addition, we have to get approval of the government’s procedures for what it will do with information about a U.S. person or someone inside the United States if we get that information through this collection. These procedures, which are called minimization procedures, determine what we can keep and what we can disseminate to other government agencies and impose limitations on that. And in particular, dissemination of information about U.S. persons is expressly prohibited unless that information is necessary to understand foreign intelligence or to assess its importance or is evidence of a crime or indicates a—an imminent threat of death or serious bodily harm.

And again, these procedures, the targeting and minimization procedures, have to be approved by the FISA court as consistent with the statute and consistent with the Fourth Amendment. And that’s what the Section 702 collection is.

The last thing I want to talk about a little bit is the myth that this is sort of unchecked authority, because we have extensive oversight and control over the collection, which involves all three branches of government. First, NSA has extensive technological processes, including segregated databases, limited access and audit trails, and they have extensive internal oversight, including their own compliance officer, who oversees compliance with the rules.

Second, the Department of Justice and my office, the Office of the Director of National Intelligence, are specifically charged with overseeing NSA’s activities to make sure that there are no compliance problems. And we report to the Congress twice a year on the use of these collection authorities and compliance problems. And if we find a problem, we correct it. Inspectors general, independent inspectors general, who, as you all know, also have an independent reporting responsibility to Congress, also are charged with undertaking a review of how these surveillance programs are carried out.

Any time that information is collected in violation of the rules, it’s reported immediately to the FISA court and is also reported to the relevant congressional oversight committees. It doesn’t matter how small the—or technical the violation is. And information that’s collected in violation of the rules has to be purged, with very limited exceptions.

Both the FISA court and the congressional oversight committees, which are Intelligence and Judiciary, take a very active role in overseeing this program and ensuring that we adhere to the requirements of the statutes and the court orders. And let me just stop and say that the suggestion that the FISA court is a rubber stamp is a complete canard, as anybody who’s ever had the privilege of appearing before Judge Bates or Judge Walton can attest.

Now, this is a complex system, and like any complex system, it’s not error free. But as I said before, every time we have found a mistake, we’ve fixed it. And the mistakes are self-reported. We find them ourselves in the exercise of our oversight. No one has ever found that there has ever been—and by no one, I mean the people at NSA, the people at the Department of Justice, the people at the Office of the Director of National Intelligence, the inspectors general, the FISA court and the congressional oversight committees, all of whom have visibility into this—nobody has ever found that there has ever been any intentional effort to violate the law or any intentional misuse of these tools.

As always, the fundamental issue is trust. If you believe Litt, this is all very comforting. If you don’t, it’s more lies and misdirection. Taken at face value, it explains why so many tech executives were able to say they had never heard of PRISM: it’s the internal NSA name for the database, and not the name of the program. I also note that Litt uses the word “collect” to mean what it actually means, and not the way his boss, Director of National Intelligence James Clapper, Jr., used it to deliberately lie to Congress.

Posted on July 4, 2013 at 7:07 AM

Details of NSA Data Requests from US Corporations

Facebook (here), Apple (here), and Yahoo (here) have all released details of US government requests for data. They each say that they’ve turned over user data for about 10,000 people, although the time frames are different. The exact number isn’t important; what’s important is that it’s much lower than the millions implied by the PRISM document.

Now the big question: do we believe them? If we don’t, what would it take before we did believe them?

Posted on June 18, 2013 at 4:00 PM

NSA Secrecy and Personal Privacy

In an excellent essay about privacy and secrecy, law professor Daniel Solove makes an important point. There are two types of NSA secrecy being discussed. It’s easy to confuse them, but they’re very different.

Of course, if the government is trying to gather data about a particular suspect, keeping the specifics of surveillance efforts secret will decrease the likelihood of that suspect altering his or her behavior.

But secrecy at the level of an individual suspect is different from keeping the very existence of massive surveillance programs secret. The public must know about the general outlines of surveillance activities in order to evaluate whether the government is achieving the appropriate balance between privacy and security. What kind of information is gathered? How is it used? How securely is it kept? What kind of oversight is there? Are these activities even legal? These questions can’t be answered, and the government can’t be held accountable, if surveillance programs are completely classified.

This distinction is also becoming important as Snowden keeps talking. There are a lot of articles about Edward Snowden cooperating with the Chinese government. I have no idea if this is true—Snowden denies it—or if it’s part of an American smear campaign designed to change the debate from the NSA surveillance programs to the whistleblower’s actions. (It worked against Assange.) In anticipation of the inevitable questions, I want to amend a previous assessment: I consider Snowden a hero for blowing the whistle on the existence and details of the NSA surveillance programs, but not for revealing specific operational secrets to the Chinese government. Charles Pierce wishes Snowden would stop talking. I agree; the more this story is about him, the less it is about the NSA. Stop giving interviews and let the documents do the talking.

Back to Daniel Solove, this excellent 2011 essay on the value of privacy is making the rounds again. And it should.

Many commentators had been using the metaphor of George Orwell’s 1984 to describe the problems created by the collection and use of personal data. I contended that the Orwell metaphor, which focuses on the harms of surveillance (such as inhibition and social control) might be apt to describe law enforcement’s monitoring of citizens. But much of the data gathered in computer databases is not particularly sensitive, such as one’s race, birth date, gender, address, or marital status. Many people do not care about concealing the hotels they stay at, the cars they own or rent, or the kind of beverages they drink. People often do not take many steps to keep such information secret. Frequently, though not always, people’s activities would not be inhibited if others knew this information.

I suggested a different metaphor to capture the problems: Franz Kafka’s The Trial, which depicts a bureaucracy with inscrutable purposes that uses people’s information to make important decisions about them, yet denies the people the ability to participate in how their information is used. The problems captured by the Kafka metaphor are of a different sort than the problems caused by surveillance. They often do not result in inhibition or chilling. Instead, they are problems of information processing—the storage, use, or analysis of data—rather than information collection. They affect the power relationships between people and the institutions of the modern state. They not only frustrate the individual by creating a sense of helplessness and powerlessness, but they also affect social structure by altering the kind of relationships people have with the institutions that make important decisions about their lives.

The whole essay is worth reading, as is—I hope—my essay on the value of privacy from 2006.

I have come to believe that the solution to all of this is regulation. And it’s not going to be the regulation of data collection; it’s going to be the regulation of data use.

EDITED TO ADD (6/18): A good rebuttal to the “nothing to hide” argument.

Posted on June 18, 2013 at 11:02 AM

Data at Rest vs. Data in Motion

For a while now, I’ve pointed out that cryptography is singularly ill-suited to solve the major network security problems of today: denial-of-service attacks, website defacement, theft of credit card numbers, identity theft, viruses and worms, DNS attacks, network penetration, and so on.

Cryptography was invented to protect communications: data in motion. This is how cryptography was used throughout most of history, and this is how the militaries of the world developed the science. Alice was the sender, Bob the receiver, and Eve the eavesdropper. Even when cryptography was used to protect stored data—data at rest—it was viewed as a form of communication. In “Applied Cryptography,” I described encrypting stored data in this way: “a stored message is a way for someone to communicate with himself through time.” Data storage was just a subset of data communication.

In modern networks, the difference is much more profound. Communications are immediate and instantaneous. Encryption keys can be ephemeral, and systems like the STU-III telephone can be designed such that encryption keys are created at the beginning of a call and destroyed as soon as the call is completed. Data storage, on the other hand, occurs over time. Any encryption keys must exist as long as the encrypted data exists. And storing those keys becomes as important as storing the unencrypted data was. In a way, encryption doesn’t reduce the number of secrets that must be stored securely; it just makes them much smaller.
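The contrast can be sketched in a few lines of Python. This is a toy illustration, not real cryptography: the XOR keystream built from SHAKE-256 and all the names here are mine, chosen only to make the key-lifetime point concrete.

```python
import hashlib
import secrets

def toy_stream_cipher(key: bytes, nonce: bytes, data: bytes) -> bytes:
    # Toy XOR keystream derived from SHAKE-256. Illustration only --
    # do not use this construction to protect real data.
    keystream = hashlib.shake_256(key + nonce).digest(len(data))
    return bytes(a ^ b for a, b in zip(data, keystream))

# Data in motion: an ephemeral key lives only for the duration of the session.
session_key = secrets.token_bytes(32)
nonce = secrets.token_bytes(16)
ciphertext = toy_stream_cipher(session_key, nonce, b"hello over the wire")
assert toy_stream_cipher(session_key, nonce, ciphertext) == b"hello over the wire"
del session_key  # call over; the key can be destroyed

# Data at rest: the key must survive as long as the ciphertext does.
storage_key = secrets.token_bytes(32)
stored = toy_stream_cipher(storage_key, nonce, b"record kept for years")
# storage_key is now a long-lived 32-byte secret, exactly as sensitive as
# the data it protects: encryption shrank the secret, it didn't eliminate it.
```

The last comment is the essay’s point in miniature: the storage key must be kept, protected, and available for the entire lifetime of the stored data.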

Historically, the reason key management worked for stored data was that the key could be stored in a secure location: the human brain. People would remember keys and, barring physical and emotional attacks on the people themselves, would not divulge them. In a sense, the keys were stored in a “computer” that was not attached to any network. And there they were safe.

This whole model falls apart on the Internet. Much of the data stored on the Internet is only peripherally intended for use by people; it’s primarily intended for use by other computers. And therein lies the problem. Keys can no longer be stored in people’s brains. They need to be stored on the same computer, or at least the network, that the data resides on. And that is much riskier.

Let’s take a concrete example: credit card databases associated with websites. Those databases are not encrypted because it doesn’t make any sense. The whole point of storing credit card numbers on a website is so it’s accessible—so each time I buy something, I don’t have to type it in again. The website needs to dynamically query the database and retrieve the numbers, millions of times a day. If the database were encrypted, the website would need the key. But if the key were on the same network as the data, what would be the point of encrypting it? Access to the website equals access to the database in either case. Security is achieved by good access control on the website and database, not by encrypting the data.

The same reasoning holds true elsewhere on the Internet as well. Much of the Internet’s infrastructure happens automatically, without human intervention. This means that any encryption keys need to reside in software on the network, making them vulnerable to attack. In many cases, the databases are queried so often that they are simply left in plaintext, because doing otherwise would cause significant performance degradation. Real security in these contexts comes from traditional computer security techniques, not from cryptography.

Cryptography has inherent mathematical properties that greatly favor the defender. Adding a single bit to the length of a key adds only a slight amount of work for the defender, but doubles the amount of work the attacker has to do. Doubling the key length doubles the amount of work the defender has to do (if that—I’m being approximate here), but increases the attacker’s workload exponentially. For many years, we have exploited that mathematical imbalance.
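The asymmetry is easy to state numerically. A sketch, under the usual simplifying assumption that the attacker’s only option is exhaustive key search:

```python
# Exhaustive search over an n-bit key must try up to 2**n keys, while the
# defender's per-operation cost grows only roughly linearly in n.
def attacker_work(bits: int) -> int:
    # Worst-case number of keys an exhaustive search must try.
    return 2 ** bits

# One extra bit doubles the attacker's work:
assert attacker_work(129) == 2 * attacker_work(128)

# Doubling the key length squares it:
assert attacker_work(256) == attacker_work(128) ** 2
```

That squaring, not doubling, of the attacker’s workload is the exponential imbalance the essay describes.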

Computer security is much more balanced. There’ll be a new attack, and a new defense, and a new attack, and a new defense. It’s an arms race between attacker and defender. And it’s a very fast arms race. New vulnerabilities are discovered all the time. The balance can tip from defender to attacker overnight, and back again the night after. Computer security defenses are inherently very fragile.

Unfortunately, this is the model we’re stuck with. No matter how good the cryptography is, there is some other way to break into the system. Recall how the FBI read the PGP-encrypted email of a suspected Mafia boss several years ago. They didn’t try to break PGP; they simply installed a keyboard sniffer on the target’s computer. Notice that SSL- and TLS-encrypted web communications are increasingly irrelevant in protecting credit card numbers; criminals prefer to steal them by the hundreds of thousands from back-end databases.

On the Internet, communications security is much less important than the security of the endpoints. And increasingly, we can’t rely on cryptography to solve our security problems.

This essay originally appeared on DarkReading. I wrote it in 2006, but lost it on my computer for four years. I hate it when that happens.

EDITED TO ADD (7/14): As several readers pointed out, I overstated my case when I said that encrypting credit card databases, or any database in constant use, is useless. In fact, there is value in encrypting those databases, especially if the encryption appliance is separate from the database server. In this case, the attacker has to steal both the encryption key and the database. That’s a harder hacking problem, and this is why credit-card database encryption is mandated within the PCI security standard. Given how good encryption performance is these days, it’s a smart idea. But while encryption makes it harder to steal the data, it is only harder in a computer-security sense and not in a cryptography sense.
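The “separate appliance” idea can be sketched as follows. This is a hypothetical illustration of the architecture, not any real product’s API, and it reuses a deliberately toy XOR-keystream cipher; all names are mine.

```python
import hashlib
import secrets

def toy_cipher(key: bytes, nonce: bytes, data: bytes) -> bytes:
    # Toy XOR keystream from SHAKE-256 -- illustration only, not real crypto.
    ks = hashlib.shake_256(key + nonce).digest(len(data))
    return bytes(a ^ b for a, b in zip(data, ks))

class KeyServer:
    # Stands in for a separate encryption appliance: the key never
    # leaves this host; other systems send it data to seal and open.
    def __init__(self) -> None:
        self._key = secrets.token_bytes(32)
    def seal(self, nonce: bytes, plaintext: bytes) -> bytes:
        return toy_cipher(self._key, nonce, plaintext)
    def open(self, nonce: bytes, ciphertext: bytes) -> bytes:
        return toy_cipher(self._key, nonce, ciphertext)

class Database:
    # Stores only ciphertext; a dump of this host alone is unreadable.
    def __init__(self) -> None:
        self.rows = {}

appliance = KeyServer()
db = Database()

nonce = secrets.token_bytes(16)
db.rows["card:42"] = (nonce, appliance.seal(nonce, b"4111111111111111"))

# Legitimate access requires both systems:
n, ct = db.rows["card:42"]
assert appliance.open(n, ct) == b"4111111111111111"
# An attacker who steals only db.rows has ciphertext but not the key.
```

This is exactly the point in the correction above: the attacker now has to compromise two systems, which is harder in a computer-security sense even though the cryptography itself is unchanged.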

Posted on June 30, 2010 at 12:53 PM

Electronic Health Record Security Analysis

In British Columbia:

When Auditor-General John Doyle and his staff investigated the security of electronic record-keeping at the Vancouver Coastal Health Authority, they found trouble everywhere they looked.

“In every key area we examined, we found serious weaknesses,” wrote Doyle. “Security controls throughout the network and over the database were so inadequate that there was a high risk of external and internal attackers being able to access or extract information without the authority even being aware of it.”

[…]

“No intrusion prevention and detection systems exist to prevent or detect certain types of [online] attacks. Open network connections in common business areas. Dial-in remote access servers that bypass security. Open accounts existing, allowing health care data to be copied even outside the Vancouver Coastal Health Care authority at any time.”

More than 4,000 users were found to have access to the records in the database, many of them at a far higher level than necessary.

[…]

“Former client records and irrelevant records for current clients are still accessible to system users. Hundreds of former users, both employees and contractors, still have access to resources through active accounts, network accounts, and virtual private network accounts.”

While this report is from Canada, the same issues apply to any electronic patient record system in the U.S. What I find really interesting is that the Canadian government actually conducted a security analysis of the system, rather than just maintaining that everything would be fine. I wish the U.S. would do something similar.

The report, “The PARIS System for Community Care Services: Access and Security,” is here.

Posted on March 23, 2010 at 12:23 PM

