Evidence that the NSA Is Storing Voice Content, Not Just Metadata

Interesting speculation that the NSA is storing everyone’s phone calls, and not just metadata. Definitely worth reading.

I expressed skepticism about this just a month ago. My assumption had always been that everyone's compressed voice calls are simply too much data to move around and store. Now, I don't know.

There’s a bit of a conspiracy-theory air to all of this speculation, but underestimating what the NSA will do is a mistake. General Alexander has told members of Congress that they can record the contents of phone calls. And they have the technical capability.

Earlier reports have indicated that the NSA has the ability to record nearly all domestic and international phone calls—in case an analyst needed to access the recordings in the future. A Wired magazine article last year disclosed that the NSA has established “listening posts” that allow the agency to collect and sift through billions of phone calls through a massive new data center in Utah, “whether they originate within the country or overseas.” That includes not just metadata, but also the contents of the communications.

William Binney, a former NSA technical director who helped to modernize the agency’s worldwide eavesdropping network, told the Daily Caller this week that the NSA records the phone calls of 500,000 to 1 million people who are on its so-called target list, and perhaps even more. “They look through these phone numbers and they target those and that’s what they record,” Binney said.

Brewster Kahle, a computer engineer who founded the Internet Archive, has vast experience storing large amounts of data. He created a spreadsheet this week estimating that the cost to store all domestic phone calls a year in cloud storage for data-mining purposes would be about $27 million per year, not counting the cost of extra security for a top-secret program and security clearances for the people involved.
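Kahle's arithmetic is easy to sanity-check with a back-of-the-envelope sketch. The call volume, average call length, and storage price below are illustrative assumptions, not Kahle's actual spreadsheet inputs:

```python
# Back-of-envelope: storing a year of US call audio in the cloud.
# All inputs are assumed figures for illustration only.
CALLS_PER_DAY = 8e8          # assumed US daily call volume
AVG_CALL_SECONDS = 120       # assumed average call length
BYTES_PER_SECOND = 8000      # G.711: 64 kbps = 8 KB/s
COST_PER_GB_YEAR = 0.10      # assumed cloud storage price, $/GB/year

bytes_per_year = CALLS_PER_DAY * 365 * AVG_CALL_SECONDS * BYTES_PER_SECOND
petabytes = bytes_per_year / 1e15
cost = bytes_per_year / 1e9 * COST_PER_GB_YEAR
print(f"{petabytes:.0f} PB/year, ~${cost / 1e6:.0f}M/year")
```

With these assumed inputs the totals land in the same ballpark as the quoted figures: a few hundred petabytes and a few tens of millions of dollars per year.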

I believe that, to the extent that the NSA is analyzing and storing conversations, they’re doing speech-to-text as close to the source as possible and working with that. Even if you have to store the audio for conversations in foreign languages, or for snippets of conversations the conversion software is unsure of, it’s a lot fewer bits to move around and deal with.

And, by the way, I hate the term “metadata.” What’s wrong with “traffic analysis,” which is what we’ve always called that sort of thing?

Posted on June 18, 2013 at 5:57 AM • 82 Comments


b June 18, 2013 7:10 AM

It is "metadata" because it is used to access the content data.

Doing traffic analysis through metadata is only one aspect of it, and probably not the most important one.

OneAndAnother June 18, 2013 7:19 AM

I think the term metadata is probably more meaningful to a general audience these days when discussing the scope of the data captured. “Traffic analysis” to the naïve ear implies just the on-network portion of the information, who is calling whom and for how long, as opposed to also capturing people’s physical location amongst other things.

J. Oquendo June 18, 2013 7:20 AM

I think there is a disconnect/misunderstanding about "storage." When people think of "storing a call," there is likely an impression that the audio is recorded as a WAV file (pretty big) and stored. There are plenty of compression techniques to lop off the storage requirements (MP3, MP4, etc.), followed by further compression (zip, 7zip, gzip, etc.) to minimize the storage required.

If I had to conceive a program to do this, it would look something like this:

Caller –> TAP –> Yonder –> Callee

Where the TAP would do the following:

  1) Hash the information associated with the call (caller, callee, date, time, etc.)
  2) Record the call in a compact format (MP4)
  3) Convert the speech to text in a flat file
  4) Correlate the data from 1 and 3 and store it in a DB
  5) Compress #2 and store it
  6) Pass #4 to analysts

Text files are small; I imagine a decent 1-hour conversation converted to text would run about 100 pages. Give or take (high end): a 3 MB text file versus a 60 MB WAV file.
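A minimal sketch of step 1, assuming SHA-256 as the hash and invented field names: the digest of the call metadata can serve as the shared key that names both the database row (step 4) and the stored audio (step 5).

```python
import hashlib
import json

def call_key(caller, callee, start_time):
    """Hash the metadata associated with a call (step 1) into a stable key."""
    record = json.dumps({"caller": caller, "callee": callee,
                         "start": start_time}, sort_keys=True)
    return hashlib.sha256(record.encode()).hexdigest()

# The same key can name the DB row (step 4) and the compressed audio (step 5).
key = call_key("+15551234567", "+15557654321", "2013-06-18T07:20:00")
print(key[:16], "->", f"audio/{key}.mp4")
```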

CaptSutter June 18, 2013 7:55 AM

Traffic analysis is what you do; meta-data is what you have: the data surrounding a "phone call" or data transfer (or picture). It is a generic term to cover a multitude of sins. Meta-data is much more powerful than traffic analysis, because you might be able to do more with it later, once you have the technology or you know what to look for.

Someone June 18, 2013 7:56 AM

The more I hear about the NSA, the more depressed I become. I certainly don’t doubt that they have the capability to store all domestic phone records.

Given the "tip of the iceberg" comments senators have made recently regarding the phone traffic analysis and PRISM programs, in addition to others from the NSA not-so-subtly saying they do mass traffic analysis on domestic Internet traffic… and given that the outrage regarding these programs seems to be dying down domestically… I can tell that this is going to have a substantial negative impact on my life.

pingu June 18, 2013 7:58 AM

I believe voice codecs don't work the same way general-purpose codecs (like MP3) do. Voice codecs take advantage of the highly predictable characteristics of speech sounds. This leads to incredibly low bit rates.
Over GSM the data rate for voice is 9600 bps (1.2 kB/s). This means around 4 MB per hour of speech. And different codecs can do even better.
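The arithmetic behind that figure:

```python
# GSM voice channel per the comment above: 9600 bits/s = 1200 bytes/s.
bits_per_second = 9600
bytes_per_hour = bits_per_second / 8 * 3600
print(f"{bytes_per_hour / 1e6:.1f} MB per hour of speech")  # prints 4.3
```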

paul June 18, 2013 8:07 AM

With those kinds of cost numbers for storage, the NSA would be crazy not to archive the original bits (even if they’re not immediately online). Analytical tools get better every year, and there are so many things in addition to text that you’d want to look for: automated speaker recognition, cadence and voice-stress analysis, background-noise identification…

Stephen June 18, 2013 8:11 AM

Read James Bamford's book The Shadow Factory. I think you can read some of it on Google Books preview for free. I read it, then bought a used hard copy from Amazon.

scary stuff….

They've been doing all this for years, and as the tech gets better, they get better. And of course it was 9/11 that changed the NSA's mission.

petrilli June 18, 2013 8:44 AM

So, the calculations in the cited spreadsheet are based on an 8 KB/sec (64 kbps) stream, which would be the standard G.711 codec used as the basis for almost all voice transmissions. This is the "gold standard," and would have no degradation whatsoever.

But, if you’re willing to take a little hit, you could use G.726 at a 32kbps (4KB/sec) rate, which to the human ear isn’t going to sound noticeably worse. It’s also what’s used in the DECT cordless handsets. That would cut your storage costs in half. Studies show this is approximately 97% of the quality of a G.711 codec.

Let’s take it one step further, and move to G.729, which is common in VoIP, and is actually what most conference call systems use. This operates at 8kbps (1KB/sec), and now we have an 8x reduction in space. This would reduce the estimated data storage requirements (as documented in the spreadsheet) from 272PB/year to 34PB and the cost to $3.4M.
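The scaling is straightforward to check, taking the spreadsheet's 272 PB/year at G.711 as the baseline:

```python
# Yearly storage for the same call volume under different codecs,
# scaled from the spreadsheet's 272 PB/year baseline at G.711 (64 kbps).
BASELINE_PB = 272
codecs = {"G.711": 64, "G.726": 32, "G.729": 8}  # bit rates in kbps

for name, kbps in codecs.items():
    pb = BASELINE_PB * kbps / codecs["G.711"]
    print(f"{name}: {kbps:2d} kbps -> {pb:.0f} PB/year")
```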

So, tell me dear friends, what do YOU think?

c June 18, 2013 8:48 AM

I think this makes sense: voice audio -> phonemes at n bits each -> analyze/compress/store. And I think you'll find they have a lot of experience with phoneme-based systems.

J.D. Bertron June 18, 2013 9:12 AM

And for $27,000,000 I’m sure they do it. It’s tiny compared to the cost of other programs, so from a cost/benefit analysis, it’s pretty much guaranteed they’re doing it.

J. Peterson June 18, 2013 9:12 AM

Brewster Kahle (of the Internet Archive) did a detailed back-of-the-envelope calculation, and found the cost of recording all phone calls wasn’t that high (in NSA terms). $30M data center build, $2M/yr data center power costs.


The real fun is going to be what happens when courts start issuing subpoenas for the meta data and recordings. It’s already starting to happen:


Steve June 18, 2013 9:13 AM

Of course, once the DHS gets involved, the data collected and used will include Americans'. Will attitudes in the US shift? After all, the DHS has a history of scope creep.

“…Under current Homeland Security Secretary Janet Napolitano, the DHS targeted law-abiding Americans for their political beliefs, most strikingly in a 2009 report on “extremists” that warned of the dangers posed by pro-life advocates, critics of same-sex marriage and groups concerned with abiding by the U.S. Constitution…”


Beth Meacham June 18, 2013 9:18 AM

You are overestimating the amount of storage needed for a text file. I just looked at a 250 page manuscript, over 90,000 words, and the file is less than 500K. Uncompressed.
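That squares with the usual rule of thumb of roughly five to six bytes per English word, spaces included. A quick check:

```python
# Rough size of a 90,000-word manuscript as plain text.
words = 90_000
bytes_per_word = 5.5   # rough English average, including the space
size_kb = words * bytes_per_word / 1024
print(f"{size_kb:.0f} KB uncompressed")  # on the order of 500 KB
```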

sg June 18, 2013 9:33 AM

About data compression for voice: there are "parametric voice codecs" that use about 2 kilobits per second. That's much simpler (I suppose) than speech-to-text conversion, but still very compact.

jackson June 18, 2013 9:42 AM

The most interesting thing about the first blog cited is a comment by "Paul Noel." It doesn't even matter if he's correct or not. Did anyone read that comment? Because I am finding a ton of others like it across the web.

shoeless June 18, 2013 9:47 AM

Please sort out your SSL cert [Expires On 6/18/13]

“The cobbler’s children always go barefoot.”

Wooverine June 18, 2013 9:48 AM

While the cloud cost might be $27M, the program costs could easily be 10x if not 100x that.

Roxanne June 18, 2013 9:56 AM

With regards to photography, the data is the photo (that is, what’s represented in the photo) while the metadata is not the photograph, but everything else: When it was taken, what the camera settings were, all that stuff. Putting it through processing adds more metadata: How the colors were changed, how it was cropped, everything else that’s done. It’s still not the data in the photograph. From the metadata, you can’t tell if the photo is of my son or the squirrel raiding the birdfeeder. It’s not traffic analysis: It’s data about the data-collection process. We need a word for it. At the moment, the word is metadata.

Think of a new word if you don’t like this one, but don’t confuse it with further processing either the data or the metadata.

Ollie Jones June 18, 2013 10:01 AM

It’s pretty obvious the puzzle-palace guys are capturing and indexing all the call detail records (aka “metadata”, aka “traffic analysis” data) they can get their hands on. It’s pretty obvious they don’t consider the capturing and indexing of this information to be controlled by FISA or any other scheme for protecting against “search and seizure.” It’s pretty obvious that they make decisions about whether it’s lawful to search the data at the time they USE their indexes, not the time they CREATE them.

What’s not obvious?

Do the call detail records they have indexed contain location information (lat/long, nearest tower, service address, etc.)? If a participant was on the move during the call, do the CDRs show the tower handoffs or changes in location?

Do the indexed call detail records contain audio files and/or searchable transcripts of the content of the communications? Or do they refrain from gathering that stuff until they have some kind of FISA / National Security Letter approval?

How long do they retain the indexed information?

Milo M. June 18, 2013 10:01 AM


” . . . a huge number of phone calls start out as Internet packets and end as Internet packets, but have to be switched to, and then from, a voice circuit in between.

What remains is to put the Internet protocol in the middle of the network as well. And it’s happening. In a 2009 filing with the U.S. Federal Communications Commission, AT&T . . . called on the government to give a date certain for the last plain old telephone call.

In June, a Washington, D.C., advisory group, the Voice Communication Exchange Committee, formed and committed itself to a complete transition to the Internet protocol by a date of its own choosing: June 15, 2018.”

From 9 years ago:


“This feature of VoIP software exposes its Achilles’ heel. Stored records have, by longstanding decisions of the U.S. Supreme Court, no “reasonable expectation of privacy.” As a result, searching of those records with much the same purpose as a wiretap can be conducted without VoIP calls having any Fourth Amendment protection.”

The paper, “Katz Is Dead. Long Live Katz”, linked in the article:


Navy Squid June 18, 2013 10:34 AM

Interesting semantic take on the situation. Completely contrary to how the law is written, and to how any organization with limited funds would do this (and DoD has limited funds, believe it or not), but hey! Tinfoil hats are cheap and, apparently, fashionable.

anon June 18, 2013 10:34 AM

I also thought that storing phone calls was simply out of reach. But I hadn’t stopped to run the numbers. This article shows a different picture:


Based on this, I think that storing all phone call audio as well as emails has probably been very feasible for quite some time. And that's even without considering the use of speech-to-text. If I were the NSA, I think I'd want to keep the actual audio around for use in court; a text transcript strikes me as too easy for a good lawyer to cast doubt on. But the text contents of calls would be vastly smaller and much more easily searchable, so they would probably want to do that too.

vas pup June 18, 2013 10:36 AM

Intel collection within domestic LEO was always there, and judges should never know all the techniques, including CI and undercover ops, because some intel is not intended for prosecution in court (see the link provided by @Dilbert above), but rather for multiple bifurcations in the further analysis of targets of criminal, and now terrorist, activity inside the country. That information is not subject to subpoenas (court or lawyers) because it is internal LEO intel. As soon as internal intel reaches the point where the intel on a target could clearly be passed along for prosecution, or an imminent threat of terrorist attack exists and prevention is the highest priority, then the intel is 'declassified' and moved into the public domain for arrest/prosecution/etc.
My concerns are:
(1) That targets for LEO intel collection are not biased, as in the recent IRS case, or based on First Amendment usage (dissent is not the same as disloyalty!), or just otherwise irrelevant to LEO purposes altogether;
(2) That LEO intel is not passed to the private sector for background/loyalty checks affecting hiring/firing decisions, other than for government contractor clearances, because you can't FOIA a private company on the decisions it makes;
(3) Whether LEO intel is subject to FOIA requests regarding particular techniques of collection (in general), or intel on a particular operation/collection/storage — this needs clarification;
(4) That access to actual data/target content is logged, and the log is analyzed on a day-by-day basis for relevance/authorization within the LEO.

Bruce H June 18, 2013 10:38 AM

I’ve always thought of metadata as additional data contained within a file (whether it be a text file, an image, an audio file, or what-have-you) that describes the primary data. Think EXIF data for images.

It is certainly not traffic analysis, though metadata can be useful for that.

Ollie Jones June 18, 2013 10:53 AM

Quiz question: If I phone my doctor to ask about the result of my recent HIV test, does HIPAA (US medical records privacy law) govern the confidentiality of that call, or do intelligence regulations govern it?

Clive Robinson June 18, 2013 10:55 AM

As far as "data about data" or meta-data is concerned, I'm with Bruce: it's the wrong name for what it is, and I suspect that's due to sloppy usage getting further cast adrift by journalists, to the point that it's lost its former formal meaning (hey, who remembers the difference between cracker and hacker…).

Strictly speaking, data is a grab bag of bits that can have any meaning you choose to give them. Meta-data is the information that starts to give the bag of bits meaning, and thus would give you both structure and data types within a container or record, and meta-meta-data gives meaning in the human sense to the data types in the containers.

However, in database usage they tend to miss out a step and call meta-meta-data just meta-data, and use the column description as the meta-data (which it's not).

Look at it this way

1, Data is an unspecified collection of bits.
2, Meta-data turns bits into bytes, ints, chars, etc., giving them data-type meaning.
3, Meta-data also indicates the number of bytes etc. in a column of data for fixed records.
4, Meta-meta-data gives a column meaning, such as SS7 connection number.
5, Meta-meta-data also gives other meaning, such as the "originating" and "terminating" numbers applicable to the SS7 connection number.

And so on and so forth.

Paul June 18, 2013 10:56 AM

The NSA most definitely IS RECORDING and TRANSCRIBING 100% of all phone calls, and this isn't limited to the USA. This was what they asked for in the Requests for Proposal for the contracts to build the software they now use. Everyone needs to understand that this sort of data has only one use: to reduce a people to absolute despotism under a government that is unaccountable and always "protecting them from terrorists." Liberals need to know they are the first people killed when this finally happens. It has always been so in every place this crap started. Nazi Gestapo and SS (they killed the brown shirts), KGB… do I even need to say? The list is endless.

Craig June 18, 2013 11:00 AM

Great, and then they’ll probably decide that the computer speech-to-text transcription is admissible in court. So, one day, worried that something you’re doing is putting too much stress on your wrist, you say, “I’ll tear a wrist muscle,” the computer transcribes it as “I’m terrorist muscle”, and the next thing you know, you’re off to Gitmo…

Alex June 18, 2013 11:00 AM

Remember, we already know the brainiacs have invented their own compression algorithms for fingerprints, so let's assume other brainiacs at the NSA have new, unshared compression schemes for voice.

Rob June 18, 2013 11:05 AM

At first I thought it would just be splitting hairs to worry about whether ‘metadata’ is the right word, but on reflection I think it’s important to be very, very clear. In the world of databases, I believe ‘metadata’ is the specification of the data which is stored in the database. So, just for the sake of argument, the metadata for one record in a DB might be something like:

an integer;
a timestamp;
a long integer;
a long integer;
text string with 40 characters;
a timestamp;
a timestamp;
a floating point number;
a floating point number;
a BLOB (binary large object).

That record itself might contain data such as:

a number which identifies the record in the data-table;
the date and time the record was created;
the originating phone number;
the destination phone number;
the user’s name;
date and time when the call started;
date and time when the call finished;
the location (latitude and longitude) of the caller;
and finally … an arbitrarily long binary string which is the actual voice recording.

Now the first 9 items of data in that record are what the NSA has already owned up to 'collecting' but is calling the 'metadata'. It's 'only' the last item that is, apparently, not being stored.

In reality, who would care about anyone saving the true metadata above? It's generic for every call. They are collecting and storing data. If you stop for a moment, it's obvious that not using precise, accurate terms will open some juicy little loopholes for spokespeople in the future. There is a need for a very clear and specific word for the first 9 items of data stored in the example above.
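The distinction Rob is drawing can be made concrete. In the sketch below (column names invented for the example, using SQLite), the CREATE TABLE statement is the metadata in the strict database sense, while the inserted row is the per-call data the NSA is calling "metadata":

```python
import sqlite3

# The schema -- metadata in the strict database sense.
db = sqlite3.connect(":memory:")
db.execute("""
    CREATE TABLE calls (
        id          INTEGER PRIMARY KEY,
        created     TEXT,
        origin      TEXT,
        destination TEXT,
        caller_name TEXT,
        started     TEXT,
        ended       TEXT,
        lat         REAL,
        lon         REAL,
        audio       BLOB
    )""")

# One row of the data that officials are calling "metadata" (audio left NULL).
db.execute("INSERT INTO calls VALUES (1, '2013-06-18T07:20:00',"
           " '5551234567', '5557654321', 'A. Caller',"
           " '2013-06-18T07:20:05', '2013-06-18T07:25:35',"
           " 40.77, -111.89, NULL)")

# The true metadata is just the column names and types:
schema = [row[1] for row in db.execute("PRAGMA table_info(calls)")]
print(schema)
```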

Anne Oni June 18, 2013 11:13 AM

The metadata is merely the index for the archive. How else would you know where the “Bob Smith talks to Mohammed Al Jihad” conversation is stored?

The story of phone tapping has been growing over the years. In 2004 the media had a story about meta-data analysis of Iraqis calling Iraqis on their newly established cell network. Then, the phone tapping scandal in 2006, then the AT&T/NSA building in San Francisco, then the Utah data center, then the Tim Clemente comment.

I’m quite sure the NSA is recording all conversations. When they listen to them is anyone’s guess.

hinhinhin June 18, 2013 11:26 AM

Speech to text is imperfect — just try YouTube’s — and the technology is evolving; even if you had an interest in S2T you would store the audio (voice codecs were mentioned above) so that you could improve the technology (most of the data is fit for unsupervised learning) and take advantage of the improvements (retranscribing with the latest algorithms before giving the text to a human).

Dilbert June 18, 2013 11:58 AM

@hinhinhin (and everyone else that feels this way)

Do you REALLY think the NSA is using the same algorithms that YouTube is using? Get a clue here, folks. I have a friend who went to work as a cryptographer for the NSA many years ago. He spent his first year going through NSA's math training! They have their own algorithms for this that are proprietary and "top secret."

You can't possibly look at YouTube, Dragon NaturallySpeaking, Siri, etc. and imagine THAT is what the NSA is using!

Fiona June 18, 2013 12:44 PM

@Dilbert – it’s not just a question of algorithms, per se. Obviously I have no idea what NSA is doing, but speech processing is one of the hardest problems in computational linguistics.

The only way I can see them doing better than other researchers is having the computing power to be able to build substantially larger and more complex statistical models than anyone else.

I’m just not convinced that this is a feasible approach, even for them.

Peter T June 18, 2013 12:49 PM

As was pointed out earlier, the spreadsheet calculation significantly overestimates the sound quality, and thus the overall cost of the project, but is otherwise fairly accurate.
However, I don't believe that the NSA would fully record every single conversation; they surely have very sophisticated filters to purge the worthless records and keep only the interesting ones.
By completely tracking about 1 million "suspects" at all times, and occasionally/partially tracking an additional 5-10 million, they would achieve a satisfying level of operational coverage at a negligible cost: barely a few million dollars per year.

On the other hand, terrorists and criminals would have to be really stupid to use regular phones these days, and honest people don't have anything to worry about… or do they? 🙂

Steve NM June 18, 2013 2:50 PM

Much better calculation of actual costs to store here (approx. $27M p.a. in infrastructure and $1.8M in power).

Andy June 18, 2013 2:56 PM

I very much doubt they compress the audio because cost is no object and the higher the quality of the audio, the better the results for audio-based analysis. No doubt they do S2T as well but there’s information you can get from the audio that you can’t get from the text.

Indigo Jones June 18, 2013 3:28 PM

Dilbert, Fiona

The only way I can see them doing better than other researchers is having the computing power to be able to build substantially larger and more complex statistical models than anyone else

They may have proprietary algorithms, but there are practical and theoretical limits to how compressible speech is. The fundamentals of speech compression were worked out during WWII with the invention of the vocoder, which is basically a mathematical model of the human vocal tract. Instead of transmitting speech, a vocoder transmits control signals that specify how the vocal tract changes. Algorithms like MP3 do something similar, basically using a statistical model of human perception to selectively remove frequencies the ear is less sensitive to. These psycho-acoustic mechanisms can be used in conjunction with "lossless" compression like LZW, RLE, CABAC and the like.

Data rates of 1-2 kB/s for intelligible speech are quite realistic. It is mathematically unlikely that any proprietary encoding mechanism surpasses that by a jaw-dropping amount, though it is worth noting that, at the volume of data the NSA is probably gathering, even a marginal improvement might yield dividends.

As far as using statistical models to guess at the content of digitized speech, this is quite realistic. When Wolfram Alpha came out, I gave it a test: I typed the following string into both Google and Wolfram Alpha: "1 1 2 3 5 8 13." Both services identified the sequence just fine, though using vastly different mechanisms.

But this does highlight the real value of metadata — and why it’s a smokescreen whenever some government official says, “Oh, it’s just meta-data, it’s not really that interesting.” This is a lie because any discussion of this situation has to start with the proposition that this meta-data IS interesting, since the NSA has been collecting it for some time. It is interesting to the NSA, clearly.

But its real value can be glimpsed when considering that the content of a phone call may be highly equivocal: two callers may speak in personal idioms, in slang, in multiple languages, obliquely about incidents that occurred off the phone, etc. Call content is highly equivocal.

Meta-data, by contrast, is always unequivocal: such and such a person made a call to somebody else, from this cell tower, for five and a half minutes. Meta-data is more valuable because it carries more certainty than content.

Nick P June 18, 2013 4:22 PM

@ Andy re compression, cost & quality

“I very much doubt they compress the audio because cost is no object and the higher the quality of the audio, the better the results for audio-based analysis. ”

This isn’t necessarily a problem. If they’re worried and want much compression, they can establish a specific compression target based on test data that has an acceptable quality loss. Remember that most people can’t perceive any loss past a certain amount. That would seem safe. If they want no loss & less compression, there’s lossless audio compression such as FLAC they can use.

The tradeoff here is storage space. Some assume an unlimited budget. That’s not realistic: a given NSA project does have limits and metrics. So, if they cut 30-50% (lossless) or more (lossy) of their audio storage requirements, they can store that much more. So, their comparison might be “do we want to store 1 year of uncompressed audio, almost 2 years of lossless compressed audio, or 3-4 years of lossy? Also, can we rotate data between them based on quality needs and post-review information?” There’s many tradeoffs that can be made among these to optimally benefit various stakeholders in their projects, esp. if they have automated data mgmt tech in place.
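A rough illustration of that tradeoff, with assumed compression ratios (roughly 40% savings for lossless in the FLAC range, 4:1 for a lossy voice codec; both figures are illustrative, not measured):

```python
# Years of retention for a fixed storage budget under different codecs.
# The ratios are illustrative assumptions, not benchmark results.
BUDGET_PB = 272                       # enough for 1 year uncompressed
ratios = {"uncompressed": 1.0,
          "lossless (~FLAC)": 0.6,    # assumed ~40% savings
          "lossy voice codec": 0.25}  # assumed 4:1 reduction

for name, ratio in ratios.items():
    years = BUDGET_PB / (272 * ratio)
    print(f"{name}: {years:.1f} years of audio")
```

Under these assumptions the same budget holds one year uncompressed, almost two years lossless, or four years lossy, matching the comparison sketched above.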

@ Dilbert re NSA capabilities

“You can’t possibly look at Youtube, Dragon Naturally Speaking, Siri, etc and imagine THAT is what the NSA is using!”

You're right. The NSA alternates between being at the state of the art and behind existing commercial practice. So, they might have a model that works a bit better than YouTube's or Nuance's, which they likely got from the same academic researchers. Even more likely, they are using commercial-grade technology they licensed (or stole) from commercial organizations. And there is the possibility they're using dated tech due to a slow refresh cycle or management gone horribly wrong.

We can’t really know what they have. But we can’t treat them like they have God-like abilities, either. They don’t. These are hard problems. Many bright, non-NSA people are working hard to solve them all over the world. Many have financial incentives too. Unless it’s a problem that’s easy to solve with more money, NSA probably is progressing at a similar rate to others in those fields.

At least the CIA gives us hints at their capabilities.

Curt June 18, 2013 4:26 PM

The way you defeat traffic analysis is by sending a regular stream of traffic, most of which is meaningless, within which real communications are nested. If 100 million Americans installed an app that generated random Google searches, fetched random URLs, or sent emails containing random phrases (yes, there are a number of tech challenges to overcome), government, industry, and marketing "snoops" would be overwhelmed. Laws won't stop these guys. People taking things into their own hands with this type of technical solution will put a stop to it. Just watch: some kind of law would be written to prevent this sort of app.
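A toy sketch of such a chaff generator; the vocabulary and timing are placeholders, and real cover traffic would have to mimic human behaviour far more convincingly to fool anyone:

```python
import random

# Toy cover-traffic generator: emit decoy search queries at random
# intervals. Purely illustrative; real chaff needs human-like
# vocabulary, timing, and destinations.
WORDS = ["weather", "recipe", "news", "football", "hotel", "flight"]

def decoy_queries(n, seed=0):
    """Return n (query, delay_seconds) pairs of meaningless traffic."""
    rng = random.Random(seed)
    return [(" ".join(rng.sample(WORDS, 2)),   # fake query text
             rng.uniform(30, 600))             # delay before sending
            for _ in range(n)]

for query, delay in decoy_queries(3):
    print(f"after {delay:5.0f}s: search {query!r}")
```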

David June 18, 2013 5:09 PM

Is Meta-Data possibly an over-inclusive term being used to avoid perjury if it is revealed that speech-to-text conversion occurred?

Dirk Praet June 18, 2013 6:30 PM

@ Curt

Just watch, some kind of law would be written to prevent this sort of App.

The topic of generating noise in telecommunication activities has been discussed before on this blog. I believe in the US there is already a law that can be used to criminalise such efforts, i.e. the Computer Fraud and Abuse Act. The CFAA outlaws sending information, with the intent to cause damage, to a protected computer. The law's definition of damage includes "impairment to integrity" of a system or data. This is so ambiguous that I have little doubt any lawyer could use it to prosecute someone who knowingly and willingly uses a computer program designed to corrupt the databases of the good companies collecting search and other information with false information.

Ryan Ries June 18, 2013 8:50 PM

The only thing more terrifying than the idea that the NSA is storing all phone call voice content… is the idea that they’re using speech conversion software, and then basing their judgments on who said what upon that.

I just hope their secret government speech conversion software is WAY better than anything we’ve ever seen in the private sector.

Gordon June 18, 2013 9:24 PM

Oh come on… they aren’t actually “storing” the data until someone, uh, actually “collects” the book from the shelf!

Coyne Tibbets June 18, 2013 9:32 PM

I refer you to an earlier comment of mine, estimating the data content for a recording of all US phone calls; in the range of 30-90 PB/month (compressed); the same order of magnitude as Brewster Kahle’s 23 PB/month estimate.

I think they’re also storing a lot more than voice. It’s secret of course, but some estimates of capacity for the new Utah data center have been reported: On Fox, “5 zettabytes”; and more recently on TechCrunch, “yottabytes”; that is, in the range of 1 billion to 1 trillion PB.

What do you think they plan to fill that up with?

Run the numbers: the upper yottabyte figure works out to roughly 166 TB, each, for every living human on the planet from the babes to the senile. (3.7 PB per U.S. citizen.)

I repeat: What would be anyone’s best guess as to what they might use that storage to contain? All phone calls, world wide? Using my estimate structure, that’s 2000 PB/month, tops; the low-end 5 ZB would hold all that for 200 years of phone calls!
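The 200-year claim follows directly from the figures in the comment:

```python
# How long 5 ZB would hold all worldwide call audio, per the
# comment's own estimates.
ZB = 1e21
PB = 1e15
capacity = 5 * ZB            # low-end Utah data center estimate
world_calls = 2000 * PB      # worldwide call audio per month, top estimate
years = capacity / (world_calls * 12)
print(f"{years:.0f} years of worldwide call audio")  # prints 208
```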

anon June 18, 2013 9:43 PM

@ Coyne Tibbets

What would be anyone’s best guess as to what they might use that storage to contain?

If A calls B, does that generate 1 recording or 2? Maybe the figure involves some redundancy.

If A calls B, does that call generate additional records? How many?

If A calls B, does that data undergo any post-processing other than compression? How is that data stored?

What types of caching mechanisms are involved in ensuring that all this data can be easily accessed? What is the overhead for a database system of this magnitude? How often is the system rebooted? How much space do swap files take up?

To me, the more interesting question involves the assumption that NSA is using conventional storage technology in their data center. If they’re using something like holographic data storage, their information densities are much higher, and their data access is massively parallel…

Figureitout June 18, 2013 10:00 PM

It really hit me when I was watching a show on AI that demonstrated what a hard problem this is: when someone puts a coat on a chair, a human obviously knows that it’s still a chair underneath; the computer is dumbfounded. So you either have to brute-force it (which is, well, impossible) or come up with the magic algorithm.

So imagine all the phrases (just in English, let alone all the other languages) that can be misheard…workbench or workbitch? So the evidence being presented to the so-called “courts” is very questionable.

Coyne Tibbets June 18, 2013 10:11 PM

“…1 recording or 2?” My assumption is, 1 recording. Linking (indexing) allows the same call to be associated with all the individuals involved in the call, at very low overhead (think in terms of a hyperlink from many web pages to just one target page, versus the cost of storing the target page multiple times). Indexing should be a tiny fraction of the overall voice content (sound is expensive to store).
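The store-once, link-many idea can be sketched in a few lines. This is purely illustrative; the names and structure are hypothetical, not anything known about real systems:

```python
# Toy sketch of the one-recording-many-links idea: store each call's audio
# once, keyed by a call ID, and give every participant a cheap index entry
# (a pointer) back to it. All names here are illustrative.
recordings = {}  # call_id -> audio blob (stored exactly once)
index = {}       # phone number -> list of call_ids (tiny, relative to audio)

def record_call(call_id, participants, audio):
    recordings[call_id] = audio
    for number in participants:
        index.setdefault(number, []).append(call_id)

record_call("c1", ["555-0001", "555-0002"], b"...audio...")
# Both parties' index entries resolve to the same single stored blob:
assert recordings[index["555-0001"][0]] is recordings[index["555-0002"][0]]
```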

“…additional records?” Transcript with caller identified (feasible technically now I think), phone numbers involved, start and duration, and physical location of each caller from GPS in the cellphone and/or the cell tower. Can’t think of anything else right now. Note that most of those also would require a tiny fraction of the storage of the voice recording itself.

“…caching?” “…swap files?” Don’t know, but caching and swap space are always a fraction of the size of the total data cached. The purpose of caching is to allow a focused fraction of all data to be retrieved quickly (either what is being used “right now” or what is “used most often”). Swap space is used only for temporarily inactive memory, but memory (active and swapped out) has always been a small fraction of offline storage (since the early ’70s).

“…overhead?” Unknown, precisely. But the principal overhead of databases is indexing, which is always small (relative to the data) except in border cases or when one is idiotic enough to index by the entire value. Consider the content of the call: If a transcript can be done as I propose above, speakers talking at 300 WPM translate to roughly 1800 bytes per minute. For a 5-minute call, and assuming voice at 9600 bits per second, the voice takes around 360 KB; the transcript about 10 KB. Now index the “important” words from the transcript…how much does that take? 1 KB, maybe? Small, relative to the voice content.

The 9600 bits/second is from an article I read circa 1978, which indicated that was sufficient for “quality voice” (speaker recognition and inflection). Even then, they indicated that voice transmission had been done at 150 bits/second for voice converted to phonemes for transmission. Compression was in its infancy at that time; with the compression technologies available today I’m sure we can beat 9,600 bits/second, probably by an order of magnitude.
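Putting those numbers together for a single 5-minute call; a sketch using only the 9,600 bit/s and 300 WPM figures above (6 bytes per word assumed):

```python
# Voice vs. transcript storage for one 5-minute call, using the thread's figures.
call_seconds = 5 * 60
voice_bytes = 9600 * call_seconds // 8  # 9,600 bit/s audio
transcript_bytes = 300 * 6 * 5          # 300 words/min at ~6 bytes/word

print(voice_bytes, transcript_bytes, voice_bytes // transcript_bytes)
# 360000 9000 40  -- the transcript is ~40x smaller than the audio
```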

Coyne Tibbets June 18, 2013 10:25 PM

@Coyne: Mea culpa: In the message above, where I say, “…1 billion to 1 trillion PB.” it should say “…1 million to 1 billion PB.”

And of course, that cascaded to throw other things off. The storage works out to 166 GB/person on Earth; 3.6 TB per US citizen.

The 200 years of phone calls is right, though.

Larry June 19, 2013 12:53 AM

According to this article, the data center holds 5 zettabytes of data. That’s enough space for a billion years of high-fidelity audio. That’s enough for the NSA to store 146 DVDs for each of the 7 billion people on the planet.

You would have to be awfully naive to think they aren’t storing it permanently, let alone capturing it.
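Larry’s per-person figure checks out to within rounding; a quick sketch, assuming decimal units, 7 billion people, and 4.7 GB single-layer DVDs:

```python
# Rough check of the DVDs-per-person figure (decimal units assumed).
ZB = 10**21
per_person = 5 * ZB / 7_000_000_000   # capacity split across 7 billion people
dvds = per_person / 4.7e9             # 4.7 GB single-layer DVDs assumed

print(round(per_person / 1e9), round(dvds))  # 714 152
```

That gives ~714 GB, or ~152 DVDs each; the small gap from the 146 quoted comes down to the assumed population and disc size.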

antibozo June 19, 2013 1:22 AM

Color me skeptical.

You’ll note Bruce’s skepticism is based not only on the storage problem, but also on the problem of moving the data around. How would the NSA station monitoring equipment in all of the necessary locations to intercept all of this voice traffic? They don’t even have full monitoring of government IP traffic, not even remotely. Then, assuming they’re able to actually monitor and capture it, do they compress it before sending it to their storage location? If so, that’s a lot of compute, power, and cooling needed to manage all of that compression. If they transmit it uncompressed, that’s a lot of bandwidth. Doing speech-to-text is obviously a much larger compute problem.

People can theorize about sufficient compression to get the storage to a reasonable size, but that’s only one facet of the problem.

kevin June 19, 2013 3:40 AM

There is one thing I am still missing from these discussions: why would storage constitute a “search” as in the 4th amendment? I would say the search takes place when the data is queried, not when it is stored.

Of course, this is all a side-track, but I find it an interesting one. It does not take away from the fact that the issue is not whether this practice is legal, but whether it is moral. To quote John Oliver on The Daily Show:

We’re not saying anyone broke any laws, we’re just saying it’s a little bit weird that you didn’t have to.

Andreas June 19, 2013 4:02 AM

I am sure they would hate having stored just the speech-to-text content and not the sound.

Just came in (see: http://actu.epfl.ch/news/mapping-a-room-in-a-snap/)
18.06.13 – An algorithm developed in EPFL’s School of Computer and Communications Sciences makes it possible to measure the dimensions of a room using just a few microphones and a snap of your fingers. Many promising applications are on the horizon.

Applications in forensic science are also on the horizon: based on several recordings of the same setup, audio waves could yield information on elements in the room that cannot be seen. In the same vein, analyzing a telephone call from a person who is moving around a room could allow investigators to identify where the person is calling from.


DAvid June 19, 2013 10:01 AM

@Larry, @Anon, etc;

(Joking, but…) Maybe they need all that data to store a giant hash table for different encryption schemes. I mean, they pre-compute everything, then just look up what is being said when they get the SSL data.

Slightly more seriously, I think that the size is partially a function of the military-industrial complex and the cost-plus approach to budgeting, plus the typical nerd approach to the question of what you could do with a crazy hardware setup.

“Oh, that would be awesome, we could use that for all sorts of stuff… but we’d need…” [back of envelope calculation: 5 zettabytes of storage space to solve chess using the algorithm I thought of…] “hmmm… to be able to store all of the backbone internet data we intercept for later retrieval and analysis, we need about a zettabyte of data over the next 20 years, but we should anticipate growth in data volumes. Let’s be safe and say 5 zettabytes…” [Perfect!]

chris June 19, 2013 10:07 AM

Moving around large amounts of data isn’t that large a problem: a FedEx plane full of TB drives has very high bandwidth. It’s not that unusual to use FedEx or some other standard shipping service to move large amounts of data from remote locations where the cabled bandwidth is relatively small.
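The “FedEx bandwidth” point is easy to quantify; here is a sketch with entirely made-up but plausible numbers (10,000 drives of 4 TB each on a 6-hour flight):

```python
# Effective bandwidth of shipping drives by air (all numbers are assumptions).
drives, tb_each = 10_000, 4
payload_bits = drives * tb_each * 10**12 * 8  # total data as bits
flight_seconds = 6 * 3600                     # a 6-hour flight

tbps = payload_bits / flight_seconds / 10**12
print(round(tbps))  # 15 -- roughly 15 Tbit/s of effective throughput
```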

Jim Reardon June 19, 2013 11:26 AM

Bruce, I’m surprised that more people have not come to the realization that blanket recording of voice traffic is entirely feasible. Back in 1983, Bamford described this practice in connection with international calls in his book, “The Puzzle Palace” (an attempt to chronicle the NSA since World War II).

If you consider the architecture of the telephone network, there are high-capacity locations through which telephone traffic must pass, gateways between network types (e.g., VOIP and mobile telephones), etc. The gateway locations aren’t numerous by comparison to the number of telephones.

Then consider the use pattern of telephones. It isn’t necessary to record anything unless a call is in progress. At the key locations, it takes only a small amount of machine intelligence to identify and record the portion of the capacity that is in use, generally a fraction of the capacity of the facility. Recording the digital representations of calls at these points is trivial. Bandwidth ranges from 64 kbits/sec down to about 9.5 kbits/sec per active call, depending on the type of network. At these modest rates, further compression is unnecessary.

At the same gateways, simultaneous recording of the signaling data that controls the telephone network is sufficient to allow later location and retrieval of a specific call among all those recorded.

In terms of the traditional telephone network, the basic trunking unit is still a “PRI” (or T1) line. In North America, this represents 1.472 Mbit/s of voice traffic and 64 kbit/s of intermittent signaling. As trunking is aggregated up to fiber, the effect is to combine hundreds, or even thousands, of these PRIs. One thousand PRIs on fiber still represent only 1,472 Mbit/s, or 184 megabytes/sec, roughly the speed of a single modern disk drive. In terms of volume, this flow is 0.184 GB per second, or 662 GB per hour, for simultaneous recording of 23,000 active phone calls.
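Restating that aggregation arithmetic as a quick check:

```python
# 1,000 North American PRIs (23 voice channels of 64 kbit/s each) on one fiber.
pris = 1000
voice_bps_per_pri = 23 * 64_000          # 1.472 Mbit/s of voice per PRI

total_bps = pris * voice_bps_per_pri     # 1.472 Gbit/s aggregate
bytes_per_sec = total_bps // 8           # one modern disk drive's speed
gb_per_hour = bytes_per_sec * 3600 / 10**9

print(bytes_per_sec, round(gb_per_hour), pris * 23)
# 184000000 662 23000
```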

It should be apparent that numerous optimizations can be applied during recording to reduce its volume, including voice recognition, silence suppression, compression, and the like. One optimization comes for free, as more modern networks carry digital voice in more compressed formats, reducing the source volume by up to 80 percent. Applying traffic analysis to the recording choices is a final major optimization.

After recording, post traffic analysis provides the ability to retrieve and “playback” specific calls without ambiguity. Speech to text could be applied, but probably is not because of background noise, multiple languages, and voice differences that easily produce incorrect results.

Bamford asserted that NSA was doing all this before 1983 in the international domain. The figures just outlined above demonstrate that it is easier to do today. Like 1983, NSA recording is not regarded as a wiretap or listening. Only when traffic analysis targets a specific call does this practice reach the level that somebody might consider the need for a search warrant in order to replay the call.

Nick P June 19, 2013 12:46 PM

@ Jim Reardon

Good post and points. I’ve also been considering Bamford’s (and other earlier) work while making my guesses about the current state of affairs. The past can hint at the present or future. The NSA of the past was more careful about legal issues, had very limited computing/storage technology compared to today, mostly focused on foreign stuff, had few agreements (far as I can tell) with major service providers, and had less of a budget. Yet, they managed to vacuum up phone calls internationally with Echelon, subvert US/foreign crypto companies/products, and process tons of COMINT.

So, reverse the limiting traits I mention to reflect their current situation and PRISM is an expected result. Matter of fact, my hypothesis expected much more than PRISM and the Utah datacenter. They’ve been gradually expanding on their capabilities. I’m sure more will be coming.

Herman June 21, 2013 5:34 AM

Of course the NSA is storing everything. Decades ago, when the USSR shot down a Korean airliner, the conversations of the fighter pilots with their control tower were played on the news four days later. So even way back yonks the NSA had the ability to store everything and search and retrieve it based on its contents. It is also quite obvious that they will be a wee little bit better at it nowadays.

Herman June 21, 2013 5:45 AM

What is more terrifying is how much the NSA must be paying the cell phone companies in roaming charges to get all the foreign call data…

twofish June 21, 2013 8:55 AM

Also getting the data and moving it around is not that hard.

People think of the internet as a “net”, but really most international traffic moves through a small number of trans-oceanic fiber-optic cables. You just tap into the data stream at the landing sites….


twofish June 21, 2013 9:02 AM

It’s also the case that a lot of the trans-Pacific cables pass through Hawaii. Why do you think Snowden was there?

The other thing is that the cables aren’t that big. They are only three inches wide.

Someone June 21, 2013 4:17 PM


There is one thing I am still missing from these discussions: why would storage constitute a “search” as in the 4th amendment? I would say the search takes place when the data is queried, not when it is stored.

Let’s look at the actual text of the 4th Amendment:

“The right of the people to be secure in their persons, houses, papers, and effects, against unreasonable searches and seizures, shall not be violated, and no Warrants shall issue, but upon probable cause, supported by Oath or affirmation, and particularly describing the place to be searched, and the persons or things to be seized.”

Let’s assume that there is no “search” executed unless and until a “targeted query” is run.

The issues remaining are:

  1. Do phone records and recorded conversations fall in the categories of “persons, houses, papers, and effects”?
  2. Is the data being seized?
  3. If so, is the seizure reasonable?
  4. Has a legal Warrant been issued?

My arguments:

  1. This is the most controversial issue. I would argue that they fall under the category of papers, because they are records which identify individuals and contain personal information which most people would prefer to keep private. Who would offer to make all of their phone conversations, ever, available to everyone? Not many.
  2. If the government claims the data is not being seized–because it is digital data and the original owner of the data is not deprived of it–then unlicensed copying or sharing of digital media is not theft. But the government says it is. Can’t have it both ways. (Oh, wait, they can, because they’re the government. cf. DoJ’s arguments to SCOTUS about Obamacare: “It’s not a tax,” “It’s a tax.”)

  3. I suppose whether it’s reasonable is a matter of opinion.

  4. This is the most straightforward issue, and this is why the program is clearly a violation of the 4th Amendment, and therefore unconstitutional, and therefore illegal:

i) No Warrant has been issued that meets the requirements.
ii) There is no probable cause.
iii) There has been no oath nor affirmation.
iv) The particular place to be searched has not been defined.
v) The persons or things to be seized have not been defined.

Some would claim that existing laws authorize this collecting, but such laws directly conflict with the requirements stated above by the 4th Amendment. Clearly, no blanket warrant nor authorization can be given to seize all data about all people. To claim otherwise is to argue over basic English comprehension or with definitions of basic concepts like what a warrant is, what “particular” means, and whether “the persons or things to be seized” could reasonably be defined as “everyone’s.” It would be like arguing about what the definition of “is” is…oh, wait, we already had a President who did that.

In summary: the 4th Amendment says it’s illegal. No law can change that other than an amendment of the Constitution. Therefore the program is illegal. Therefore those involved who have taken an oath to uphold and defend the Constitution have violated their oath.

SchneieronSecurityFan June 23, 2013 12:33 AM

@ Bruce Schneier “they’re doing speech-to-text as close to the source as possible and working with that.” Is there any truth to the supposed set of trigger words such that, if one of them is uttered during a telephone call, a recording of the call starts? Such a set of words is mentioned in the 1998 movie “Enemy of the State”.

@ pingu: I think your rates are more likely 9600 bps / 9.6 kbps, not 9600 kbps. That’s still 4.32 MB/hour.

When was the first 3.5″ 1TB hard drive? Around 2005? Now there are 3.5″ 4TB hard drives.
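At 9.6 kbit/s, a single one of those 4 TB drives holds a startling amount of voice; a quick sketch, assuming decimal units:

```python
# How much 9.6 kbit/s voice fits on a single 4 TB drive (decimal units).
bytes_per_hour = 9600 * 3600 // 8     # 4,320,000 B -- the 4.32 MB/hour above
drive_bytes = 4 * 10**12

hours = drive_bytes // bytes_per_hour
print(hours, hours // (24 * 365))
# 925925 105  -- over a century of continuous voice on one drive
```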

The hardest part for blanket recording – to me at least – would be the recording of local telephone calls. For example, I use my telephone land line to call my neighbor’s land line. My call goes to my telephone company’s local office 1-2 miles away and back to my neighbor’s house. The call involves an analog voltage traveling down a pair of copper wires to the telephone company’s switch in its local office, leaving as an analog voltage traveling down another pair of copper wires to my neighbor’s house. Recording that call would involve either the cooperation of the telephone company or the NSA hacking into the company’s equipment.

The problem becomes easier when different types of connections are involved. If I call someone served by a different company local office, my telephone call on copper wires leaves my local office on a fiber-optic trunk cable to reach that office. Also, what if I call my neighbor and one or both of us has VoIP, fiber-optic lines, cable telephony, or a cell phone?

National or international long distance calls would be even easier to intercept as the following article about GCHQ and fiber optic cables shows.

Don’t forget text messages, multimedia messages, and internet media, which have always been digital. These forms of communication have also led to a significant drop in the average length of a telephone call over the last ten years.

Clive Robinson June 23, 2013 5:20 AM

@ SchneieronSecurityfan,

You don’t say where you live so I can’t say true/false about your comment,

    The hardest part for blanket recording – to me at least – would be the recording of local telephone calls. For example, I use my telephone land line to call my neighbor’s land line….

Whilst what you say is true for much older POTS exchanges, in many places it hasn’t been true for over ten years, and getting on for twenty in quite a few others. In fact, in parts of the UK it hasn’t been true for forty years.

Basically, many telephone service suppliers are switching over to IP as the carrier network to capture the cost savings it gives; it’s one of the reasons that US senator was getting so steamed up about two Chinese telco equipment suppliers.

What happens in most places is that your legacy copper pair comes into a local Central Office (CO) frame room where once it would have been put onto electromechanical Strowger or crossbar kit. Since the early 1960s it was realised that this needed to change, and projects like System X started to digitise calls in all but the “last mile”. So for most people paying for a phone service, their copper pair ends up going to some kind of line-terminating card, which these days often allows all sorts of out-of-band communications such as DSL in its variety of forms.

The other side of that card is in effect a very, very high-bandwidth local area network using one or more digital protocols. It connects to other cards that route the call out to a network that connects the frames together; because of this, and the need for increased reliability, it has become in effect a wide area network where calls are routed by whatever routes are available.

In the UK, for instance, it was not unusual (and still isn’t) for a “local rate” call from one house in a street to another house in the same street to be routed up through major exchanges along with what were once called “National Rate” or “long distance” calls. In fact, at the start of the 1990s this was used to support calls for the whole UK to be put on a single local-rate tariff. Since the advent of other networks, both real and virtual, most calls these days in the UK really are not in any way local any more, and this is true for many parts of the world.

We know in the UK that GCHQ had been spying on the radio “trunk” in various parts of the UK (especially that involving N.I. and Eire), because the UK Government’s Property Services Agency (which used to be on the fourth floor of the low block of Tolworth Tower in Surrey) started selling off what looked externally like old “grain silos”. Inside, however, they were clearly not grain silos: they had numerous well-shielded floors with plinths for holding 19-inch equipment racks. The exception was the top floor, which was anything but shielded; inside, it was clear that the construction was almost identical to the “radar domes” common around microwave transmitters and receivers. Further, the engineering drawings showed that the plinths on this floor had been bearing-aligned, to four or five digits of accuracy, towards various civil and military microwave communications towers, often at very considerable but still line-of-sight distances.

These silos got sold off some small time after most of these radio trunks were decommissioned and replaced by cheaper-to-run, much higher-capacity optical fibre networks. Likewise some co-location sites for 432 MHz “MOULD” radio systems, which still blight much of the UK Amateur Radio 70 cm band within a 100 km radius of Charing Cross (and which got considerably worse during the Olympics, when the UK Government arbitrarily took more amateur radio spectrum for “security” and “links”, which it’s not supposed to do by international agreement).

DavidTC June 24, 2013 9:44 PM

Exactly. A lot of people seem to forget this is the NSA doing this.

The NSA has a hell of a lot of computer resources. And a lot of very intelligent people.

Storing all phone calls, even without any tech advances, is certainly possible, and that’s not assuming that they don’t have some sort of better compression codec. (There are a lot of very smart people at the NSA, and while they’re fairly open about encryption, that doesn’t mean they don’t have some uber-voice compression thing they’re sitting on that is twice as good as anything civilians have. They have to share encryption to make sure they haven’t made a stupid math mistake that allows others to crack it…they’re under no such requirement for compression algorithms. If it works, it works.)

It currently seems impossible that they could be doing voice recognition on all phone calls, and I suspect they aren’t, even with all their computing power.

But that doesn’t mean they aren’t running calls through voice recognition as fast as possible, starting with the most suspicious-looking and just sort of running statistical decoding the rest of the time, randomly selecting calls to parse. (And, again, this is assuming they don’t have better voice-recognition programs either.)

The NSA are professional, for lack of a better term, ‘math coders’. They understand how to make computers find patterns and manipulate data better than anyone on the planet, and they have the tools to handle massive amounts of data.

Let’s not assume they can’t do something because it seems like too much data and CPU cycles.

truthteller June 25, 2013 12:19 AM

Don’t underestimate the NSA. Look at what DARPA has been able to create. Do you think independent researchers and the private sector could have independently created something like the internet or GPS? Also, don’t forget that the NSA can outsource projects to research universities: when they have a difficult problem, they outsource it to MIT or Caltech and let their researchers solve it. Of course, those researchers will never be told that their results will end up in NSA projects. Research universities have become a vital part of the military-industrial complex.

SchneieronSecurityFan July 1, 2013 12:12 AM

@ Clive Robinson, I’m almost sure Verizon territory in Florida doesn’t route calls through another service area and back down again (Verizon keeps the toll money), unless it’s FiOS (maybe).
Here, there are two physical networks: copper pairs for POTS and fiber optic (FiOS). These networks have separate neighborhood central offices. All of the trunk cables are fiber optic.
In the UK, I suppose tapping internet traffic allows the tapping of a voice call.

o July 15, 2013 9:25 AM

There is an open-source lossy voice compression codec called Codec2 which can compress down to 1200 bits per second. Such a codec could be used to archive large quantities of voice traffic in bulk.
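At 1,200 bit/s, the per-call numbers discussed earlier in the thread shrink by 8x; a sketch reusing the 5-minute call and the 9,600 bit/s baseline:

```python
# Effect of a 1,200 bit/s codec on the thread's 5-minute-call arithmetic.
call_seconds = 5 * 60
at_9600 = 9600 * call_seconds // 8  # the 9.6 kbit/s baseline discussed above
at_1200 = 1200 * call_seconds // 8  # a Codec2-class low-bit-rate mode

print(at_9600, at_1200, at_9600 // at_1200)
# 360000 45000 8  -- ~45 KB per call, an 8x reduction
```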

_Jim July 15, 2013 8:54 PM

Many, many little errors above, in way too many posts to address; apparently nobody posting on this board has ever had an actual telecom job (e.g., switch tech or traffic engineer in a normal CO) or worked on engineering a cellular system.

I can say with confidence and from personal knowledge that the so-called non-wireline or “B” cellular side in the DFW MSA had no, repeat: not one, again: zero, zip, nada trunks to the NSA or any other coding or encoding facility. Our several Ericsson AXE 212 switch-based MTSOs, comprising our Ericsson RBS882/RBS884 mobile system, had only trunks for PSTN interface and, of course, the trunks dedicated to backhauling cell-site voice/control to the MTSOs … this was as of 1994 … specific provisions for ‘quality monitoring’ (and eventual CALEA compliance) of individual switched circuits notwithstanding, through ‘sampling’ of the Time-Space-Time (switch) matrix …

PS SOMEONE has got to/had to pay for all (any) ‘backhaul’ of voice circuits too; it doesn’t come ‘for free’ to anybody …

_Jim July 15, 2013 10:29 PM

truthteller • June 25, 2013 12:19 AM
“dont underestimate the NSA. look at what Darpa have been able to create. You think indepeandent researchers and private sector could have indepandently created something like the internet or GPS”

Broad appeal to authority; it shows little knowledge of, or insight into, where these technologies originated or who perfected their final, present-day implementations. (PS: It wasn’t government, although it paid some of the ‘freight’ costs early on.)

_Jim July 15, 2013 10:42 PM

Clive Robinson • June 23, 2013 5:20 AM
We know in the UK that GCHQ had been spying on the radio “trunk” in various parts of the UK (esspecialy that involving N.I. and Eire) when the UK Government’s Property Services Agency (which used to be on the fourth floor of the low block of Tolworth Tower in Surrey) started selling of what looked externaly like old “grain silos” ”

Highly doubtful, and that doubt is supported from several different perspectives, including the ‘fact’ that the higher-frequency links (e.g. 11 GHz, as was used here by Bell on links) make for a VERY narrow beamwidth!

Effectively, your intercepting ‘silo’ would have to be a ‘block’ (or interferer) in the path, and son, ‘radio path engineering’ would not accept something sticking up or planned to be sticking up like that in their line-of-sight radio path!

See: Fresnel Zone Clearance and Antenna Height Calculator below

FAR easier to attach to the digital trunk coming off either end rather than attempt to ‘sniff’ an iffy signal owing to an errant (but weak and unpredictable, as the frequency used changes!) ‘sidelobe’ off one of the transmit antennas.

Clive Robinson July 16, 2013 2:16 AM

@ Jim,

I think you are not really a communications engineer, or anyone with practical experience using comms links over long distances.

You make the following statement,

    Highly doubtful and supported from several different perspectives including the ‘fact’ that the higher frequency links make for a VERY narrow beamwidth

But you don’t say what you mean by “VERY narrow”, nor do you mention the effects of “over-illuminating” a dish, or the other issues of all antenna types that create side lobes.

Simple maths taught to most children before they become teenagers would tell you just how impossibly narrow a perfect beam would have to be to illuminate only an antenna of a manageable physical size, mounted on a lattice mast or similar construction, over a one-hundred-mile link.

As for your,

    Effectively, your intercepting ‘silo’ would have to be a ‘block’ (or interferer) in the path, and son, ‘radio path engineering’ would not accept something sticking up or planned to be sticking up like that in their line-of-sight radio path

As I’ve indicated, it does not need to be “in their line-of-sight” (more correctly, boresight), and if using a side lobe it can be well off to one side.

As for Fresnel zones, they are usually only considered when dealing with marginal paths or when you are trying to diffract a signal around a knife edge such as a ridge. Importantly, whilst objects in even zones can detract from the signal, those in the odd zones can enhance it; such is the nature of phase shifts in signals at a receiving antenna. However, as many people will know from “flutter” on their car radios and “ghosting” on their TV pictures, phase-related problems are generally only seen when the main-path signal is sufficiently marginal compared to the reflected out-of-phase signal.

I could go on at great length about the characteristics of antennas and the mechanical structures they are mounted on (which move in the wind, etc.). Likewise I could relate a number of fairly weird-sounding stories about how EmSec has been compromised by conductive or dielectric structures causing lensing effects, but this is not my blog and it would be quite far off its normal topics.
