James Bamford on the NSA

James Bamford, author of The Shadow Factory: The Ultra-Secret NSA from 9/11 to the Eavesdropping on America, writes about the NSA’s new data center in Utah as he reviews Matthew Aid’s The Secret Sentry: The Untold History of the National Security Agency:

Just how much information will be stored in these windowless cybertemples? A clue comes from a recent report prepared by the MITRE Corporation, a Pentagon think tank. “As the sensors associated with the various surveillance missions improve,” says the report, referring to a variety of technical collection methods, “the data volumes are increasing with a projection that sensor data volume could potentially increase to the level of Yottabytes (10^24 Bytes) by 2015.” Roughly equal to about a septillion (1,000,000,000,000,000,000,000,000) pages of text, numbers beyond Yottabytes haven’t yet been named. Once vacuumed up and stored in these near-infinite “libraries,” the data are then analyzed by powerful infoweapons, supercomputers running complex algorithmic programs, to determine who among us may be—or may one day become—a terrorist.

[…]

Aid concludes that the biggest problem facing the agency is not the fact that it’s drowning in untranslated, indecipherable, and mostly unusable data, problems that the troubled new modernization plan, Turbulence, is supposed to eventually fix. “These problems may, in fact, be the tip of the iceberg,” he writes. Instead, what the agency needs most, Aid says, is more power. But the type of power to which he is referring is the kind that comes from electrical substations, not statutes. “As strange as it may sound,” he writes, “one of the most urgent problems facing NSA is a severe shortage of electrical power.” With supercomputers measured by the acre and estimated $70 million annual electricity bills for its headquarters, the agency has begun browning out, which is the reason for locating its new data centers in Utah and Texas.

Of course, that yottabyte number is hyperbole. The problem with all of that data is that there’s no time to process it. Think of it as trying to drink from a fire hose. The NSA has to make lightning-fast real-time decisions about what to save for later analysis. And there’s not a lot of time for later analysis; more data is coming constantly at the same fire-hose rate.
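
As a toy illustration of that triage problem (a sketch only; the scoring heuristic is a placeholder, not anything the NSA is known to run), a cheap filter gets one look at each item in the stream and decides, immediately, whether it survives to storage:

```python
import random

# Toy sketch of stream triage: data arrives faster than deep analysis can
# run, so a cheap real-time score decides what is kept. score() is a
# placeholder for whatever fast heuristic a collector might apply.
def score(item):
    return random.random()

def triage(stream, keep_threshold=0.999):
    for item in stream:
        if score(item) >= keep_threshold:
            yield item          # saved for later analysis
        # everything else is dropped unexamined; there is no second look

kept = list(triage(range(1_000_000)))
print(f"kept {len(kept):,} of 1,000,000 items")   # roughly 0.1%
```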

Bamford’s entire article is worth reading. He summarizes some of the things he talks about in his book: the inability of the NSA to predict national security threats (9/11 being one such failure) and the manipulation of intelligence data for political purposes.

Posted on October 22, 2009 at 6:10 AM

Comments

Carl Bussjaeger October 22, 2009 6:26 AM

It’s a crude analogy, but when panning for gold you don’t run a spectrographic analysis on every particle in your pan to determine which to save for later. They need to learn to sift rather than analyze, which would probably be a little more privacy-friendly as well.

Now, one might argue that they need to analyze in the early days in order to learn what to sift for later. I figure if they don’t know what “terrorist-gold” looks like by now they probably shouldn’t be in the business of looking for it.

J. Brad Hicks October 22, 2009 7:31 AM

When it comes to surveillance states, nobody will ever improve on the East German Stasi, and they got to this problem first. They didn’t do it with modern electronics, of course; “all” they did was hire about half of the East German adult population to spy on the other half. This actually made their spying less effective, not more: it became easier for criminal gangs to operate, for defectors to escape, and for fringe political movements to organize, because the signals they were looking for were drowning in an ocean of noise.

The ultimate limit on the power of any domestic spying agency is one that no amount of computer power can overcome: the people who run these agencies only trust the opinions and judgment and secrets-keeping of at most a couple of hundred or so people, maybe a thousand tops. And that 1,000 or fewer people only work 8 hours or so per day, 50 or so weeks per year. Anything they don’t have time to look at, for long enough to understand what they’re seeing, might as well never have been recorded or written down.

Piper October 22, 2009 7:54 AM

Did he refer to computers as “infoweapons”? Did he actually say that? I hope that was just a little joke or something.

phil October 22, 2009 8:24 AM

Given that today’s disk technology has an expected unrecoverable error rate of one out of every 10^14 bytes, if the NSA expects to store 10^24 bytes, they should expect to see 10^10 read errors.

I think whoever made that projection was smoking something illegal.
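
A quick check of phil’s arithmetic, using his own assumed figures (note that drive vendors usually quote the unrecoverable rate per bits read, which would make the count roughly eight times higher):

```python
# Expected unrecoverable read errors for a full pass over a yottabyte,
# assuming one error per 10^14 bytes as stated above.
stored_bytes = 10**24
bytes_per_error = 10**14

print(f"expected errors: {stored_bytes // bytes_per_error:.1e}")  # 1e10
```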

George H October 22, 2009 8:28 AM

@Mike Hendon

Sun’s ZFS uses 128-bit addressing, which is considerably larger than a zettabyte (by a factor of about 10^17). So to power a zettabyte system would require “only” about 5% of current worldwide energy consumption.
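
The address-space arithmetic checks out; a quick sketch:

```python
# How much bigger is ZFS's 128-bit address space than a zettabyte?
zfs_max_bytes = 2**128        # ~3.4e38 bytes
zettabyte = 10**21

print(f"2^128 bytes     = {zfs_max_bytes:.2e}")
print(f"ratio to one ZB = {zfs_max_bytes / zettabyte:.1e}")   # ~3.4e17
```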

Nostromo October 22, 2009 9:02 AM

Bamford’s earlier book about the NSA, “Body of Secrets”, is also well worth reading.

Clive Robinson October 22, 2009 9:14 AM

It’s not just the NSA with power problems.

For those that care to think about it, energy is the only renewable resource on the planet (virtually all of it comes from the sun). Until we develop low-cost space technology to mine other celestial objects, the rest of the world’s resources are finite and need to be recycled, either by man or by nature.

The only other resource that is not limited by recycling is “money”. And with a little further thought you will realise that energy and money are directly equatable.

So…

History has taught us that the control of resources, especially water, is a way of exercising political control over others.

The new game in town, of course, is power, and we already see the Russians using the taps on gas and oil pipelines to exert political control over the old Soviet satellite states that have broken away.

It could be argued (quite easily) that the US desire to control nuclear technology is a strategy to control energy supplies in future years.

Likewise, the current “race for the moon” by the likes of China is not about political showmanship; it is about the very real possibility that the moon may hold fuel for fusion power in sufficiently accessible quantities to make the cost of the new space race easily justifiable.

Reinforcing this view, there is of course the question of the largely unsigned international Moon Treaty, which aims to give the moon a status like the South Pole’s.

However, many nations regret signing the “hands off / science only” treaty that currently protects the South Pole from national exploitation of its energy and mineral reserves, which is one of the reasons cited for why the Moon Treaty has gone nowhere.

The new game in town is not “information” but good old “power” in the form of energy.

Oh, by the way: the ultimate form of pollution, because you cannot recycle it, is heat…

Jeff Ruff October 22, 2009 10:18 AM

I think the ZFS calculation made a mistake when converting to power estimates in kilowatts. It should have been 3.2E32 joules, not E38. That makes a huge difference and would only require 0.0000003% of the world’s oceans. Still ridiculous, but theoretically more “possible”.
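
For anyone who wants to replay this kind of back-of-envelope fight, here is a sketch in which every input is an explicit round-number assumption (Landauer’s kT ln 2 per bit at room temperature, and rough figures for the oceans); real hardware runs many orders of magnitude above the Landauer limit, which is exactly why the choice of exponent swings the answer so wildly:

```python
import math

# Theoretical-minimum energy to write 2^128 bytes once (Landauer limit),
# versus the heat needed to boil Earth's oceans. All inputs are assumptions.
k_B = 1.380649e-23            # Boltzmann constant, J/K
T = 300.0                     # assumed operating temperature, K
bits = 2**131                 # 2^128 bytes, written once

landauer_j = bits * k_B * T * math.log(2)         # ~8e18 J

ocean_mass_kg = 1.4e21                            # rough total ocean mass
heat_per_kg = 4186 * 85 + 2.26e6                  # warm by 85 K, then vaporize
boil_j = ocean_mass_kg * heat_per_kg              # ~3.7e27 J

print(f"Landauer-limit write energy: {landauer_j:.1e} J")
print(f"boil-the-oceans energy:      {boil_j:.1e} J")
```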

Rich October 22, 2009 10:29 AM

If power is an issue, why move to Utah and Texas? Because of cooling demands on power, wouldn’t Alaska make more sense? (or, at least, a cooler part of the lower 48).

HJohn October 22, 2009 10:33 AM

We also have to look at the information in terms of how things will likely be in the future. A couple decades ago, back when a KB was considered a lot of memory, no one would have dreamed of using a terabyte, which probably would have required a huge facility. Yet, just today, I rotated one of my TB backup drives to an offsite location. Small too.

Before long we’ll be dealing in petabytes, then perhaps exabytes. Zettabytes or yottabytes may not arrive in our lifetimes, but people before us never dreamed GB, much less TB, would either.

My point is just because something is too much information to process today doesn’t mean the technology won’t be here in our lifetimes to do so.

Brandioch Conner October 22, 2009 10:56 AM

@HJohn
“My point is just because something is too much information to process today doesn’t mean the technology won’t be here in our lifetimes to do so.”

The problem is that the data collected is only relevant for a limited time.

If the data cannot be turned into actionable information in that time it is only useful for tracing the steps AFTER something has happened.

Example: you have the data on where Osama bin Laden will be next Tuesday. But you won’t be able to process that information for the next 10 years.

HJohn October 22, 2009 11:01 AM

@Brandioch Conner: The problem is that the data collected is only relevant for a limited time.

If the data cannot be turned into actionable information in that time it is only useful for tracing the steps AFTER something has happened.

Example: you have the data on where Osama bin Laden will be next Tuesday. But you won’t be able to process that information for the next 10 years.


Oh, I fully agree. No debate here.

Yet, that is my entire point, though I wasn’t clear enough. It can’t be used for what would be a relevant reason today. But it may be used for much different reasons in 20 years. Why? We don’t know, but I’m guessing it won’t be a pleasant use.

Which is why they should not collect it in the first place. I’m not scared of what they’ll do with it today, I’m scared of what they can do with this ocean of data in the future, and the day is coming when they can use it with ease.

BW October 22, 2009 11:18 AM

The problem isn’t storing that much data; it isn’t even processing that much data; the real problem is I/O bandwidth. Sure, you can chain multiple CPU racks together, but the hardware handling the underlying I/O doesn’t scale as nicely.

It’s much more likely they are running multiple supercomputers with their own dedicated tape silos. Tape may be slow, but it doesn’t require the boiling of the world’s oceans. Anyway, you’d be crazy to use a single filesystem of that size.
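
BW’s bandwidth point is easy to make concrete; a sketch with generous, purely illustrative numbers:

```python
# Time for one full pass over a yottabyte, even with absurd parallelism.
yottabyte = 10**24            # bytes
per_stream = 100e6            # 100 MB/s per tape/disk stream (assumed)
streams = 1_000_000           # one million parallel streams (assumed)

seconds = yottabyte / (per_stream * streams)
years = seconds / (3600 * 24 * 365)
print(f"one full pass: {years:,.0f} years")     # ~317 years
```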

Yobi Gear October 22, 2009 11:50 AM

He just wanted to use the word “yottabyte” in a sentence! I’m more of a yobibyte fan myself.

Bryan Feir October 22, 2009 11:55 AM

@Brandioch Conner:
Reminds me of the story of the weather prediction system run as a test on a supercomputer, I believe belonging to the NOAA.

With data from weather networks all over fed into it, it was able to make weather predictions a week in advance with something on the order of 80% accuracy.

Problem was, it took two weeks to run the analysis, on the best supercomputers they had at the time.

Me October 22, 2009 12:00 PM

One problem with their power calculation is that it implicitly assumes ALL of that storage farm has to run at the same time. But if it is powered in small segments that can be turned on and off, that part of the equation might be reduced considerably.
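
The duty-cycling argument is simple proportionality; a sketch with assumed numbers:

```python
# Average power if only a fraction of the storage farm is powered at once.
total_drives = 500_000        # assumed
watts_per_drive = 10.0        # assumed active power per drive
active_fraction = 0.05        # 5% of segments spun up at any moment

always_on_mw = total_drives * watts_per_drive / 1e6
duty_cycled_mw = always_on_mw * active_fraction
print(f"always on: {always_on_mw:.1f} MW; duty-cycled: {duty_cycled_mw:.2f} MW")
```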

John Campbell October 22, 2009 12:18 PM

One problem with efforts to simplify (sift) the datastream(s) is that the USA includes polymemetic populations (multiple ethnicities, creeds and other preferences), so any bell curve (or, after an Arkansas judge broke it up, 7 baby-bell curves) is gonna be lumpy.

Now if people were somehow squeezed down into a monoculture to make it easier to detect “unapproved dissent”…

Brandioch Conner October 22, 2009 1:42 PM

@HJohn
“But it may be used for much different reasons in 20 years. Why? We don’t know, but i’m guessing it won’t be a pleasant use.”

Why wait 20 years? Suppose that you are going through a messy divorce today and the other person has a family member who has access to that data.

We already have instances of people using such GOVERNMENTAL systems to track people for PERSONAL reasons (girlfriend / wife / the husband of your girlfriend / etc).

One of the problems (there are many) with this is that it is impossible to convince the average person that it will be abused when there is the slightest chance that it will help catch a terrorist. Even when the abuse can be shown to exist.

HJohn October 22, 2009 1:52 PM

@Brandioch

20 was just a hypothetical number. I was referring to the fact that they are collecting more data than they have the power to process (yet). But they will have the power someday.

Of course they can take individual pieces of data and use them against a person now. Someday, sooner or later, they may have the processing power to do this against the masses automatically. Scary thought.

The personal threat today, as well as the future’s mass threat, is all the more reason to not have the data.

Shane October 22, 2009 2:21 PM

@Carl Bussjaeger

“if they don’t know what ‘terrorist-gold’ looks like by now they probably shouldn’t be in the business of looking for it”

Abso-$@%#ing-lutely. Kudos.

Sadly, I’m sure that when the real talk is had behind the scenes, “terrorist-gold” is really the last thing they are looking for. Presumably, like leprechaun gold, it only exists at the end of a rainbow, but it sure makes a good excuse to go treasure hunting without leaving the comfort of their desks in the state-secrets citadel. After all, any normal patriotic person shouldn’t have anything to hide, right?

sigh

Blueeyed.1978 October 22, 2009 2:32 PM

First off, I don't trust an outsider to tell me about the NSA (since I used to work there) and what he thinks is the truth.

Second of all, before 9/11, the NSA’s job was aimed at Cyber Defense and Signals Analysis.
So his comment: “the inability of the NSA to predict national security threats (9/11 being one such failure) and the manipulation of intelligence data for political purposes.”

This is completely stupid! That is (and has been) the job of the CIA and FBI. He needs to look at placing blame elsewhere.

Andrew October 22, 2009 4:47 PM

People with an axe to grind are highly motivated and often dig up interesting tidbits — but their very fanaticism makes their broader analyses suspect. I have always felt that a highly competent intelligence agency would carefully cultivate a public reputation for relative incompetence.

averros October 23, 2009 1:20 AM

What NSA needs is a big pink slip.

The real bad guys learned to use end-to-end encryption. At this point the NSA is useless; all they do is spy on benign communications, while being completely and utterly helpless to deal with any half-clued terrist.

John Rath October 23, 2009 1:38 AM

I thought the following followed yottabyte:

Xonabyte
Wekabyte
yundabyte
udabyte
tredabyte
sortabyte
rintabyte
quexabyte
peptabyte
ochabyte
nenabyte
mingabyte
lumabyte

d. October 23, 2009 3:15 AM

@averros: The real bad guys learned to use end-to-end encryption.

Google for social graph, traffic analysis.

Think about why communication data retention for all telecommunications was implemented so quickly in the EU (for no shorter than 6 months and no longer than 2 years, if I remember correctly).

No need to record all the payload. Only metadata: who to whom, when, how much information.

There are rumors that snail mail envelope-sorting machines store this information too. Why not?
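
A toy sketch of why metadata alone is so useful: with nothing but (sender, recipient, timestamp, size) records, invented here for illustration, the social graph falls out almost for free:

```python
from collections import Counter, defaultdict

# Build a social graph from metadata records without touching any payload.
records = [
    ("alice", "bob",   "2009-10-01T09:00", 1200),
    ("alice", "bob",   "2009-10-02T09:05", 900),
    ("bob",   "carol", "2009-10-02T09:10", 400),
    ("alice", "dave",  "2009-10-03T21:00", 15000),
]

edge_counts = Counter()
contacts = defaultdict(set)
for sender, recipient, timestamp, size in records:
    edge_counts[(sender, recipient)] += 1
    contacts[sender].add(recipient)
    contacts[recipient].add(sender)

print("strongest tie:", edge_counts.most_common(1)[0])
print("alice's contacts:", sorted(contacts["alice"]))
```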

averros October 23, 2009 4:24 AM

social graph, traffic analysis.

Bad guys learned to use onion routing, too. And Internet cafes and free wireless spots. And prepaid cellphones. Not to mention old-fashioned drop boxes and word of mouth.

It’s not exactly rocket science.

All NSA catches is terrist wannabes who would’ve failed recruitment screening anyway due to their terminal stupidity.

Neighborcat October 23, 2009 6:10 AM

Yottabytes, Yogibytes (Hey-hey-Boo-Boo!), sifting sorting sniffing, the best minds and the algorithms of their dreams: it all comes to naught, because even the most prescient and accurate intelligence gets funneled through the same clogged spigot as ever, politics, where only the agreeable information gets through.

The US intelligence community, using little more than old-fashioned wet-ware and “hinky sensors”, warned of the 9/11 attack, but politics either filtered the warning out, or if you are of a more cynical mind, welcomed the attack as support for their pre-existing plans to invade Iraq…

So I’m not at all concerned about the yogibytes of data or future processing capabilities, it all dances to the tune politics plays. The repressive regimes of history managed just fine without computers.

NC

Neighborcat October 23, 2009 7:21 AM

After years of intense labor, the most comprehensive data collection and analysis computing system is finally complete. The switch is flipped. The most accurate and up-to-the-minute data on the comings and goings of the American population courses through the processors. The lights dim briefly in the desert southwest as the algorithms parse and filter the information. The results are displayed, and Lt. Gen. Keith Alexander frowns at the words.

Months later, the system has been checked, the algorithms tweaked, the country’s top data analysis experts hired and fired in succession, and the analysis to identify the individuals who present the greatest threat to the freedom and safety of the country is run once more. Lt. Gen. Alexander’s frown is now a collection of permanent creases. His hair is visibly thinner and grayer. Despite his best efforts, the system continues to return the same list of names, and even their addresses:

The members of Congress, the Senate, the Supreme Court, the President and his cabinet, various government directors and advisers, and at the bottom of the list, one Lt. Gen. Keith Alexander.

Mark R October 23, 2009 7:35 AM

I know how this will end up – like my basement. I keep boxloads of stuff that I have no reasonable place to put at the moment, because I might need them later. I don’t even know what’s down there. I often buy things only to find the same item months later in one of these boxes. Eventually I get frustrated enough that I throw everything out.

To me, it’s easy to dismiss this thing as a data center “size contest”. Throw in lucrative contracts and naive leadership that expects systems to do their thinking for them, and you’ve got all the ingredients for a tax dollar vacuum.

Clive Robinson October 23, 2009 8:26 AM

@ Neighborcat,

“So I’m not at all concerned about the yogibytes of data or future processing capabilities, it all dances to the tune politics plays.”

And it is that that truly scares me.

The thinking here appears to be “there is too much data” and “it would take too long”; however, that is the incorrect way to view it.

For “political” ends, the politicos already know the name of the target they wish to attack. Searching the DB for all information on a single person would (if indexed correctly) be a trivial search, taking considerably more time to output than to find.

In the UK, a current Government Minister (Jack Straw), when a junior in the Harold Wilson era, actually used various national DBs to search for “dirt” on political rivals.

In the Thatcher era, the police and MI5 routinely collected information on political activists and supplied it to commercial organisations. They also used similar national DB data and data from commercial DBs to build profiles of trade union leaders.

During Tony Blair’s tenure, activists of all kinds were investigated and their details stored in DBs, including those who signed petitions against infrastructure expansion or the Gulf War, and those who wrote to their local councils complaining about things such as local “initiatives” that were actually “central Gov” activities equivalent to gerrymandering.

In more recent times it has been shown that the police actively record as many people as possible who attend demonstrations, actively seek to identify them, and then pass on all the details they can to commercial organisations so that dirty-tricks campaigns can be run against them.

Oh, and a senior judge decided just the other day that the police can keep things like your DNA profile on record, or any other piece of information for that matter, as long as the police think they might have a use for it.

People who committed very minor indiscretions when very young are now discovering that they cannot get jobs because they have a “police record”.

Oh, and there is a story that was doing the rounds that an insurance company refused to pay out on a claim because somebody with a very minor conviction, legally “spent” many years previously, had not put it down on their insurance form.

Oh, and add to this that various minor UK politicians have abused the system in various ways to prevent political opposition, including making false or misleading accusations to various “standards” organisations, who then suspend the person the complaint is made against until a protracted investigation is completed.

One such case was where a political opponent was accused of “swearing under his breath” and suspended, while a council leader announced there would be no investigation of himself for gross mismanagement, as his cronies had held a “closed meeting” and decided it was not an issue. Apparently the person accused of swearing had said “I don’t bl***y believe it”. The legal cost of the suspension would have been around the equivalent of 400,000 USD at the time and was paid for (without choice) by the local taxpayer…

With this sort of behaviour going on, I don’t want any data on me being held by the UK Gov; especially under RIPA, just about any ne’er-do-well can get access to it, and even when it is shown to be incorrect the person has no right to have it corrected…

mokum von Amsterdam October 23, 2009 9:03 AM

Clive Robinson got it right.
There is no need nor desire to search & correlate all data. The only reason to store such an incredible volume is for specific use against ‘our enemies’. Whoever fits that title will be asked to explain & contextualize all stored information, and by failing to do so will incriminate themselves.
It’s basically an endless source of incriminating evidence ready to be used by the people that have access to it. Nothing new here, please move on.

D October 23, 2009 1:27 PM

Calling the yottabyte number hyperbole is generous. I would label it a Damned Statistic.

Just to provide a bit of context, WolframAlpha reports the mass of the Earth as 5.9×10^24 kg, i.e. almost 6 yottakg. So, pick a memory technology – now using that technology what fraction of the Earth’s mass would be required to construct devices to store a yottabyte using that technology?
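
Taking D up on the exercise with one concrete technology (figures are illustrative assumptions for a 2009-era 3.5-inch drive, not vendor data), the surprise is that mass is not the binding constraint; power and cost are:

```python
# Mass of a yottabyte stored on assumed 2 TB, 0.7 kg hard drives,
# as a fraction of Earth's mass.
yottabyte = 10**24            # bytes
drive_bytes = 2e12            # assumed capacity
drive_mass_kg = 0.7           # assumed mass
earth_mass_kg = 5.97e24

drives = yottabyte / drive_bytes                    # ~5e11 drives
total_mass_kg = drives * drive_mass_kg              # ~3.5e11 kg
print(f"drives needed:            {drives:.1e}")
print(f"fraction of Earth's mass: {total_mass_kg / earth_mass_kg:.1e}")
```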

PackagedBlue October 25, 2009 6:08 AM

Hey, everybody is doing social media. To the NSA, the internet is all one big social media event that they want to traffic-cop; perhaps not a bad idea if done by a humanist Skynet system.

Hopefully, the NSA would use its powers to shape persuasion for the better.

If only real leadership could have persuaded many about the failed policies of recent years, and what had been building for many years before 2000. Frustrating!

A book that I find refreshing, compared to the many whitewash books of today, is Running the World, by David J. Rothkopf, copyright 2005, a PublicAffairs book.

greg October 25, 2009 6:17 AM

@D
If I can have some memory device that records information in the chemical state of a pair of atoms (bound and unbound), then 2 moles of atoms is enough to store 6e23 bits of information. Assume we use carbon: that’s 6e23 bits per 24 grams. So 1e24 bytes (8e24 bits) is about 13x that, or roughly 320 grams of carbon; call it under 2 kg with a generous margin.

Well, that is a “perfect” system where we assume you can extract and change the information entirely externally. Probably not practically possible. But if we are into theoretical bounds then it’s in the realm of plausibility. And we haven’t considered using many states per atom, which is theoretically possible too.

Or consider how much volume 8e24 cubes of 1 nm on each side take: 8e-3 m^3, or 8 litres. Consider that we already have chips with features measured close to that small (gate insulation layers are ~2 nm, IIRC).

Now consider how good the computers were when Clive Robinson got his first job. 😉

In 50 years, I can’t say we won’t be capable of storing 1e24 bytes…
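
greg’s figures are easy to check; a sketch under his own assumptions (one bit per bound/unbound pair of carbon atoms, ignoring any read/write machinery):

```python
# Theoretical-minimum carbon mass and volume for 1e24 bytes.
avogadro = 6.022e23
carbon_g_per_mol = 12.0

bits = 8 * 10**24                       # 1e24 bytes
atoms = 2 * bits                        # two atoms per bit
grams = atoms / avogadro * carbon_g_per_mol
print(f"carbon required: {grams:.0f} g")          # ~320 g

volume_m3 = 8e24 * (1e-9)**3            # 8e24 cubes, 1 nm on a side
print(f"volume: {volume_m3 * 1000:.0f} litres")   # 8 litres
```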

bob October 26, 2009 7:50 AM

I keep wondering how long it will take these “megavacuum” [i.e., suck up all available input but never produce meaningful output therefrom] government organizations to realize that MORE input is not BETTER but rather the opposite. They won’t be able to glean anything useful from it, but after (the next catastrophe) they (and, more importantly, the mass media) WILL be able to find pointers in their archived data to (whatever it was that happened). Then they will have to CYA-spin why they didn’t pick that one grain of sand off the beach and prevent (that particular catastrophe) from happening.

DC October 26, 2009 2:38 PM

To address the issue of “we get data that shows where someone will be next week, but it takes longer than that to decode”…

First, let’s pick a more reasonable number for something important — like say a month to decode.

OK, it seems useless, but wait: it’s not. This has been gone over quite a lot in tactical security, which tends to be less long-term than strategic. Say, fighter jet comms during a battle: once the battle is over, do you care?

Yes, you do. There might be a next battle, and now you know how they think, how the commands go down, and next time mere traffic analysis means more than it did before.

I do some work with neural networks.
Let’s suppose that along with the data you couldn’t decode right off, you got other data that you could decode but just didn’t know was relevant.

Now you can find out if it was relevant, and if it was, use that easier-to-get data as a correlated predictor instead. You need a wide net for that to be possible; there might be a few haystacks to search to find the easy-to-get (but hard to know it’s worth anything) data that does correlate.

For example, an important figure does certain things before a move, to set up the new spot and ensure it will be safe for the intended stay. Once you have even past data about movements, you can then look for the signs of the next move…
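
Along the lines DC describes, a toy sketch (synthetic data throughout): if a cheap-to-collect side channel correlates with the signal you can only decode after the fact, the side channel becomes a usable proxy next time:

```python
import numpy as np

# Score candidate side channels by correlation with the late-decoded signal.
rng = np.random.default_rng(0)

hidden = rng.normal(size=200)                            # decoded too late
easy = 0.8 * hidden + rng.normal(scale=0.5, size=200)    # cheap side channel
noise = rng.normal(size=200)                             # irrelevant feed

for name, series in [("easy", easy), ("noise", noise)]:
    r = np.corrcoef(hidden, series)[0, 1]
    print(f"{name:5s} correlation with hidden signal: {r:+.2f}")
```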

That all said, I dunno if anyone there (NSA) now is that smart AND in control of things, and I hate the privacy-loss issues as much as anyone here.

For example, since it’s a fairly foregone conclusion that most of our elected officials are corrupt, this hands power to various unelected officials via blackmail, which I consider to be even worse.

Ever wonder why the elected officials give DHS everything they ask for, pronto? Maybe it’s too late to worry about that one.

Clive Robinson October 26, 2009 6:57 PM

@ DC,

“There might be a next battle, and now you know how they think, how the commands go down, and next time mere traffic analysis means more than it did before.”

Yes, and isn’t it obvious once somebody says it 😉

And I think it’s the first time I’ve seen somebody state it explicitly on the web.

However, if you read the book “Most Secret War” by Prof. R. V. Jones, written in the ’70s, you will find oblique references to this in his account of tracking down Wachtel’s “technical signals regiment”, which supported the V1 flying bomb development.

Oh, and it is a closed loop, in that the improved info from traffic analysis aids in better code breaking, which further aids traffic analysis…

Oh, and also remember that traffic analysis is not just about flows of information; it also involves “fingerprinting” transmitters by their characteristics.

The same applies to modern-day network traffic, in terms of looking at timestamps and other indicators that allow you to enumerate a computer even if it changes its IP address, etc.

When you are trying to “burn an agent” (or their system), every little piece of kindling, no matter how small, adds to the flames and so turns up the heat.
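
One concrete example of the fingerprinting Clive alludes to is clock-skew measurement (Kohno et al., “Remote Physical Device Fingerprinting”, 2005): a host’s clock drifts at a rate that survives IP address changes. A sketch with synthetic observations:

```python
import numpy as np

# Estimate a remote host's clock skew from (local_time, remote_timestamp)
# pairs; the slope of the fit minus one is the skew, in parts per million.
true_skew_ppm = 37.0
rng = np.random.default_rng(1)

local = np.linspace(0, 3600, 50)                       # one hour of probes, s
remote = local * (1 + true_skew_ppm * 1e-6) + rng.normal(scale=0.002, size=50)

slope, _intercept = np.polyfit(local, remote, 1)
print(f"estimated skew: {(slope - 1) * 1e6:.1f} ppm")  # ~37 ppm
```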
