Automatic Document Declassification

DARPA is looking for something that can automatically declassify documents:

I’ll be honest: I’m not exactly sure what kind of technological solution you can build to facilitate declassification. From the way the challenge is structured, it sounds like a semantic-search problem: Plug in keywords that help you comb through deserts of stored information in the bowels of the Pentagon and the intelligence community, and figure out whether the results of the fishing expedition can be tossed out from the depths onto dry land in accordance with declassification policies. But that’s a matter of building an algorithm, something that might be too, well, quotidian for Darpa.

Posted on September 17, 2010 at 10:15 AM • 13 Comments

Comments

Thomas Zehetbauer September 17, 2010 10:38 AM

I guess that DARPA, NSA, CIA, … already have something to crawl the internet for classified information. Using that system on documents that are up for declassification could provide important hints about contained information that is not to be declassified. If you put that system after a human reviewing the document, you could misname it automatic declassification.

D September 17, 2010 10:46 AM

Automatic declassification can’t work on non-digital documents. When I think of materials that might be declassified due to the documents’ age, I think about secure filing cabinets with reams of paper stamped ‘Top Secret’ waiting to be rediscovered from within the bowels of the agency entrusted with their secrecy. If we’re talking about applying this to future digital systems, it might make sense. In that case it seems easy. Kind of like a deadman’s switch. “X will become declassified on D unless you have a good reason otherwise.” NSA and CIA won’t go for this.

DevilsAdvocate September 17, 2010 11:00 AM

Devil’s advocate view: the more useless classified junk, the better. It hides the rare, important stuff. Who cares, anyway? There’s never going to be one hundred percent public oversight, unless there is, like, no war and crime.

Christian September 17, 2010 11:01 AM

Just wait until your cryptographic primitives get broken.

That’s automatic declassification, as all older documents become readable this way over time.

E.g., anything encrypted with DES could be considered declassified nowadays.
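
A rough back-of-the-envelope sketch of why a DES-protected archive is treated as readable: DES has 56 effective key bits, so trying every key is a bounded job. The search rate below is a made-up assumption for illustration, not a measurement of any real machine, and the function name is hypothetical.

```python
# Back-of-the-envelope estimate of exhaustive key search time for DES.
DES_KEY_BITS = 56  # DES keys have 56 effective bits (64 minus 8 parity bits)
SECONDS_PER_DAY = 86_400


def exhaustive_search_days(key_bits: int, keys_per_second: float) -> float:
    """Worst-case days needed to try every key of the given length."""
    return (2 ** key_bits) / keys_per_second / SECONDS_PER_DAY


# ASSUMPTION: a hypothetical attacker testing 10^11 keys per second.
ASSUMED_KEYS_PER_SECOND = 1e11

worst_case_days = exhaustive_search_days(DES_KEY_BITS, ASSUMED_KEYS_PER_SECOND)
print(f"Worst case: {worst_case_days:.1f} days, "
      f"average: {worst_case_days / 2:.1f} days")
```

At that assumed rate the whole key space falls in roughly eight days, and each extra key bit doubles the figure, which is why the same argument does not carry over to ciphers with much longer keys.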

Tony September 17, 2010 11:30 AM

Won’t help with the existing pile of classified stuff – but perhaps they could require that any new classified documents be tagged with the criteria for declassification. As well as a simple “after x years” it could include events like “after death of person X”, or “when specifications for device Y are published (or leaked)”.

This might help reduce the pile too … right now documents are simply marked as classified because it is easy just to mark everything that way. Making people justify why a document needs to be classified would help enormously.
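
The tagging idea in this comment could be sketched roughly as below. This is only an illustration under stated assumptions: the class, field names, and event labels are hypothetical, and treating "any one criterion met" as sufficient is an assumption, since the comment does not say how criteria combine. A hit here would more plausibly queue a document for human review than release it outright.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional, Set


@dataclass
class DeclassificationTag:
    """Hypothetical tag attached to a document when it is classified."""
    classified_on: date
    after_years: Optional[int] = None                    # "after x years"
    after_events: Set[str] = field(default_factory=set)  # e.g. "death-of-person-X"


def years_elapsed(start: date, end: date) -> int:
    """Whole calendar years between two dates."""
    years = end.year - start.year
    if (end.month, end.day) < (start.month, start.day):
        years -= 1
    return years


def is_releasable(tag: DeclassificationTag, today: date,
                  occurred_events: Set[str]) -> bool:
    """True if any tagged criterion is met (ASSUMPTION: criteria are OR-ed)."""
    if tag.after_years is not None:
        if years_elapsed(tag.classified_on, today) >= tag.after_years:
            return True
    # Any tagged event that has actually occurred also triggers release.
    return bool(tag.after_events & occurred_events)


tag = DeclassificationTag(
    classified_on=date(1985, 6, 1),
    after_years=25,
    after_events={"device-Y-specs-published"},
)
print(is_releasable(tag, date(2010, 9, 17), set()))
```

A document with no criteria at all never becomes releasable under this sketch, which mirrors the point that someone has to justify the tag up front.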

Stephen Crowley September 17, 2010 11:51 AM

Hey, sounds like a great idea to me. I think factoring numbers has something to do with relativity and time travel, so you could say that the spooks are in a “race against time” and, by participating in the race, actually affect the flow of time for all those they monitor and afflict. I’d like to believe this ‘feature’ can be used to cure our species’ ills some day.

b September 17, 2010 12:45 PM

I think this is a combination time saver and plausible denial creator. For the first part, you have the program run, it blacks out a fair amount, then you have an actual person go over the parts that weren’t blacked out looking for info that still needs to stay classified.

Then, if it should ever come to light that certain parts that shouldn’t have remained blacked out were blacked out, well, you know, these programs aren’t perfect, it’s not our fault…

tab September 18, 2010 9:24 AM

@tony:
[but perhaps they could require that any new classified documents be tagged with the criteria for declassification. As well as a simple “after x years” it could include events like “after death of person X”, or “when specifications for device Y are published (or leaked)”.]
Actually, the document should be tagged with the criteria for CLASSIFICATION, not declassification.
Currently, the default is to classify a document (CYA), with a justification attached for declassification. This leads to huge amounts of material that is classified for no good reason (including material that is ALREADY in the public sector in other forms).

RH September 20, 2010 11:18 AM

Documents already come up for review at X years (I forget the number). A few orgs (like the CIA) had a tendency to mark documents “Declassify on {some code here}”. The rules kept changing so the CIA couldn’t use those codes, so they made up new codes to do the same ol’ thing. No desire to have declassification.

Of course, there is NO way to automatically declassify documents. The rules for what makes a document classified are not always clear, and computers are very bad at unclear directions. There are also a few binary satisfiability problems to deal with.

It might be able to pick low-hanging fruit. For example, the position of a WWII sub was definitely classified information, but doesn’t say much about our modern sub placement, so a table of that data would be easy for a computer to declassify.

paul September 20, 2010 7:40 PM

RH: There’s an awful lot of low-hanging fruit. Think about just how much paper (or data) military projects generate and how much is nonthreatening once a project is over. And some of that information could be useful to have out in the public domain — for example, staffing levels, vacation time and sick days taken on classified projects could be great for case studies.

Anоnymous September 23, 2010 6:34 AM

Basically, this is a terrible idea. It’s extremely difficult to decide if declassification of a document might give an enemy an advantage in excess of whatever social advantage accrues from declassifying it. It is an extremely difficult problem for even an expert human, and the agencies resist it because it’s so hard to get right. Essentially, this is another expression of the AI problem, but one where the application is tested on a corpus so vast that it is infeasible to check its results, and the consequences of error are potentially worse than life and death.

So why are they doing this? Because the US government has mandated that they declassify tons (literally) of documents at a breakneck pace, and has not provided funding for the millions of man-hours of work required to do it. In order to comply with both government policy and practical possibility, they need to automate the process, even though it is an idiotic thing to do. I think this is known as the law of unintended consequences.

@paul:

There’s an awful lot of low-hanging fruit. Think about just how much paper (or data) military projects generate and how much is nonthreatening once a project is over.

Having worked on such a project, I don’t think there’s nearly as much as you think. For example, in munitions development, everything related to developmental methods and technologies, testing, qualification for service, operations, maintenance, training etc. remains potentially damaging throughout the service life of the equipment, and frequently long after. Hence it is not “low hanging fruit.”

Pretty well everything else is pulped as soon as the customer accepts delivery, because it is too costly to store, too costly to declassify, and basically worthless to the company.

And some of that information could be useful to have out in the public domain — for example, staffing levels, vacation time and sick days taken on classified projects could be great for case studies.

That stuff usually isn’t classified to begin with; in aggregate it’s usually reported in Congressional budget papers. At the personally identifiable (PII) level it’s CUI (“controlled unclassified information”) that is unlikely to be released within the lifetime of the person whose privacy is affected by the data. In fact the modern tendency is that when it’s no longer required for tax audit purposes, it is destroyed. (It costs a lot of money to keep data confidential for decades, especially when there is absolutely no benefit to you to retain it.)
