Analysis of Redaction Failures

Redaction failures are so common that I stopped blogging about them years ago. This is the first analysis I have seen of technical redaction failures. And here's the NSA on how to redact.

Posted on June 6, 2011 at 7:06 AM • 18 Comments

Comments

vwm • June 6, 2011 8:40 AM

I guess we will see that kind of failure much less often in the near future. The tools are getting better fast: e.g. if you try to hide text beneath a black rectangle with the current version of Acrobat, it will warn you to use its redaction feature instead.

GreenSquirrel • June 6, 2011 8:51 AM

But just think of all the fun our children will miss out on - no longer will they be able to hold documents up to the light to see what "naughty phrase" has been hidden.

Sigh.

Technology, eh?

Captain Obvious • June 6, 2011 9:40 AM

I'm with Simon. It's always fun to select all and paste to notepad to see what's behind the mighty protective rectangle.
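For anyone wondering why the paste trick works: drawing a rectangle over text only adds one more drawing instruction to the page, and the instruction that shows the text is still sitting in the file. A minimal Python sketch of the idea, assuming a hand-written, uncompressed PDF content stream. The stream contents and the helper name `extract_shown_text` are invented for illustration; real PDFs usually compress their streams and encode strings in several other ways, so this is not a general extractor.

```python
import re

# Hypothetical, hand-written PDF page content stream: two lines of text are
# drawn first, then a filled black rectangle is painted over the second line.
CONTENT_STREAM = b"""
BT /F1 12 Tf 72 720 Td (Witness statement of) Tj ET
BT /F1 12 Tf 72 700 Td (Jane Q. Example) Tj ET
0 g
70 696 140 16 re f
"""

def extract_shown_text(stream):
    """Return the literal strings passed to the Tj (show text) operator.

    Simplification: only handles plain (...) strings with no escapes or
    nested parentheses.
    """
    matches = re.findall(rb"\(([^()]*)\)\s*Tj", stream)
    return [m.decode("latin-1") for m in matches]

# The rectangle changes what is painted, not what is stored.
print(extract_shown_text(CONTENT_STREAM))
```

The "redacted" name comes back out because the rectangle (`re f`) and the text (`Tj`) are independent instructions; a proper redaction tool has to delete the text instruction itself.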

Streisand Effect works best when major blogs and forums bring it to light.

yellow ink is nice • June 6, 2011 11:02 AM

@GreenSquirrel

Don't let those children have access to a "high-lighter" pen.

Marking over a patch of "Magic Marker" that was used to block-out the "naughty phrase" will make it somewhat readable by causing the underlying print to reflect light, and be visible again.

Clive Robinson • June 6, 2011 2:14 PM

Something I advise people when entering into litigation is,

"Paper, Paper, Never data"

Nearly all "data files" such as Wordpro documents contain an enormous amount of redundant data. Often this includes "un-edit" information and all sorts of other data linking the file to a particular PC, etc.

Now I don't know what all that redundant data does, nor, I suspect, do more than half the developers who made the software. But some people earn a very respectable income out of finding out what all the redundant data means and selling their services to those involved with the legal profession.

So if you give over the data files you don't know what other information you are handing over to someone who is investing mind numbing levels of resources to attack you.

So why take the risk? If you must "file electronically", print the documents out, use a scanner to make image files, and send those instead.

Dirk Praet • June 6, 2011 4:30 PM

Excellent analysis and interesting looking code at https://github.com/citp/pdf-privacy . (Certificate Patrol throws a warning)

I'd say it's just a good example of the traditional failure of using products you're insufficiently familiar with to fully understand their privacy & security ramifications (even if all are disclosed), potentially ending up paying a price for it. Until such a time that you are explicitly required to electronically file vector formats, going with Clive's suggestion remains the safest approach.

Peter A. • June 6, 2011 4:55 PM

@Clive:

Good advice. Next step for the more paranoid: even if you can submit paper, don't print. Your printer may betray you. We've all heard of printers adding low-visibility dot patterns to printouts. The pattern is supposed to encode the serial number and time, and is apparently present only in color laser printouts, but who knows for sure what else it could contain and what other hidden features may still reside in your printer's firmware...

Ok, so better write everything in your own hand...

Hmmm... what about the paper and pen used? :-)

godel • June 6, 2011 7:04 PM

I've heard that one easy way to sanitize WP files of all the assorted cruft and old edits is to "save As" an older file format. For example, save in Microsoft Word 6.0 format.

The file size of your document will instantly plummet.

LegacyWareUser • June 6, 2011 11:31 PM

@godel

Good idea.

Also, for you Windows users, be sure to use a fictitious name/organization when you install your O/S, as those details are also embedded in your MS-Office documents.
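To see what that looks like in practice for the newer .docx format (not the older binary .doc): a .docx file is a zip package, and the author details live in a part named docProps/core.xml. A rough Python sketch using only the standard library follows. The function names and sample values are made up for illustration, and the sample package built here is a stand-in containing only that one part, not a complete Word document.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespaces used by the Office Open XML core-properties part.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
}

def read_core_properties(docx_bytes):
    """Return creator and lastModifiedBy from docProps/core.xml."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as zf:
        root = ET.fromstring(zf.read("docProps/core.xml"))
    return {
        "creator": root.findtext("dc:creator", default="", namespaces=NS),
        "lastModifiedBy": root.findtext("cp:lastModifiedBy", default="", namespaces=NS),
    }

def make_sample_package():
    """Build a tiny stand-in package in memory (hypothetical values)."""
    core = (
        '<?xml version="1.0" encoding="UTF-8"?>'
        '<cp:coreProperties xmlns:cp="%s" xmlns:dc="%s">'
        "<dc:creator>J. Smith</dc:creator>"
        "<cp:lastModifiedBy>Example Corp Legal</cp:lastModifiedBy>"
        "</cp:coreProperties>" % (NS["cp"], NS["dc"])
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("docProps/core.xml", core)
    return buf.getvalue()

print(read_core_properties(make_sample_package()))
```

To inspect a real file, pass the bytes read from disk to `read_core_properties` instead of the sample package.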

LegacyWareUser • June 6, 2011 11:37 PM

@Peter A.

oops, wanted to reply to this too.

I've heard that those "secret" printer IDs are printed with ultraviolet ink.

Now that ultraviolet lamps are becoming inexpensive, you can investigate the markings for yourself.

Now go and buy an ultraviolet ink stamp pad (the kind nightclubs use on patrons' hands), and "doctor up" those markings that your printer is secretly leaving on your documents.

tommy • June 7, 2011 12:16 AM

@ Clive Robinson:

The FOSS tool Docscrubber is said to display and remove such metadata as you wish from MS Word .doc files. It supports only up through Office 2003, but claims that 2007 and the newer .docx format don't contain so much metadata. Care to have a quick look?

http://www.javacoolsoftware.com/docscrubber.html

@ Peter A.: There's also the spy-movie trick of sneaking into the target office and copying the contents of the printer's memory card, since many can store a number of documents in memory, especially if they are also faxes. So you can also see what else they've printed lately ...

@ godel: Or Wordpad. Or Notepad. After all, it's the data that are important, not how pretty the page looks, right?

@ (fellow-) LegacyWareUser: If enough of us registered our OSs as "user", then that would convey no proof of source, even if found in the Word doc.

Clive Robinson • June 7, 2011 2:50 AM

@ godel,

"I've heard that one easy way to sanitize WP files of all the assorted cruft and old edits is to "save As" an older file format. For example, save in Microsoft Word 6.0 format."

The old way to do this was just to "save to floppy disk", as the wordpro would realise it had to dump the cruft. But who still uses floppy disks (except me ;) these days? And as I've not upgraded my wordpros in a very, very long time, I don't know if the functionality has been dropped or not.

However it does not get rid of all the data.

Another way is to first store in a human readable format such as RTF and then do a string search for data that should not be there. Then on a copy remove bits you don't like (the RTF format is documented somewhere up on the web as it's a quasi open standard).
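As a rough illustration of that string search, here is a short Python sketch. The sample RTF, the names in it, and the helper names `info_fields` and `find_leaks` are all invented for the example; it only looks at the author and operator entries of the RTF info group plus whatever search terms you supply, so treat it as a starting point rather than a complete check.

```python
import re

# Hypothetical RTF export; the names are invented for illustration.
SAMPLE_RTF = r"""{\rtf1\ansi
{\info{\author Pat Example}{\operator Temp Typist}}
\pard Dear Sir, please find our position enclosed.\par
}"""

def info_fields(rtf_text):
    """Return the author and operator entries found in the RTF text."""
    return dict(re.findall(r"\{\\(author|operator) ([^{}]*)\}", rtf_text))

def find_leaks(rtf_text, terms):
    """Return (term, offset) pairs for each case-insensitive match."""
    hits = []
    for term in terms:
        for m in re.finditer(re.escape(term), rtf_text, flags=re.IGNORECASE):
            hits.append((term, m.start()))
    return hits

print(info_fields(SAMPLE_RTF))
print(find_leaks(SAMPLE_RTF, ["Pat Example", "draft"]))
```

The offsets tell you where to look in a copy of the file; the editing itself is still best done by hand against the format documentation.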

Another way is to "print to file" in postscript format and go through it with open source postscript tools (there is at least one book on postscript programming, so you can check for dodgy program constructs).

Then again, save as a txt file and then go to another wordpro, import via the clipboard etc., and manually reformat (not really allowable for legal stuff).

However, I am in disagreement with one of the authors about saving in an electronic format that allows cutting and pasting; it does not do you any favours, so why do them for your legal opponent?

There is a saying in the armed forces which is relevant,

"Don't leave ammunition for the enemy."

@ tommy,

Thanks I'll go have a look.

tommy • June 7, 2011 8:35 PM

@ Clive:

Looking forward to your opinion of Docscrubber.

"Then again, save as a txt file and then go to another wordpro, import via the clipboard etc., and manually reformat (not really allowable for legal stuff)."

What exactly is wrong with what I said to godel, about just saving it in .txt format and printing that out, *without* adding or changing any formatting, which .txt hardly allows anyway, other than font etc.? What is the other side to do, claim "But Your Honor, there is a lot of hidden information beyond what was actually said, and we can't win without it?"

Clive Robinson • June 8, 2011 12:01 AM

@ tommy,

"What exactly is wrong with what I said to godel about just saving it in .txt format and printing that out, *without* adding or changing any formatting which .txt hardly allows anyway, other than font etc."

First off sorry yes I did read your earlier comment to godel,

"@ godel: Or Wordpad. Or Notepad. After all, it's the data that are important, not how pretty the page looks, right?"

And should have mentioned what was wrong more clearly.

Firstly though, please remember judges in some respects are not as up to date / switched on as you might think (which can be a godsend in many ways but a nightmare in others).

The average judge's working life has revolved, and usually still does, around documents in paper form. Thus they are very good at spotting when documents differ not just in words but in fonts, formatting, smudges and creases as well.

This is because there is also this notion of "chain of evidence" which boils down to having certified / untampered originals with clear chains of custody etc. Or to put it another way they should be both traceable and authentic to the eye in all respects.

Most judges now understand about "the forensics" of photocopiers and how they can be used to make both good and bad copies of paper documents, and how the copies can be verified and authenticated etc.

So if a judge looks at a document that has been redacted they may ask to see the original, and if the formatting is not the same they are potentially going to ask a lot of questions about authenticity, traceability and verification (the answers to which you really, really don't want to be trotting out in court after also removing metadata).

This is because their world view or assumptions about redacting is you "make a paper copy of the paper document and redact the paper copy" not "make an electronic copy of the electronic document and redact the electronic copy".

This "paper copy" world view is clearly seen in the likes of FOI requests for documents, which has its downsides, as sometimes the hole size gives significant information on what has been redacted.

A bit of illustrative history: I used to know an accountant who had the misfortune of having been unknowingly involved in an international land scam involving fax machines. Put overly simply, the crooks got him to certify a set of accounts, then lifted out identifying account and other information, replaced it with false information, and faxed the faked copy through to a bank in another country. The bank incorrectly assumed the faxed fake documents were genuine and released funds, etc. Apparently it was the first case of fraud involving fax machines, and his account of what happened in court, where the judge had the whole of fax technology explained in great detail, is an object lesson in why you don't want "M'learned friends" enquiring too closely into technology involving their precious paper documents.

So to answer your question of,

"After all, it's the data that are important, not how pretty the page looks, right?"

They are both as important in a judge's eyes.

So on the "looks like a duck and quacks like a duck" theory your ducks "original and redacted" had better look like identical twins or M'learned friend is going to go hunting...

tommy • June 8, 2011 9:17 PM

@ Clive Robinson:

I myself have been the victim of an attempted forgery, in which the Defendant produced a photocopy of a (postal) letter claimed to be written and signed by me, which would have made his case.

It was obvious that the signature had been cut and pasted (literally, with scissors and paste) from an actual original letter of mine, then photocopied onto a different letter, because the signature was almost too perfect in every way, but one small detail -- a long trailing flourish - had been truncated. (Careless crook.)

In any case, the Judge quite properly asked the D to produce the original, and the best he could come up with was, "It was accidentally put through the washing machine...." Case closed; I won; Judge disgusted with Def.

So yes, I am with you all the way on the value of original paper docs. With regard to electronic docs, I like the idea that using web mail leaves a copy (multiple copies, probably) on the servers of an independent third party, so that I could present the printout, and if challenged, d/l the original, in court, from said presumably-disinterested third party (e.g., Yahoo).

Redaction and metadata haven't been issues for me personally yet, but I do like your suggestions to "print and scan" where e-doc is needed.

Have you had a chance to look at Docscrubber yet? ... I tend to send people either .pdfs or .rtfs anyway, as I have found that for some reason, MS Word docs seem vulnerable to minor corruptions in transit. Not security-related, just that the (very trusted) party on the other end reports that the column of figures was misaligned, underlining messed up, etc. (I guess that's why they call the alternative *Portable* Document Format.) Cheers.

