The NSA on How to Redact

Interesting paper.

Both the Microsoft Word document format (MS Word) and Adobe Portable Document Format (PDF) are complex, sophisticated computer data formats. They can contain many kinds of information such as text, graphics, tables, images, meta-data, and more, all mixed together. The complexity makes them potential vehicles for exposing information unintentionally, especially when downgrading or sanitizing classified materials. Although the focus is on MS Word, the general guidance applies to other word processors and office tools, such as WordPerfect, PowerPoint, Excel, Star Office, etc.

This document does not address all the issues that can arise when distributing or downgrading original document formats such as MS Word or MS PowerPoint. Using original source formats, such as MS Word, for downgrading can entail exceptional risks; the lengthy and complicated procedures for mitigating such risks are outside the scope of this note.

EDITED TO ADD (2/1): The NSA page for the redaction document, and other “Security Configuration Guides,” is here.

Posted on February 1, 2006 at 1:09 PM • 25 Comments

Comments

Fred Page February 1, 2006 1:45 PM

I’m amused that the final major step is to convert to .pdf. Apparently, the NSA considers the .doc format non-trivial to redact.

Eric Miller February 1, 2006 1:52 PM

Interesting that they suggest replacing written content with a series of a single letter. I remember reading a paper from Daniel Lopresti and A. Lawrence Spitz where they showed that you can often recover the original word if you know the size of the redacted word and its context.
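The length weakness can be illustrated with a toy filter. If a redaction keeps each word's length, an attacker only has to consider words of exactly that length that fit the surrounding text. This is a minimal sketch, not the Lopresti and Spitz method. It assumes a hypothetical word list (chosen from context, here city names) and a redaction that replaces each letter with `X` one-for-one.

```python
def candidates(redacted: str, wordlist: list[str], mask: str = "X") -> list[str]:
    """Return the words that could have produced `redacted` under letter-for-letter masking.

    Assumes every letter was replaced by `mask` one-for-one, so only the length survives.
    """
    if not redacted or set(redacted) != {mask}:
        raise ValueError("expected a non-empty run of mask characters")
    return [w for w in wordlist if len(w) == len(redacted)]

# Hypothetical context: "The delegation flew in from XXXXXX."
# Context narrows the list to place names; length narrows it further.
CITIES = ["Berlin", "Paris", "London", "Moscow", "Tokyo", "Madrid"]
print(candidates("XXXXXX", CITIES))  # ['Berlin', 'London', 'Moscow', 'Madrid']
```

With a real dictionary and several redacted words in one sentence, each length constraint compounds, which is why short redactions in predictable prose are the riskiest.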

Sabre150 February 1, 2006 2:19 PM

It cracks me up that in such a serious document the screenshots of Word have the little kitty cat 😀

@eric: hurm… I was thinking that. At least it's not a case of replacing the letters with a fixed char while preserving the spaces. I suppose it just depends on how much is being redacted. The larger the block, the harder it is to work that out.

For a genuine old skool redacted look you could replace the number of deleted chars (including spaces) with the same number of chars of lorem ipsum then make the background and text black.
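The difference between the two schemes discussed above shows up clearly side by side. Masking letters while keeping the spaces preserves every word length. A same-length lorem ipsum filler keeps only the total character count, because its own spaces fall in unrelated places. This is a minimal sketch: the filler text and function names are illustrative, and the black-on-black formatting would still be applied in the word processor.

```python
import itertools

LOREM = "lorem ipsum dolor sit amet consectetur adipiscing elit sed do eiusmod tempor"

def mask_keep_spaces(text: str, mask: str = "X") -> str:
    """Replace every non-space character with `mask`; word boundaries (and lengths) survive."""
    return "".join(c if c == " " else mask for c in text)

def lorem_fill(text: str) -> str:
    """Replace the whole span, spaces included, with the same number of lorem ipsum characters."""
    filler = itertools.cycle(LOREM + " ")
    return "".join(next(filler) for _ in text)

secret = "attack at dawn"
print(mask_keep_spaces(secret))  # XXXXXX XX XXXX
print(lorem_fill(secret))        # lorem ipsum do
```

Both outputs still reveal the total length of the span. Only the filler hides where the original words began and ended.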

J.D. Abolins February 1, 2006 3:11 PM

For what it’s worth, the NSA page for the redaction document and other “Security Configuration Guides” is at http://www.nsa.gov/snac/

(The link Bruce gave for the document is at fas.org. No problem per se. Many people may be more comfortable going to the FAS site instead of NSA’s. But sometimes it is good to know where the author offers a document.)

Alun Jones February 1, 2006 3:13 PM

@Fred: Read the paper – under “Details”, it says exactly why they start with Word (because everyone uses it) and end with PDF (because it’s the de-facto standard for distributing read-only forms of a document).

For redacting Word documents on their own, there’s always the redaction tool that Microsoft post at http://www.microsoft.com/downloads/details.aspx?FamilyID=028c0fd7-67c2-4b51-8e87-65cc9f30f2ed&DisplayLang=en – I haven’t tried it myself, mind you.

John February 1, 2006 4:38 PM

I guess this is easier than erasing/hiding the secret parts, printing out a hardcopy, reviewing it, and scanning the hardcopy back into a new file.

Nick Johnson February 1, 2006 4:44 PM

How long until it’s modified to say “In order to redact a document, one should first , and then . Finally, an should be applied…”

Nick Johnson February 1, 2006 4:45 PM

Gah! Those weren’t real HTML tags, blog software!

I guess my mock redaction was a bit more realistic than I was trying for. 😉

mpg February 1, 2006 6:50 PM

(As demonstrated by page two of the document, adding a tantalizing line saying "This page intentionally left blank" is just bound to keep people guessing as to what you've hidden under all that whitespace…)

WC Leung February 1, 2006 9:38 PM

Poor strategy. Why not print the result to JPG or TIF files? These formats will simply make the "black rectangle" and "white rectangle" tricks, which are meant for hard copy, work flawlessly. (Of course, care should still be taken with metadata.)

Dimitris Andrakakis February 2, 2006 3:07 AM

@Someone:

Because of the ratio of users-that-use-MS-Word / users-that-use-*Tex.

This is not an NSA employee manual. It’s a guide for the rest of us.

Chill February 2, 2006 5:32 AM

Use Acrobat to save each page to TIF, convert to CCITT Group 4 Fax (uses least space), remove metadata from resulting TIF, collate back into Acrobat and finally scramble metadata and date in PDF using a good text editor.

If you need strong anonymity, print to paper, then scan and post at an internet cafe.

Do any of you guys know if Adobe includes any kind of information in the PDF file that links back to your software serial number or some kind of hardware identifier?
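The question above can be partly checked by hand. Producer, creator and author strings usually end up in a PDF's document information dictionary or in an XMP metadata packet. This is a rough sketch using only the standard library: it searches the raw bytes for common Info keys and for an XMP packet header. It assumes the entries are stored uncompressed (entries inside compressed object streams will be missed) and written as literal `( ... )` strings rather than hex strings. It cannot tell you whether any field encodes a serial number. The sample bytes are made up.

```python
import re

# Common keys of the PDF document information dictionary.
INFO_KEYS = (b"Title", b"Author", b"Subject", b"Keywords",
             b"Creator", b"Producer", b"CreationDate", b"ModDate")

def find_pdf_metadata(data: bytes) -> dict[str, str]:
    """Return Info-dictionary entries written as literal strings, plus an XMP flag.

    Naive on purpose: it does not parse the PDF, decompress streams,
    decode hex strings, or handle escaped parentheses.
    """
    found = {}
    for key in INFO_KEYS:
        m = re.search(rb"/" + key + rb"\s*\(([^)]*)\)", data)
        if m:
            found[key.decode()] = m.group(1).decode("latin-1")
    if b"<x:xmpmeta" in data or b"<?xpacket" in data:
        found["XMP"] = "present"
    return found

sample = b"%PDF-1.4\n1 0 obj\n<< /Author (J. Smith) /Producer (Acrobat Distiller 7.0) >>\nendobj\n"
print(find_pdf_metadata(sample))  # {'Author': 'J. Smith', 'Producer': 'Acrobat Distiller 7.0'}
```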

Matt Palmer February 2, 2006 6:29 AM

I am currently examining digital redaction methods for use in government archives. The print-to-PDF method is widely recommended.

The reason is that a PDF printed in this way will not contain any hidden metadata that may exist in the original format. In fact, even an existing PDF would be converted into a new PDF this way. It is the conversion process itself that (apparently) ensures that non-visible material is excluded.

More extreme methods involve printing to PDF, then OCRing the PDF to make a new document (PDF or otherwise), to ensure complete isolation from anything that is not visible.
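Source files can carry metadata the reader never sees on screen. The newer Office Open XML format (.docx) makes this easy to inspect with the standard library, although that format postdates this discussion and is used here only as an illustration. A .docx file is a ZIP archive, and document properties such as the author and last editor live in `docProps/core.xml`. This is a minimal sketch that lists them. The sample archive is built in memory and its names are made up.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

def core_properties(docx_bytes: bytes) -> dict[str, str]:
    """Return the non-empty elements of docProps/core.xml, keyed by local tag name."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as zf:
        root = ET.fromstring(zf.read("docProps/core.xml"))
    props = {}
    for el in root:
        tag = el.tag.split("}")[-1]  # drop the "{namespace}" prefix
        if el.text and el.text.strip():
            props[tag] = el.text.strip()
    return props

# Build a tiny stand-in archive containing only the core properties part.
CORE = """<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<cp:coreProperties
  xmlns:cp="http://schemas.openxmlformats.org/package/2006/metadata/core-properties"
  xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:creator>Jane Analyst</dc:creator>
  <cp:lastModifiedBy>John Reviewer</cp:lastModifiedBy>
</cp:coreProperties>"""

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("docProps/core.xml", CORE)
print(core_properties(buf.getvalue()))  # {'creator': 'Jane Analyst', 'lastModifiedBy': 'John Reviewer'}
```

A print-to-PDF step never copies this part into the output. That is the isolation the comment describes.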

Reasons why documents are not simply converted to an image file format are:

(1) Most image file formats can contain hidden metadata (simple bitmaps don't; JPEGs and TIFFs do). You must either verify or trust that a tool converting a document to a bitmap is not also helpfully preserving hidden metadata (like the author field, for example).
(2) Image file formats take up too much space. Of course, very good formats for digital documents exist (DjVu), but these can also include hidden metadata.
(3) Images of documents are not searchable.
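Point (1) can be made concrete for JPEG. Metadata such as EXIF lives in APPn marker segments (EXIF in APP1), and free-text comments live in COM segments. All of these normally sit before the compressed image data. This is a minimal sketch, not a vetted sanitizer. It walks the marker segments and drops APP1 through APP15 and COM, keeping APP0 (JFIF) and copying everything from the first start-of-scan marker onward. It assumes a well-formed file and would miss any metadata segments placed after the first scan.

```python
import struct

SOS = 0xDA  # start of scan: compressed image data follows
COM = 0xFE  # comment segment

def strip_jpeg_metadata(data: bytes) -> bytes:
    """Drop APP1-APP15 and COM segments that precede the first start-of-scan marker."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG (missing SOI marker)")
    out = bytearray(data[:2])
    i = 2
    while i < len(data):
        if data[i] != 0xFF:
            raise ValueError(f"expected marker at offset {i}")
        marker = data[i + 1]
        if marker == SOS:
            out += data[i:]  # scan data and everything after it is copied verbatim
            break
        (length,) = struct.unpack(">H", data[i + 2:i + 4])  # length includes its own two bytes
        seg_bytes = data[i:i + 2 + length]
        is_app_n = 0xE1 <= marker <= 0xEF  # APP1..APP15; APP0 (0xE0, JFIF) is kept
        if not (is_app_n or marker == COM):
            out += seg_bytes
        i += 2 + length
    return bytes(out)

def segment(marker: int, payload: bytes) -> bytes:
    """Build one marker segment: FF, marker byte, big-endian length, payload."""
    return bytes([0xFF, marker]) + struct.pack(">H", len(payload) + 2) + payload

# Synthetic bytes, not a viewable image: just enough structure to exercise the parser.
jpeg_sample = (b"\xff\xd8" + segment(0xE0, b"JFIF\x00")
               + segment(0xE1, b"Exif\x00\x00Author=J. Smith")
               + segment(0xFE, b"draft v3")
               + b"\xff\xda\x00\x02" + b"\x12\x34" + b"\xff\xd9")
clean = strip_jpeg_metadata(jpeg_sample)
print(b"Exif" in clean, b"JFIF" in clean)  # False True
```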

Anonymous February 2, 2006 6:39 AM

Alun Jones:”For redacting Word documents on their own, there’s always the redaction tool that Microsoft post at http://www.microsoft.com/downloads/details.aspx?FamilyID=028c0fd7-67c2-4b51-8e87-65cc9f30f2ed&DisplayLang=en – I haven’t tried it myself, mind you.”

This tool does not provide a complete secure redaction solution in the same sense as the NSA method, although it is useful.

This tool is about the actual act of redacting material itself, not securing the redacted document once done. It can only perform textual redaction (graphics and other objects cannot be redacted by the method). It works by replacing the selected characters with a run of narrow characters (the pipe symbol |) of roughly equivalent length, formatted to appear as black on black.

This is good: it largely preserves the formatting of the document, yet the filler is not guaranteed to be exactly the same length as the original word(s), and the number of characters differs too. This helps to foil various kinds of attacks on the redacted material, including guessing words by their exact positioning and length.
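The idea of filling a span with roughly, but not exactly, the same width of narrow characters can be sketched as follows. This is not Microsoft's implementation. The per-character widths are hypothetical stand-ins for real font metrics, and the random jitter is an assumption added so that the filler length does not map one-to-one onto the original text.

```python
import random
from typing import Optional

# Hypothetical relative glyph widths; a real tool would read the font's metrics.
WIDTHS = {"i": 0.28, "l": 0.28, "m": 0.89, "w": 0.78, " ": 0.28}
DEFAULT_WIDTH = 0.55
PIPE_WIDTH = 0.26

def pipe_filler(text: str, jitter: float = 0.15, rng: Optional[random.Random] = None) -> str:
    """Return a run of '|' whose total width is within +/- jitter of the text's estimated width."""
    rng = rng or random.Random()
    width = sum(WIDTHS.get(c, DEFAULT_WIDTH) for c in text)
    target = width * (1 + rng.uniform(-jitter, jitter))
    return "|" * max(1, round(target / PIPE_WIDTH))

# Seeded only to make the run repeatable; the exact count depends on the seed.
print(pipe_filler("Operation Tango", rng=random.Random(1)))
```

Because a pipe is narrower than most letters, the filler usually has more characters than the original, even though it occupies about the same width on the line.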

It can be very useful in doing the actual redaction of the textual content, especially by people who are not technically expert.

However, this “redacted” word document may still contain all sorts of other hidden metadata. To ensure you have got rid of this, the convert-to-PDF, print and OCR, or other method is used to ensure that the document contains only what is currently visible. This is securing the redaction.
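After securing a redaction, one cheap sanity check is to confirm that none of the redacted strings survive anywhere in the output file's bytes. The check should include UTF-16LE, which Word's binary format commonly uses for stored text. This is a minimal sketch with a hypothetical function name. A clean result proves little, because text inside compressed streams (as in most PDFs) will not show up in a raw byte search.

```python
def leaked_terms(data: bytes, terms: list[str]) -> list[str]:
    """Return the terms found in `data` as UTF-8 or UTF-16LE bytes.

    Matching is case-insensitive for ASCII letters only (bytes.lower()).
    """
    haystack = data.lower()
    leaks = []
    for term in terms:
        t = term.lower()
        if any(enc in haystack for enc in (t.encode("utf-8"), t.encode("utf-16-le"))):
            leaks.append(term)
    return leaks

# Made-up file contents: the codeword survives as UTF-16LE text.
doc = b"header" + "Project BLUEBIRD".encode("utf-16-le") + b"footer"
print(leaked_terms(doc, ["Bluebird", "Nightjar"]))  # ['Bluebird']
```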

Redacted February 2, 2006 8:58 AM

I went looking for easter eggs. There appears to be one item “redacted” from the original document. Perhaps I just missed it, but there’s a non-visible

“CLASSIFICATION//X1”

tag line in there. Yeah, it’s timid, but a little ironic.

Michael Sullivan February 8, 2006 9:22 PM

One reason NSA doesn’t print and scan is that it’s under a statutory mandate to promote accessibility for the handicapped, as are other government agencies. Printing and scanning gives you image files that can’t be accessed by reader software, whereas files converted to PDF directly from Word are accessible.

Printing a visually redacted document and then scanning is easily the most secure way of producing a redacted document, though, and it’s relatively easy. I recently did just that with a 50-page document that needed to be filed with a government agency in both confidential and redacted form. I used a style in Word that I named “confidential”, and in the full confidential version I shaded the style with a bit of gray. In the version used for redaction, this style was changed to white-on-white, then printed and scanned. Fast and easy, but not accessible.

Tony June 6, 2011 1:52 PM

“If you need strong anonymity, print to paper, then scan and post at an internet cafe.”

Make sure that your “scan” process doesn’t preserve the yellow dots that your printer probably put onto the paper to document the serial number of the printer you used.

Ilya June 6, 2011 9:36 PM

So I take it that examples of NSA failures on the very subject they are trying to educate the public about are not welcome here? OK then. That does not make it any less ironic.

brice June 10, 2011 7:08 PM

Wait, isn’t the problem primarily that you’re using .doc format documents in the first place? I’d be seriously worried at the implications of having a “secret” document created in MS Word. I’m no openness evangelist, but a proprietary format is all kinds of bad news.

  • First you get obsolescence. TeX documents written in 1980 are still readable and useful on a modern operating system. Can you use Word for DOS documents today? On the latest version of MS Office? SGML has been an ISO standard since 1986. Aren't intelligence documents kept for decades?
  • You’re also reliant on third party support if something goes wrong. Oh boy is that annoying. Why should you need to call Microsoft when a problem occurs with your system (typically with older documents/versions) when you could just pick someone already vetted for access and simply hand him the standard for the document format?
  • You’re vulnerable to security problems in someone else’s application. Because the format is poorly specified (or not at all for some older ones), you can’t write and audit your own program, which is yet another weak link when handling sensitive information.

In the end, the only reason these documents are hard to redact is that they were designed to be difficult to modify outside of the original program or scope. If you have problems redacting a document, remember: you made a deal, now you pay for your choice…

I know that I'll be keeping any information of value as plain text or, for more complex data, in standardised file formats, and will be able to redact to my heart's content, thank you very much.

the ominous anne June 15, 2011 10:20 PM

Not sure if others mean print to actual paper, but if you have access to Acrobat Standard (writer) 6.0 or greater you can export any document to image files, then convert the images back to PDF and OCR the PDF. Often this gives little loss of fidelity but not all fonts are recognised. This is arguably easier than printing and scanning but I still see people in my office print to paper and scan, because it works.
