Metadata in MS Office
Hidden metadata is in the news again. The New York Times reported that an unsigned Microsoft Word document being circulated by the Democratic National Committee was actually written by, wait for it, the Democratic National Committee.
Okay, so that’s not much of a revelation, but it does serve to remind us that there can be all sorts of unintended information hidden in Microsoft Office documents. The particular bit of unintended information that precipitated this news story is the metadata.
Metadata is information on who created the file, what it was originally called, etc. To see your metadata, open a file, go to the “File” menu, and choose “Properties.”
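If you would rather check from a script than from the menu, here is a minimal sketch. It assumes the newer .docx format (a zip package that keeps these fields in a part named docProps/core.xml), not the binary .doc format of Office 2003 and earlier. The function name read_core_properties and the choice of fields are mine, for illustration only; the “Company” field is stored in a different part (docProps/app.xml) and is not read here.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespaces used by docProps/core.xml in the Office Open XML package format.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Core-property elements that roughly match the Properties dialog fields.
# The labels on the left are illustrative, not official names.
FIELDS = {
    "Title": "dc:title",
    "Subject": "dc:subject",
    "Author": "dc:creator",
    "Last saved by": "cp:lastModifiedBy",
}


def read_core_properties(path):
    """Return a dict of selected core properties from a .docx file.

    Hypothetical helper: assumes `path` is a zip-based .docx package.
    """
    with zipfile.ZipFile(path) as package:
        try:
            xml_bytes = package.read("docProps/core.xml")
        except KeyError:
            return {}  # the package carries no core-properties part
    root = ET.fromstring(xml_bytes)
    found = {}
    for label, tag in FIELDS.items():
        element = root.find(tag, NS)
        # Skip fields that are absent or empty.
        if element is not None and element.text:
            found[label] = element.text
    return found
```

Calling read_core_properties on a file you are about to send out shows whatever author and title values are travelling with it.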
I’ll bet at least some of you will be really surprised by what’s in there. Not because it’s secret, but because it has nothing to do with you or your document. That’s because metadata follows the file, and not its contents.
Here’s what I do when I want to create an MS Word document. I find some other document that has basically the same style I want (maybe it’s a file I’ve written, and maybe it’s a file I received from someone else), open it up, delete all the contents, and save it under a new filename. MS Word doesn’t change the metadata, so whatever was in the “Title,” “Subject,” “Author,” “Company,” and other fields of the original document remains in my new document. This means that occasionally those metadata fields are filled with information I’ve never seen before and from who knows where. I’m sure I’m not the only one who uses this trick to avoid dealing with MS Word stylesheets. So metadata is much less a smoking gun than many make it out to be.
I don’t mean this to minimize the problem of hidden data in Microsoft Office documents. It’s not just the metadata, but comments, deleted parts of the document, even parts of other documents (it’s happened).
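As a rough illustration of how comments and tracked changes can ride along inside a file, the sketch below counts them in a .docx package. It assumes the newer zip-based .docx format, where comments live in word/comments.xml and tracked insertions and deletions appear as w:ins and w:del elements in word/document.xml. The function name hidden_data_report is hypothetical, and this is a quick check, not a complete audit of everything a document can hide.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace (the "w:" prefix in a .docx body).
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def hidden_data_report(path):
    """Count comments and tracked changes in a .docx file.

    Hypothetical helper: raises KeyError if the package has no
    word/document.xml part.
    """
    with zipfile.ZipFile(path) as package:
        names = set(package.namelist())
        body = ET.fromstring(package.read("word/document.xml"))
        comments = 0
        if "word/comments.xml" in names:
            comments_root = ET.fromstring(package.read("word/comments.xml"))
            # Each reviewer comment is a w:comment child of the root.
            comments = len(comments_root.findall(f"{{{W}}}comment"))
    return {
        "comments": comments,
        # w:ins and w:del mark tracked insertions and deletions.
        "tracked_insertions": sum(1 for _ in body.iter(f"{{{W}}}ins")),
        "tracked_deletions": sum(1 for _ in body.iter(f"{{{W}}}del")),
    }
```

Any nonzero count is a hint that the file still carries material a reader was not meant to see.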
I have two recommendations regarding Microsoft Office and hidden data. The first is to realize that programs like Word and Excel are designed for authoring documents, not for publishing them. Get into the habit of saving your documents as PDF before distributing them. (Although if you’re going to redact a PDF document, be smart about it or you’ll have similar problems.)
The second is to install Microsoft’s tool for deleting hidden data. (It works for Office 2003; there are third-party tools for older versions.) Or at least read the page about deleting private data in MS Office files, and then follow through on deleting the data.
This probably won’t work for many of us, though. The last sentence of the article explains why:
“The real scandal here,” Mr. Max told The Los Angeles Times after Democrats expressed outrage over the White House’s fingerprints on the testimony, “is that after 15 years of using Microsoft Word, I don’t know how to turn off ‘track changes.’”
Milan Ilnyckyj • November 14, 2005 12:53 PM
One thing to be aware of: some of the free PDF converters for Windows will grab the metadata from Word and add it to the PDF files you are creating.
The unwanted data in the properties of this file (http://www.irsa.ca/NASCAfinal.pdf) is a case in point.