Vesselin Bontchev June 13, 2016 8:15 AM

To see what can be done with RTF files if they are opened with Word (instead of with WordPad), take a look at this proof-of-concept:

Open it with Word. Preferably with macros (“content”) enabled – RTF files can’t contain macros, right? Right? Oh, wait…

Also, monitor Word’s connections to see it contacting my site when it opens the document – this can be used to implement the beacon functionality Nicholas Weaver is talking about.
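For anyone who would rather inspect a file than watch the network, here is a minimal sketch of listing the remote addresses mentioned in an RTF file's source. It assumes the reference appears as a literal http(s) URL in the text, which an obfuscated document need not do, and the function name `find_remote_references` is made up for illustration.

```python
import re

# Assumption: the remote reference is written as a plain http(s) URL.
# Encoded or split-up references will not be found by this pattern.
URL = re.compile(r"https?://[^\s\"'{}\\<>]+", re.IGNORECASE)


def find_remote_references(rtf_source):
    """Return the distinct URLs found in RTF source text, in order of appearance."""
    seen = []
    for url in URL.findall(rtf_source):
        if url not in seen:
            seen.append(url)
    return seen
```

An empty result does not prove that a file is clean; it only means no literal URL was found.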

TRX June 13, 2016 8:34 AM

So I googled “microsoft word tracking beacon” and the first hit was an article from August 2000.

Only Hindu gods have enough hands for adequate facepalm.

wiredog June 13, 2016 8:36 AM

FBI was doing the same thing 4 years ago. Didn’t do much with Word there, but we used Outlook macros to handle putting classification markings on emails.

ramriot June 13, 2016 8:49 AM

@wiredog: “but we used Outlook macros to handle putting classification markings on emails.”

Hope that was a joke because the only true security marking for an email (unless encrypted, metadata anonymised and/or circulated internally) would be TU (Totally Uncontrolled).

Thoth June 13, 2016 9:25 AM

I wouldn’t be surprised if they have ways to prevent leaks due to Word macros or executables with DLP solutions and highly advanced microkernels from General Dynamics and so on.

Clive Robinson June 13, 2016 9:54 AM

You’ve got to love ‘s turn of phrase,

    It’s a crude analogy, but enabling Word Macros in a corporate environment is akin to infecting a brothel with herpes—in short, it’s a bad, bad idea.

Now my old grey cells are not what they once were, but I seem to remember the first Word Macro virus was a “proof of concept” by a Microsoft employee on an MSDN update CD back in the 1990’s for Word 4 running on Win 3.1 or NT4.

Anyway, the same file problem used to exist with Object Linking and Embedding: if you copied the file on the network it used to leave any included files in their original locations, but if you saved the Word file to a floppy disc (remember those?) it used to include the files on the floppy.

Now just to repeat a little piece of advice I have been giving to people for so long now its beard is longer than mine,

    PAPER, Paper, NEVER data

If some klutz of a legal PITA demands electronic documents, do yourself a favour: print the damn things out and scan them in on an entirely separate machine as one PDF per page or, if you know how, as image files built into an HTML file that you can tweak to turn it into a Word file of massive size but little or no use to a smart arse looking for embedded data that is difficult to redact any other way.

It’s not just macros that are a menace it’s all that proprietary “hidden trash” that goes with it.

You are usually safer internally with human readable file formats, of which RTF is one (that Micro$haft developed). But as others have noted, RTF files can contain nasties. The thing is you can write a parser that will strip out most of the RTF tags with little or no harm.
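As a rough illustration of that kind of tag stripping, here is a minimal sketch that keeps visible text and drops control words plus a few groups that carry no visible text (font tables, document info, pictures, embedded objects, field instructions). The list of dropped groups is an assumption and is not exhaustive, `\uN` Unicode escapes are not handled, and cp1252 is assumed for hex-escaped bytes. Treat it as a starting point rather than a sanitiser to rely on.

```python
import re

# One token per match: control word, hex-escaped byte, control symbol,
# group brace, or a run of plain text.
TOKEN = re.compile(
    r"\\([A-Za-z]+)(-?\d+)? ?"   # control word, optional number, optional space
    r"|\\'([0-9A-Fa-f]{2})"      # hex-escaped byte such as \'e9
    r"|\\([^A-Za-z])"            # control symbol such as \* or \{
    r"|([{}])"                   # group open or close
    r"|([^\\{}]+)",              # plain text
    re.S,
)

# Assumed, non-exhaustive list of groups to discard entirely.
SKIP_DESTINATIONS = {
    "fonttbl", "colortbl", "stylesheet", "info",
    "pict", "object", "objdata", "fldinst",
}


def rtf_to_text(rtf):
    """Return the visible text of an RTF string, discarding markup."""
    out = []
    skipping = False   # True while inside a group we are discarding
    stack = []         # saved 'skipping' state of each enclosing group
    for match in TOKEN.finditer(rtf):
        word, _number, hexbyte, symbol, brace, text = match.groups()
        if brace == "{":
            stack.append(skipping)
        elif brace == "}":
            if stack:
                skipping = stack.pop()
        elif word is not None:
            if word in SKIP_DESTINATIONS:
                skipping = True
            elif not skipping and word in ("par", "line"):
                out.append("\n")
            elif not skipping and word == "tab":
                out.append("\t")
        elif symbol is not None:
            if symbol == "*":          # \* marks an ignorable destination
                skipping = True
            elif not skipping and symbol in "\\{}":
                out.append(symbol)     # escaped literal character
        elif hexbyte is not None:
            if not skipping:
                out.append(bytes.fromhex(hexbyte).decode("cp1252", errors="replace"))
        elif text is not None and not skipping:
            # Raw line breaks in RTF source are not part of the text.
            out.append(text.replace("\r", "").replace("\n", ""))
    return "".join(out)
```

Because field instructions are dropped and only the field result is kept, a remote picture reference in a field would not survive the conversion in this sketch.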

There are various *nix tools and scripts you can find associated with CUPS that you can use as a basis for stripping files.

As they say “Have a ball, knock yourself out”, but at the end of the day “STICK WITH PAPER”: most of the time you can see what you are giving away, which you cannot with proprietary file formats.

CallMeLateForSupper June 13, 2016 10:06 AM

“Can’t they develop a better way of exchanging intelligence information than emailing Word documents around? With macros enabled?”

Snort! Pull back the curtain on omniscient NSA and find…. everyone sipping the Window$ Kool-Aid and “working like we work at home”. How very embarrassing for NSA. If I were a cartoonist I’d have so much fun with this.

In the early 90’s, Big Blue was spending pallets of cash on development and marketing of OS/2, its answer to Window$. At the same time, internal users of 3278/3279 work stations (dumb terminal, connected to a remote mainframe) were migrating to PC-on-LAN, as a cost-cutting measure[1]. And what OS did those PCs run? OS/2? While some did, most ran 16-bit Window$, because it had gained a foothold (users) in the 80’s. I thought it very curious that IBM wanted (expected?) businesses to adopt OS/2, yet it promoted Window$ in its own house. I picked many brains on this apparent contradiction and not one of them offered a compelling explanation.

[1] It was a very good measure. The rental fee for a single 3279 was on the order of US$ hundreds (per month, if I’m not mistaken). It was often said that IBM was its own best customer. Workstations are just one of many examples.

Clive Robinson June 13, 2016 10:08 AM

For those looking to do RTF2TXT or RTF2HTML and vice versa you might want to have a look at,

I can’t make any claims about it as I’ve neither used it nor looked at the code, but someone I know has used it for a year or so without, as far as I’m aware, any issues.

blake June 13, 2016 11:38 AM

@Clive Robinson

If only there were something between paper and closed proprietary data formats …

(replace “evil” with “negligent”, which are of course indistinguishable when sufficiently advanced)

Jesse Thompson June 13, 2016 3:57 PM

@Clive Robinson

Print it off and then scan it again?

You do realize that you are describing a Fax Machine, right?

Besides, I cannot copy or paste text out of a fax and you’ll know that OCR has stopped sucking the very moment that all OCR-based Captcha goes away, but not a millisecond sooner.

At the end of the day what you are telling us is “I distrust all computers enough that I do not want them to understand my documents”.

Maybe this is fine for lawyer’s offices where the documents are used for no purpose other than human reading or just proving that the document exists and that a pen scratched over the “signature” line somewhere.

But even when they use Fax, some assistant is re-typing the case numbers and personal info that they read to make new documents, so that Buttle gets arrested while Tuttle stays at large.

Luddite June 13, 2016 4:39 PM

Just build yourself a cardboard computer to keep those nosy snoops out of your digital business. Of course, you’ll still need a TEMPEST tent if you really want to join the Top Secret Tinfoil Hat Society. Homing pigeon and decoder ring sold separately.

Clive Robinson June 13, 2016 6:07 PM

@ Jesse Thompson,

Print it off and then scan it again?

Yes it sounds mad, but the important thing is you don’t send anything you can not see, to a potential enemy. Of which there are three you have to think about: journalists/researchers who get the files from a cracker’s exfiltration of whatever they can get, then those who work on the legal side for prosecutors and plaintiffs / claimants, who get judges to issue warrants and orders.

The lawyers and the judges they use have a new(ish) game which is called “electronic discovery”. Normally going through a million pieces of paper, demands resources that “beggars them” so prosecution or plaintiff / claimant lawyers used to keep discovery requests down to what they could handle within the constraints of time and manpower.

Now however with all those juicy electronic records they pull them over sight unseen and get some geek to build a “mini-google” and then go hunting. Importantly they never really sit down and read all those files, just the ones they think have useful information that they search out. Thus it is their searches that are the key to their success or failure, and those searches depend on the index they have to use.

But you need to realise that they have two places they can hunt with electronic files. Firstly the actual data which you see in printed form. Secondly, and worth many times more, the normally invisible metadata. And it’s that metadata that will sink you faster than the Titanic if they get the opportunity to use it.

So your game plan is always to deny your attackers anything at your expense, or that which will be at your expense if they can turn it into a rod for your back.

So fine if they get a judge to sign off access to your “electronic records” give them “images not words” and “disorder not order”. There is as far as I’m aware no law to make you keep anything other than financial records in specific forms.

However there is a myth that has been put around by electronic data systems suppliers that there is a lot of “hidden value” in company records, if you use their systems to get at them all… The reality for most documents is they have little or no real value unless something goes wrong; then they become either “public” from a cracker’s exfiltration or evidence if a judge inks the paperwork.

The thing about evidence is it’s usually time sensitive as there is a clock running when someone starts legal action. Thus like crypto you want electronic records to work easily for you but not for attackers. Thus a well indexed set of apparently random ordered files in file cabinets works for you as the key to using them is the index. However those random ordered files do not work for someone else if they have to take tonnes of copies, build files and then build an index to them. That is, if they don’t have time to build the files and index, your attackers’ searching will normally be minimal, and swamping defendants with such stuff at the last possible minute is a technique prosecutors still like to use in the armoury of “Rights Stripping”.

So your mission likewise is to deny an attacker access to the index and anything better than paper. An electronic image of a page as a single graphic is arguably of less use than when printed, because you can not see it and if you do it right nor can the OCR software either[1][2].

I won’t go into exact methods but if the OCR does not work, then it’s back to “periwinkle time” and lots and lots of human resources burning the candle not just at both ends but the middle as well, as they start the journey down the rabbit hole that can lead to “The Stressville Funny Farm Sanctuary and Care Home”.

If the opposing lawyers get the court to give them more time, then that is often in your favour as well.

As with all such games it’s best to be ahead of the game so you do not get wrong footed like Microsoft and other organisations. After all think how much less damaging for SPE it would have been if those supposed attackers that stole and released all those company emails had not had them in easily searchable form…

[1] For OCR software to work well it needs nice crisp clear letters, preferably in certain types of font that have the image equivalent of a large “Hamming Distance” such that errors can be detected easily and correctly corrected. Fonts that are soft curves and have almost random angles whilst still being human readable are much, much harder for OCR software. But it’s not just the individual letters. If you remember back to “digital watermarking” in the late 1990s, it hid information in “image noise” via the equivalent of Spread Spectrum techniques. For around two years a battleground ensued with regular skirmishes where various techniques were used to beat other techniques, each holding temporary ground. Then Ross J. Anderson’s group at the UK Cambridge labs came up with a distortion method in three dimensions that humans did not really notice but DRM systems totally barfed on. It won the battle and Digital Watermarking became a bit of a backwater. Well, OCR does not like documents distorted in that way any more than the watermarking schemes did.
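To make the “Hamming Distance” idea in [1] concrete, here is a toy sketch that counts the differing pixels between two glyph bitmaps. The 5x5 glyphs are invented for illustration and do not come from any real font; real OCR engines use far richer features than a raw pixel count.

```python
def hamming_distance(glyph_a, glyph_b):
    """Count the pixels that differ between two equally sized glyph bitmaps."""
    if len(glyph_a) != len(glyph_b):
        raise ValueError("glyphs must have the same number of rows")
    distance = 0
    for row_a, row_b in zip(glyph_a, glyph_b):
        if len(row_a) != len(row_b):
            raise ValueError("rows must have the same width")
        distance += sum(pa != pb for pa, pb in zip(row_a, row_b))
    return distance


# Hypothetical 5x5 bitmaps: '#' is ink, '.' is background.
GLYPHS = {
    "I": [".###.", "..#..", "..#..", "..#..", ".###."],
    "O": [".###.", "#...#", "#...#", "#...#", ".###."],
    "Q": [".###.", "#...#", "#...#", "#..##", ".####"],
}
```

With these made-up bitmaps, O and Q differ in only 2 pixels while I and O differ in 9, so a small amount of noise or distortion is much more likely to confuse the first pair than the second.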

[2] One of the complaints you hear from people is why the documents in the Ed Snowden trove are trickling out so slowly. Well, part of the answer is that even though they were said to be well indexed from one aspect, you still have to read them in great detail and cross reference to get all of the pertinent documents for a given story line lined up in the right way.

Hmmmm June 13, 2016 9:35 PM

@Clive Robinson

Yes it sounds mad, but the important thing is you don’t send anything you can not see, to a potential enemy.

Normally going through a million pieces of paper, demands resources that “beggars them”

Now however with all those juicy electronic records they pull them over sight unseen and get some geek to build a “mini-google” and then go hunting.

So, the same suggestion still stands? Never ever submit any metadata to a third-party. Doing so will result in an increased risk that said information could be (ab)used to more quickly identify those paper documents that are potentially more interesting.

After all think how much less damaging for SPE it would have been if those supposed attackers that stole and released all those company emails had not had them in easily searchable form…

I’m having a hard time imagining that… Perhaps wikileaks uses MongoDB or some other unreliable database, but I’ve personally yet to see a properly indexed and easily searchable dump of those files 😕

Vesselin Bontchev June 14, 2016 2:42 AM

@TRX, the trick demonstrated by my PoC has been known since 1997, LOL.

@wiredog, I think you are missing the point. Unlike Word/Excel/PowerPoint/Access/Visio macros, Outlook macros aren’t dangerous, because they are not saved in the “documents” the application creates (i.e., e-mails). So, while you can use them to automate various tasks in Outlook, you can’t really infect another Outlook installation with them. So, having macros enabled in Outlook is fine. Worst that can happen is a document for some other Office application could install a malicious macro in Outlook that sends copies of the potentially sensitive e-mails to unauthorized recipients.

But having macros fully enabled in Word, Excel, PowerPoint, Access or Visio in a sensitive environment is pure madness, because an attacker could do just about anything by tricking you to open a document (containing macros) for the corresponding application.

Clive Robinson June 14, 2016 4:33 AM

@ Hmmmm,

I’m having a hard time imagining that…

What’s the difference between,

1, searching using a computer with simple scripts through a sequential ordered collection of textfiles with useful metadata attached.

2, manually looking through thousands of randomly ordered image files because OCR does not work on them nor do they have usable metadata attached?
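Option 1 really is only a few lines of work. A minimal sketch, assuming the dump is a directory tree of UTF-8 `.txt` files (the function name and the file layout are made up for illustration):

```python
from pathlib import Path


def search_text_files(root, keyword):
    """Return the .txt files under root whose contents mention keyword.

    The match is case-insensitive and the result is sorted by path.
    """
    needle = keyword.lower()
    hits = []
    for path in sorted(Path(root).rglob("*.txt")):
        try:
            text = path.read_text(encoding="utf-8", errors="replace")
        except OSError:
            continue  # unreadable file: skip it rather than abort the search
        if needle in text.lower():
            hits.append(path)
    return hits
```

Nothing comparable exists for option 2: without working OCR or metadata there is no text for a script to match against.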

Dan June 14, 2016 7:40 AM

As a developer who occasionally has to make Excel macros for use across my company, I can say there are more secure ways to do this. Office allows whitelisting macro signing certificates. All macros except those signed with trusted certs are disabled. Then they’re just as safe as any other signed software (same failure modes too).

The main problem with Office macros is that they’re usually not developed by software engineers, so bad practice (like disabling all security) is the norm.

Dirk Praet June 14, 2016 8:56 AM

@ Dan

The main problem with Office macros is that they’re usually not developed by software engineers

The main problem is in the technology itself, i.e. that macros are not “sandboxed” in any way. Although you can mitigate the risk by only enabling digitally signed macros, this will still not protect you from an adversary with sufficient resources to use false certificates.

blake June 14, 2016 9:33 AM


That’s kind of the point of Office, especially Excel, Access, and VBA, because if you are a software engineer you start by choosing better tools. Tools that let you isolate data from processing from presentation, and that let you version & unit test each layer separately.

The main problem with Excel is that it’s used by people who aren’t software engineers. Someone will kludge together 4 different data sources, copypasta the same formula into 30,000 cells, connect data using vlookup, and handle deduplication & normalisation with pivot tables without even knowing what 3rd normal form is or even that deduplication and normalisation are different things. Businesses are run like this. If an Excel user knows about certificate signing, that’s already an edge case.

I’ve got a lot of work from fixing this stuff for people, so I’m kind of grateful – in the same way a doctor is almost “grateful” for gangrene – while still being glumly aware of the Broken Window fallacy.

Dan3264 June 14, 2016 8:59 PM

In this usage, it is the “Broken Windows Fallacy”. Sorry for the pun. I just had to make it.

ianf June 14, 2016 9:21 PM

Dan3264, you could always compound the impact by apologizing for apologizing for posting twice what was a by-the-way one-liner tongue in cheek comment, complete with a promise on your mother’s grave not to do that ever again.

Dan3264 June 15, 2016 9:27 AM

As I look at this comment thread right now I can only see the first comment I made. I know I made the comments, and the comment you made (which I can see) confirms that I made the comments, but I can’t see the comments I made. I reloaded the page a few times and it didn’t fix the problem. And yes, I will rarely make puns (I would like to say I won’t ever make any, at least on this blog, but I might forget. I try not to make promises and can only remember one time when I made one).

AllCoolIDsWereTaken June 15, 2016 9:47 AM

@ Clive

I think that generating a raster [or even vector] image of the text on an editor like Gimp would have the same effect as scanning the doc. Am I right?

Old Guy June 15, 2016 1:55 PM

A low resolution scan with a document skewed a bit would make automatic text recovery more difficult. Well rendered text (as Gimp would produce) would be more easily converted back to text.

AllCoolIDsWereTaken June 16, 2016 9:08 AM

@Old Guy

Yeah. I was thinking about treating the image to add some noise, cursive fonts, distortion, removing exif, etc.

But thanks for replying.

STB June 18, 2016 11:34 PM

@Jesse Thompson (June 13, 2016 3:57 PM)

Thanks for your comment! The very same thought crossed my mind as well. The move towards a computerless office, instead of a paperless one, seems just ridiculous to me.

I think some commenters here are missing the point. The problem was not computer systems or IT at large, but the previous use of inappropriate document formats and associated technology.

