Fake Documents that Alarm if Opened

This sort of thing seems like a decent approach, but it has a lot of practical problems:

In the wake of Wikileaks, the Department of Defense has stepped up its game to stop leaked documents from making their way into the hands of undesirables—be they enemy forces or concerned citizens. A new piece of software has created a way to do this by generating realistic, fake documents that phone home when they’re accessed, serving the dual purpose of providing false intelligence and helping identify the culprit.

Details aside, this kind of thing falls into the general category of data tracking. It doesn’t even have to be fake documents; you could imagine some sort of macro embedded into Word or PDF documents that phones home when the document is opened. (I have no idea if you actually can do it with those formats, but the concept is plausible.) This allows the owner of a document to track when, and possibly by what computer, a document is opened.

But by far the biggest drawback from this tech is the possibility of false positives. If you seed a folder full of documents with a large number of fakes, how often do you think an authorized user will accidentally double click on the wrong file? And what if they act on the false information? Sure, this will prevent hackers from blindly trusting that every document on a server is correct, but we bet it won’t take much to look into the code of a document and spot the fake, either.

I’m less worried about false positives, and more concerned by how easy it is to get around this sort of thing. Detach your computer from the Internet, and the document no longer phones home. A fix is to combine the system with an encryption scheme that requires a remote key. Now the document has to phone home before it can be viewed. Of course, once someone is authorized to view the document, it would be easy to create an unprotected copy (screen captures, if nothing else) to forward along.

While potentially interesting, this sort of technology is not going to prevent large data leaks. But it’s good to see research.

Posted on November 7, 2011 at 6:26 AM • 52 Comments

Comments

Tom November 7, 2011 6:32 AM

I’d have thought the best way to attack this would be to insert false positives. Anyone who can reverse-engineer a Word macro (for instance) and controls even a small botnet could make the whole thing more or less worthless. If it’s done well, it could waste a lot of investigative time, too.

TRX November 7, 2011 7:29 AM

“A fix is to combine the system with an encryption scheme that requires a remote key.”

The Adobe “E-Book Reader” software and its “.lit” file format support that.

phessler November 7, 2011 7:37 AM

This was a major plot point of Cliff Stoll’s “The Cuckoo’s Egg”, and caught a KGB-financed hacker operating from Hannover, Germany.

Worked in the 1980s.

Neeneko November 7, 2011 7:43 AM

I suspect that, as with most security measures that make life more difficult or increase errors, people will end up having some way to tag or identify the fake documents that they hope outsiders will not notice.

“Oh, don’t open any documents with the ‘foo’ prefix, they are fake and will send up a flag”. Thus rendering the system useless…

Wladimir November 7, 2011 7:46 AM

Macros? As a lot of document formats are either based on HTML or can support HTML-ish tags, wouldn’t it be as simple as including a remotely sourced image? (possibly transparent and 1×1 like web site trackers use)

Caleb Jones November 7, 2011 7:50 AM

The first thing I would do if I were Wikileaks is to have a policy that all leaked documents/files are only ever opened in a clean environment with zero networking capabilities (a clean VM might be able to do this).

If you’re really paranoid, don’t even transfer over the network and only ship physical read-only medium.

Phone home all you want.

ChristianO November 7, 2011 8:00 AM

Sun Tzu already knew that you need 5 types of documents to catch Wikileaks…
or was it spies?

Nevermind. Assange was discredited so well during this year that one has to wonder if he is that stupid or secret services are that good?

David Ihnen November 7, 2011 8:12 AM

‘encryption requiring remote key’. there’s nothing saying that even if it decrypts it gives back the right key – put two payloads in the block, each encrypted to a different key. If they’re at all suspicious, kick back the ‘wrong’ key. The result is a fake document out of the payload. Even calling home doesn’t mean you have what you were looking for.
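
A minimal sketch of the two-payload idea, for illustration only. The container layout, the `MAGIC` marker, and every function name here are assumptions, and the SHA-256 keystream is a toy stand-in for a real authenticated cipher. The container holds a real and a decoy payload under different keys, and the key server decides which key to hand back.

```python
import hashlib

MAGIC = b"DOC1"  # hypothetical marker that tells the viewer a slot decrypted cleanly

def _keystream(key: bytes, n: int) -> bytes:
    """Toy keystream: SHA-256 over key plus a counter. Not for real use."""
    out = b""
    counter = 0
    while len(out) < n:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:n]

def _xor(key: bytes, data: bytes) -> bytes:
    return bytes(a ^ b for a, b in zip(data, _keystream(key, len(data))))

def seal(real: bytes, decoy: bytes, real_key: bytes, decoy_key: bytes) -> list:
    """Build a container with two slots, each encrypted under its own key."""
    return [_xor(real_key, MAGIC + real), _xor(decoy_key, MAGIC + decoy)]

def open_container(slots: list, key: bytes):
    """Return whichever payload the supplied key unlocks, or None."""
    for slot in slots:
        plain = _xor(key, slot)
        if plain.startswith(MAGIC):
            return plain[len(MAGIC):]
    return None

def key_server(requester_is_suspicious: bool, real_key: bytes, decoy_key: bytes) -> bytes:
    """The phone-home endpoint: a suspicious requester gets the decoy key."""
    return decoy_key if requester_is_suspicious else real_key
```

The viewer cannot tell from the reply alone which payload it received, although a fixed slot order like the one above would give the game away to anyone who inspects the container.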

Tierlieb November 7, 2011 8:15 AM

With a good tool to analyse text structures, simple watermarking would allow tracking: Just mutate sentences to form a unique pattern and give a personalized version to everyone.
No need to phone home.

Adam November 7, 2011 8:29 AM

Just twiddle one bit in every file (e.g. remove an apostrophe or replace a comma with a semi) according to a bit test on the id of the person accessing it. After a relatively small number of leaked documents chances are you have enough to identify the guy.
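
A rough sketch of this scheme, under stated assumptions: every document contains at least one comma, user ids fit in `ID_BITS` bits, and the bit each file tests is derived from a hash of its name. All names are hypothetical. Each leaked copy reveals one bit of the reader’s id, and the candidate set shrinks as more copies surface.

```python
import hashlib

ID_BITS = 16  # assumed width of a user id

def bit_index(doc_name: str) -> int:
    """Which bit of the reader's id this document tests (derived from its name)."""
    return hashlib.sha256(doc_name.encode()).digest()[0] % ID_BITS

def personalize(doc_name: str, text: str, user_id: int) -> str:
    """Swap the first comma for a semicolon iff the tested bit of user_id is 1."""
    if (user_id >> bit_index(doc_name)) & 1:
        return text.replace(",", ";", 1)
    return text

def candidates(leaked: dict, originals: dict, user_ids) -> set:
    """Keep only the user ids consistent with every leaked copy."""
    remaining = set(user_ids)
    for name, text in leaked.items():
        observed = int(text != originals[name])  # 1 if the mark is present
        i = bit_index(name)
        remaining = {u for u in remaining if ((u >> i) & 1) == observed}
    return remaining
```

Anyone who normalizes punctuation before passing a document on erases the mark, and a leaker who edits punctuation by hand can end up pointing the evidence at the wrong person.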

izzy November 7, 2011 8:30 AM

well,

as an evaluation of the concept for data leak tracking we tried a somewhat similar approach with MS webbugs a few years ago (~ 2001) and gave up after two months.

We had so many false positives that after these two months we were all suspected leakers… And that was only with real documents. Every day, one or two guys would accidentally click a document they were not supposed to need but could access in the project directories…

OTOH – if they are really pushing this, the Pentagon will be lobotomized soon, so maybe they’re up to something…

Couldn’t they extend that to the TeaParty hypocrites as a real public service? 😉

izzy November 7, 2011 8:35 AM

@Adam,

no you don’t…

if i’d be a leaker, i’d never give away the originals for exactly this reason – i’d always convert this stuff and in doing that, being a somewhat compulsive “punctuation fixer”, would bring in changes. While i might be a minority in this, the risk is high that in many cases, the wrong people would be indicted…

And now, with this concept public?

Bob November 7, 2011 8:36 AM

Sounds like a rather useless idea that might just add a little more trouble for determined crackers (isolate and crack/decrypt, maybe?). But I do see management / non-techies buying into this bs, so I guess research on this may be funded by them. 🙂

x, y & z November 7, 2011 8:41 AM

“But by far the biggest drawback from this tech is the possibility of false positives. If you seed a folder full of documents with a large number of fakes, how often do you think an authorized user will accidentally double click on the wrong file?”

Possibly this would be more useful in creating a document-repository version of a honey pot, i.e. some location in which every document has this feature.


“A fix is to combine the system with an encryption scheme that requires a remote key. Now the document has to phone home before it can be viewed.”

Adobe PDF documents already seem to make this possible. In my university’s online library we have some PDF ebooks that can be downloaded and used for x days. The way they work is that:
1. you can only download them if you have been authenticated (logged in to the system)
2. once you downloaded a copy, you can only open it when you are connected to the Internet. The doco sends some request to some home server and will not open until it gets some reply.

I agree, though, that this mechanism will not stop a determined hacker/industrial spy/foreign government agent. Although if the request/response is encrypted, it should make it difficult to analyze it for duplication purposes.

Mailman November 7, 2011 8:42 AM

I have read a while ago about “web bugs”, basically 1×1 pixel pictures embedded in the background of a page that are to be retrieved with an HTTP request.

When someone opens the document, you get a hit on your web server for that image.
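
A minimal sketch of the server side of such a web bug, using only the Python standard library. The `/pixel.gif` path, the `doc` query parameter, and the in-memory `HITS` list are assumptions made for illustration; the document itself would just reference something like `http://example.org/pixel.gif?doc=some-id` as a remote image.

```python
import base64
from datetime import datetime, timezone
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlparse, parse_qs

# A commonly circulated transparent 1x1 GIF, base64-encoded.
PIXEL = base64.b64decode("R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7")
HITS = []  # (timestamp, client ip, document id); a real system would persist this

def record_hit(path: str, client_ip: str):
    """Log a beacon request and return its document id; None for other paths."""
    url = urlparse(path)
    if url.path != "/pixel.gif":
        return None
    doc_id = parse_qs(url.query).get("doc", ["unknown"])[0]
    HITS.append((datetime.now(timezone.utc).isoformat(), client_ip, doc_id))
    return doc_id

class BeaconHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if record_hit(self.path, self.client_address[0]) is None:
            self.send_error(404)
            return
        self.send_response(200)
        self.send_header("Content-Type", "image/gif")
        self.send_header("Content-Length", str(len(PIXEL)))
        self.end_headers()
        self.wfile.write(PIXEL)

def serve(port: int = 8000) -> None:
    """Run the beacon server until interrupted."""
    HTTPServer(("", port), BeaconHandler).serve_forever()
```

Calling `serve()` starts the listener. A viewer that blocks remote images, or a machine with no network connection, never produces a hit.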

Bill Blake November 7, 2011 8:51 AM

so why would they just implement a DRM solution and be done with it… oh wait I know… it’s the Federal Government. You can’t expect anything that would make sense to come out of there!

Anon November 7, 2011 9:13 AM

“A fix is to combine the system with an encryption scheme that requires a remote key. Now the document has to phone home before it can be viewed.”

Wasn’t that a plot point in the film “Eraser”?

Tom Lowenthal November 7, 2011 9:21 AM

The measures suggested in the post seem rather over-zealous. The documents themselves don’t need to be modified. The server providing the documents can just identify when someone requests the fake file and raise the alarm. There’s no need for client-side involvement at all. If this is used on a “secure”ish computer system where users must authenticate before accessing files, then any request for an unauthorized file affirmatively identifies the would-be-leaker.
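
A sketch of what that server-side check could look like, assuming the file server already knows the authenticated user for each request. The decoy path, the access-control table, and the function name are all hypothetical.

```python
from typing import Optional

# Hypothetical decoy paths and access-control list; a real deployment would
# load these from wherever the file server keeps its configuration.
DECOYS = {"/ops/2011/withdrawal-plan.doc"}
ACL = {"alice": {"/ops/2011/budget.xls"}}

def audit_request(user: str, path: str) -> Optional[str]:
    """Classify one authenticated file request; None means nothing to report."""
    if path in DECOYS:
        return f"ALERT decoy opened by {user}: {path}"
    if path not in ACL.get(user, set()):
        return f"WARN unauthorized request by {user}: {path}"
    return None
```

Nothing runs on the client, so disconnecting from the network does not help the reader. The false-positive problem raised elsewhere in this thread still applies, since an accidental double click looks the same as a deliberate one.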

vexorian November 7, 2011 9:23 AM

It seems to assume that only outsiders without information about which are the fake documents will be interested in leaking the data. Not sure if that’s always the case…

PrometheeFeu November 7, 2011 9:31 AM

Why does this sound a lot like DRM? Especially in terms of how easy it will be to bypass it…

David C November 7, 2011 9:38 AM

This sounds like the best argument I’ve heard for giving text documents the ability to execute arbitrary code when opened (as with e.g. Word macros).

I still think the fact that doing so makes text document viruses possible is enough of a down side to outweigh it.

paul November 7, 2011 9:57 AM

Unless done really really well (and possibly even then) doing this with real documents potentially violates some crucial document-classification principles. In general, the creator of a classified document should not have any idea of who is reading it and when, so then you need some central license-server facility. Which then becomes a high-value target for traffic analysis at the least and all manner of subversion at worst. You also run the risk of denial of service, so that people have to either keep a lot of classified documents open on their desktops or else face the possibility that they won’t be able to read something important in a timely fashion.

And yeah, once the technique is known you will have people monitoring network activity by any believed-sensitive documents and using that information to do all manner of interesting stuff, including some delightful joe jobs.

Adam November 7, 2011 10:00 AM

@izzy, you assume someone will be diligent enough to do that and they may not be. Were the wikileaks documents stripped of all punctuation?

PrometheeFeu November 7, 2011 10:01 AM

It seems to me it would be much easier to rate-limit the download of documents from their secure servers and investigate people who run close to the limit. I mean, you can only read X documents per day. If you are downloading 5X maybe you are preparing a briefing and need to look at them all. But if you download 20X or more, it’s a pretty good bet you are giving those documents to someone else. That would have caught Manning. Not that I would want that, but you know…
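
A small sketch of that thresholding, assuming the server can produce one `(user, day)` record per download. The 5X and 20X multipliers come from the comment above; the labels, the event format, and the function name are assumptions.

```python
from collections import Counter

def classify_downloads(events, daily_norm: int) -> dict:
    """events: iterable of (user, day) pairs, one per download.

    Returns {(user, day): label} for every pair at or above 5x the norm.
    """
    flagged = {}
    for key, n in Counter(events).items():
        if n >= 20 * daily_norm:
            flagged[key] = "investigate"  # far more than anyone could read
        elif n >= 5 * daily_norm:
            flagged[key] = "review"       # plausible for a briefing, worth a look
    return flagged
```

A patient leaker who stays just under the lower threshold every day would not show up here, so the daily norm itself has to be chosen with care.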

Byron November 7, 2011 10:31 AM

Maybe anyone interested in governmental transparency should simply forward unopened “suspected” documents to The Guardian, The New York Times, Der Spiegel, Le Monde, and Al-Jazeera. Let their experts open and/or publish said suspected works.

It will work if these entities stay “Too Big for Jail”.

Clive Robinson November 7, 2011 10:54 AM

@ Bill Blake,

“so why would they just implement a DRM solution and be done with it… oh wait I know… it’s the Federal Government. You can’t expect anything that would make sense to come out of there.”

Congratulations you’re right on both points.

This is just another DRM system with all the issues it carries at the client end – that is it cannot be made secure, just tough to crack.

As some have said with the various “Canary Systems” (I think these were first mentioned publicly in a “Jack Ryan” novel 😉), like all Canary Systems they leak information through a side channel that exists due to some form of redundancy in the protected media.

The trick is for the system designer to have multiple layers of side channels, so that as an attacker finds some they miss the others. This is difficult but not impossible to do with sufficient skill. But they will always be visible to those attackers who know how to analyse the system correctly, either directly (i.e. the client software) or indirectly (by “black box” examining the inputs and outputs).

It is this last point, “black box analysis”, which is the Achilles heel of all such DRM systems. And it will give rise to the failing of any “Fritz Chip” system, including the latest nonsense from Microsoft and the various IP “rights holders” with the likes of “Secure Boot”.

One way to muck up side channels is to add jitter of some form or another. That is, with a time-based system you put it behind another system in the comms path whose function is to “clock the inputs and clock the outputs”; thus it can strip off time-based channels and add spurious information. Any attempt by the designer to protect their system from this will actually aid the attacker by revealing information, even if it’s just a single bit indicating “Time based DRM present”…

Terance Healy November 7, 2011 10:58 AM

I think the FBI referred to this type of thing as a ‘beacon’. I needed to know the IP addresses of the people who were behind the intrusion of my computers and network. BUT, the computers and network were under their surveillance (phones too). So finding a program to do this wasn’t possible.

Instead, I sent an email with a few related pictures to be pulled from my web site and one 1×1 pixel picture that did not exist. My error log tracked every IP address that tried to get to the non-existent file. Knowing who you are up against can be a good thing and a bad thing. And watching them fall into it, and everyone they forwarded the email to, has been revealing. Check out my intrusion nightmare. Now going into its 6th year. A Terroristic Divorce http://www.work2bdone.com/live
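
A sketch of how such a log could be tallied afterwards. It assumes the web server writes access lines in Common Log Format rather than a separate error log, and the beacon path used below is hypothetical.

```python
import re
from collections import Counter

# Common Log Format: ip ident user [time] "METHOD path protocol" status size
LINE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]+\] "(?:GET|HEAD) (\S+)[^"]*" (\d{3})')

def beacon_hits(log_lines, beacon_path: str) -> Counter:
    """Count, per client IP, the 404 responses for the non-existent beacon file."""
    hits = Counter()
    for line in log_lines:
        m = LINE.match(line)
        if m and m.group(2) == beacon_path and m.group(3) == "404":
            hits[m.group(1)] += 1
    return hits
```

Each distinct IP in the result is a machine that rendered the email, including machines belonging to anyone it was forwarded to.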

Nick P November 7, 2011 11:59 AM

I’m with Tierlieb on this one. Some kind of watermarking scheme would be better than a phone home approach. The reason is that this problem is essentially the same as the DRM problem. The only DRM strategies that have taken work to crack involve secret, hardware-enforced crypto and formats. Even those were cracked. All software-only DRM strategies that I know of have failed. This one referenced in the article is among the least effective I’ve heard of.

Nick P November 7, 2011 12:03 PM

@ Clive Robinson

I think your worrying about side channels is premature. This solution & probably whatever they settle on will be insecure on so many levels that nobody will need to bother with esoteric attacks. Let’s remember that they’re largely using Windows PC’s, not XTS-400’s. 😉

xyz November 7, 2011 12:51 PM

Instead of generating docs that ‘call home’, they could just inject a bunch of fake docs on their file servers that no one has any business accessing. An alarm rings when the doc is accessed on the server. That way, if somebody batch-steals a bunch of docs, they will be caught red-handed. False alarms will be a problem.

Clive Robinson November 7, 2011 2:29 PM

@ Nick P,

“I think your worrying about side channels is premature.”

The honest answer is I’m with you, I doubt that they could get such things working in any way shape or form.

However sometimes you have to tell people that they are being bewitched not only into thinking that there is a crock at the end of the rainbow but also convincing them that the only thing there is not gold but the animal by product that is sometimes (politely) called “fertiliser”, that will at some point be distributed when it hits the fan.

According to fable there was once a vain Emperor who was gulled by a couple of shysters into walking around with his other “crown jewels” on display, so the lesson is at least centuries old…

John David Galt November 7, 2011 3:30 PM

If I were a would-be spammer (or even wanted my own botnet), I would love this tech. I’d just create one such document, on a topic many people will read, and upload it somewhere, then wait for millions of copies of it to “phone home” with e-mail addresses for me to abuse.

Clearly, then, this must be regarded as a form of spyware, and anti-virus software vendors will need to defeat it.

David November 7, 2011 8:56 PM

@x, y & z:

It seems like you’re describing the Adobe Policy Server.

We use it in my ‘day job’ to distribute soft-copy PDFs of various publications to semi-trusted people.

There are options to grant short-term (30 days maximum, I think) offline leases.

However, we had some clown use one of those continuous print screen tools to generate a PDF ‘image’ of every page of the document. He then printed it out, hand annotated it, scanned it and sent us (via FTP) a hundred-odd megabyte PDF!!

We did tell him how to create FDF comment files, but no, that was much too hard!

joe November 7, 2011 11:27 PM

if the documents are secure enough to be worth leaking, they might not even be connected to the internet at all. they’ll probably be on a local network w/ an air gap, in which case, the “disconnecting from the internet” tactic doesn’t work. if they disconnect from the local network, that action will probably raise a bunch of flags by itself. so, in that case, this begins to kind of, sort of, work.

Jonathan November 8, 2011 1:49 AM

I’d like to see a more human-based watermarking system. For example, you could add a benign but interesting sentence to a document, and have it vary by person accessing the document. Example: “An FBI report from date X says Obama is actually a hamster*”, where X varies.

  • Note: Obama is probably not a hamster.

Joe in Australia November 8, 2011 2:32 AM

The actual problem was that Bradley Manning (allegedly) was able to burn hundreds of thousands of files to a thumb drive and / or CD ROM. This was a fundamental collapse of the security apparatus. Surely you don’t expect the people behind this sort of thing to competently carry out a sophisticated plan involving cunning and believable forgeries.

Harry November 8, 2011 5:01 AM

“While potentially interesting, this sort of technology is not going to prevent large data leaks. But it’s good to see research.”

Yes. Very good to see governments hiding what they really are doing from the people… Perhaps then the USA can help other military coups that create dictatorships in other countries, like it did in my country in the past…

pointless_hack November 8, 2011 12:23 PM

I read half the comments, and searched “read receipt without hits.” Is this idea substantially different from a complicated “read receipt?”

Tim November 8, 2011 6:49 PM

R@N: what do you mean by “today’s document formats”? Postscript has been around since 1982 and is definitely Turing complete. However, I’m a big fan of the i/o of such documents being restricted to the screen or page.

Clive Robinson November 8, 2011 7:19 PM

@ Tim,

“Postscript has been around since 1982 and is definitely Turing complete”

You are making me feel all nostalgic 😉

Anybody else remember when your printer came with an embedded Sun Unix box, and thus in many cases the printer had more “smarts” than the average user’s workstation…

Anyone else “hack it” to get print priority?

grumpy November 9, 2011 1:43 AM

Sounds completely useless. This is exactly the mechanisms we’re trying to prevent the bad guys from using (web bugs, phone-home macros, executable code in docs). If I wanted to defeat something like this, I’d take a normally secured Windows PC (not default secured – my kind of secured) and simply open the document. Which decade was this invented in again…?

Jonas November 9, 2011 4:00 AM

Why are you all talking about the downsides of tracking from within a document with a macro? Having a tweaked filesystem that alerts on read is almost failsafe, isn’t it? Having the documents on a webserver makes it even possible for a noob admin to track stuff… but probably they will even use some stupid macros 🙂 the best protection from our oppressors is the stupidity of our oppressors…

thumbelina November 11, 2011 3:43 PM

“…documents that alarm if opened”

Is this true if one used an opensource OS + PDF tools or OOo?

Joe Buck November 16, 2011 4:38 PM

It gets worse. Documents that “phone home” when opened resemble malware, because they are malware. A bad guy might arrange for a document to try to grab confidential info from your computer and send it to some server somewhere, so antivirus programs will detect this. Perhaps the DoD can strong-arm the big antivirus players into not detecting their own malware, but then other hackers might discover this, and find a way for their malware to masquerade as DoD malware.

