Schneier on Security
A blog covering security and security technology.
March 6, 2012
The Keywords the DHS Is Using to Analyze Your Social Media Posts
According to this document, received by EPIC under the Freedom of Information Act, the U.S. Department of Homeland Security is combing through the gazillions of social media postings looking for terrorists. A partial list of keywords is included in the document (pages 20–23), and is reprinted in this blog post.
EDITED TO ADD (3/13): It's hard to tell what the DHS is doing with this program. EPIC says that they're monitoring "dissent," and the document talks about monitoring news stories.
Posted on March 6, 2012 at 1:22 PM
• 54 Comments
The signal to noise ratio must be epic.
I suspect they are looking more at combinations than individual words. Sort of a Chinese menu idea: One from column A, one from column B, etc.
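A minimal sketch of that "one from column A, one from column B" idea, in Python. The category names and the grouping of terms are assumptions made for illustration; the terms themselves are ones quoted in this thread, and nothing here is taken from how the DHS actually scores anything.

```python
import re

# Hypothetical columns. The grouping is an assumption for illustration,
# not the layout of the DHS document.
CATEGORIES = {
    "weather": {"lightening", "hurricane"},
    "health": {"sarin", "swine", "outbreak"},
    "cyber": {"2600", "ddos"},
}

def categories_hit(message):
    """Return the names of the columns that have at least one term in message."""
    words = set(re.findall(r"[a-z0-9]+", message.lower()))
    return {name for name, terms in CATEGORIES.items() if words & terms}

def is_flagged(message, min_categories=2):
    """Flag a message only when terms from several different columns co-occur."""
    return len(categories_hit(message)) >= min_categories
```

Under these assumptions, "Hurricane warning tonight" touches one column and is ignored, while "Swine flu outbreak after the hurricane" touches two and is flagged.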
This list has been googleable on .gov servers since at least 2010, in a PDF not marked as confidential or secret. They may have added a few words but it's definitely the same list.
The fact that it has things like "2600" and "Cain and Abel" tells you how good they are at this whole cutting-edge cyberwar thing. Not to mention all the words which generate absolutely enormous amounts of false positives.
When I first saw the list, I checked my Twitter, and I often manage to hit it several times a day - and so do many totally ordinary people who aren't even in infosec :)
So...under the Weather Emergency related terms, the document says that they're looking for the occurrence of "lightening". Isn't it worrisome that DHS can't spell?
Cancelled? but only if you spell it wrong.
Gee, I feel safer already...
"The flowers will be delivered Thursday"
Gee, according to that list, Bruce, your original posting garners hits for:
Department of Homeland Security
Someone needs to create a web site that allows one to enter a message and get a hit-count result. Of course, we don't know how much each word or phrase really scores---till someone breaks down the door.
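The core of such a hit counter could be sketched in Python as follows. `WATCH_TERMS` is a hypothetical stand-in holding a few terms quoted in this thread, not the full list from the document, and every term counts equally because, as noted, the real weights are unknown.

```python
import re
from collections import Counter

# Hypothetical subset of terms quoted in this thread (lowercased).
WATCH_TERMS = [
    "department of homeland security",
    "cain and abel",
    "dirty bomb",
    "lightening",
    "2600",
]

def hit_counts(message, terms=WATCH_TERMS):
    """Count whole-word, case-insensitive occurrences of each term in message."""
    text = message.lower()
    counts = Counter()
    for term in terms:
        # The lookarounds keep "lightening" from matching inside "enlightening".
        pattern = r"(?<!\w)" + re.escape(term) + r"(?!\w)"
        found = len(re.findall(pattern, text))
        if found:
            counts[term] = found
    return counts
```

Summing the values of the returned counter gives the total hit count for a message.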
Credible Sources for Corroboration: CNN, FOX, ABC, NBC, CBS, MSNBC.
They have got to be joking, right?
And we wonder when they say their analysts don't have the time to get to 90+% of incoming data...
What if a "clever" terrorist (who also happens to use social media to communicate his plans) were to re-arrange the letters of these key words? Nuclear-Unclear? Leak-Kale? Sarin-Rains? Swine-Wines? This is just too much fun!
I wonder what percentage of their hits are from role-players using social media as a platform?
@ Peter G,
And we wonder when they say their analysts don't have the time to get to 90+% of incoming data.
That would be the case no matter how many analysts or the size of the list.
It's "preemptive CYA": if anything gets through, they have already established why it cannot possibly be their fault.
Importantly from this it follows that the list has to cover every dodgy word already connected with any type of activity that could be construed as being terrorist or on the edge of "new terrorism" etc. So expect cracker words etc to be there as Cyber-terrorists are the next big watch...
Emacs's M-x spook command needs to update the part of the help string that says "whether or not this is true".
M-x spook adds a line of randomly chosen keywords to an outgoing mail message. The keywords are chosen from a list of words that suggest you are discussing something subversive.
The idea behind this feature is the suspicion that the NSA and other intelligence agencies snoop on all electronic mail messages that contain keywords suggesting they might find them interesting. (The agencies say that they don't, but that's what they would say.) The idea is that if lots of people add suspicious words to their messages, the agencies will get so busy with spurious input that they will have to give up reading it all. Whether or not this is true, it at least amuses some people.
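The behavior described there can be approximated in a few lines of Python. This is a sketch of the idea only, not the Emacs implementation: `SPOOK_WORDS` is a hypothetical list built from terms quoted in this thread, and `add_spook_line` is an invented name.

```python
import random

# Illustrative word list made of terms quoted in this thread.
# It is NOT the word file that ships with Emacs.
SPOOK_WORDS = ["nuclear", "sarin", "ricin", "2600", "DDOS",
               "hurricane", "outbreak", "subway"]

def add_spook_line(message, count=5, rng=random):
    """Append one line of `count` randomly chosen words to an outgoing message."""
    chosen = rng.sample(SPOOK_WORDS, k=min(count, len(SPOOK_WORDS)))
    return message.rstrip("\n") + "\n\n" + " ".join(chosen) + "\n"
```

Passing a seeded `random.Random` instance as `rng` makes the output repeatable, which is handy for testing.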
So, are they just scanning for those words in English language? Or Arabic, Hebrew, North-korean, German as well?
I bet it's just English!
@curious - you think the enemy might not speak English? That's damn unsporting!
ps. Didn't the Emacs mail reader use to have an Echelon mode where it would put all these words (or their 80s equivalent) at the bottom of each mail message?
Ah, the 80s nostalgia - when we were foolishly scared of a vast Soviet military superpower with 20,000 nuclear weapons and millions of troops. Instead of facing the daily terror of toothpaste tubes exceeding 2oz.
heiku using only words on the list!
i need a better spelling checker
Subway Attack Plot
Human to Human Pork Gas
The people we really have to worry about aren't going to post about it on Facebook.
What if a "clever" terrorist (who also happens to use social media to communicate his plans) were to re-arrange the letters of these key words?
Simple anagramming, whilst fun, is not quite up to the mark ;-)
How about palindromic anagramming such as "bombmob", "gunsnug" or as with my name "evilClive". Or where you use an anagram as the first and last word in a sentence such as "orchestra hidden in the carthorse" would get some "g(r)eek scholar" thinking of trojans...
The more "intellectual" the games appear the more the idiots are going to assume hidden messages...
That was the funniest post from you I've seen in a long time. Thanks! =)
What about 'pig latin'? letsa usae DDOSae the Martaae with ricinae and nuclearea bombsae
hmm only english? what about brittish or canadianese (=ehhae letsae usae DDOSae theeehae Martaae ehae with ricinae and nuclearea bombsae, ehae?)
is there a spook command plugin for browsers or an F app?
Wow. Might have to start using Emacs' twittering-mode just so I can M-x spook...
BTW (OffTopic) - major arrests of Anonymous/LulzSec reported in CNNMoney and TheRegister
Oh such jokers. Note that NSA isn't on their keyworded list of agencies. That's because... "there's no such agency". Oh, you *did* hear that one? In 1974...?
Far more amusing, though, is their list of First Tier "Credible Sources for Corroboration" of news. There are the usual mainstream US networks (starting with CNN and FOX, natch) and papers, and they graciously grant the BBC, AFP, and the like tier 1 status. But the only Canadian tier 1 source is a little right-wing wacko web site called Canada Free Press, which contains a handy "Countdown until Obama leaves Office" ticker, and whose motto is "Espousing Conservative viewpoints, cornerstone of which contain love of God, love of family, love of country". Not like the tier 2 "Obviously partisan or agenda-driven sites" like Amnesty International.
Thus does the DHS ensure reliable, unbiased "no further corroboration needed" news quality.
I guess when you hear them talk about "chatter", it's a gauge of how many of these terms they are hearing each day. I wonder what the units are for chatter, GB or billions-of-false-positives?
To all those folks that question having misspelled words on the list, I can't recall the last tweet from my kids (the only people I follow) where every word was spelled correctly (or used correctly). Those might be the smartest things on the list. If statistically more people are tweeting about lightening, then it's probably stormy someplace. Too bad physical location is unknowable for tweets, I guess we'll have to keep those Doppler radars.
It's amazing how self-referentially dysfunctional some of these key phrases are. The terminology chosen in many cases is fedspeak that no one outside of the US government, media or informed outside observers would ever use. There's hardly any point in searching for popular use of some of these terms because the only conceivable popular use would be in repetition of official reports that the DHS should already be aware of through media monitoring or inter-agency communications.
For example, I can't imagine any circumstances where scanning social media for references to the US tsunami warning center (hi, feds) could give the DHS any clearer a picture of tsunami activity than asking their fellow government agency to pass on warnings directly.
funnier than the list of monitored terms are the instructions in §7 for connecting to their text-messaging system... including how they should select "accept for all sessions" when the certificate error is displayed...
Good to see DHS is not monitoring NSA. They didn't make the organizational name scanning list... or did they ;-)
Couldn't this press release be used to measure whether a given person's use of these words doesn't change? Obviously I'm not a statistician.
To paraphrase Lisa Simpson, "I know all those words but that government agency makes no sense."
There comes a point when you go from targeting suspects to everyone is suspect. They surely crossed this line. Many times.
I wonder how long until someone makes a steganography app with that vocabulary in the dictionary, so people can post to each other on Facebook and overload the watchers?
Am I missing something here?
As far as I can tell from the document, it doesn't claim that the keywords are being used to identify communications from or to adversaries. It purports to describe what looks like just a glorified (and rather extensive) internal news clipping and summarizing service, identifying public news items whose content taken at face value is something the department needs to keep track of.
As many commenters have noticed, these keywords would be useless for identifying internal communication in all but the most mind-numbingly inept terrorist organizations. On the other hand they do make sense as a coarse first filter for discarding irrelevant news items quickly -- if none of these words appear in a news item, then one can decide with fair confidence that it's not relevant for the internal news aggregation feed.
Since the keywords do appear to be relevant for this openly stated purpose, and would be essentially useless for filtering wiretapped data, why are people assuming that they're being used for the latter?
I'll readily believe that other branches of various US spook agencies are trying to identify and intercept communications between their adversaries. I see no reason at all to assume they would use the same list of trigger words for this as they use to find openly interesting news items in open media (or blogs, twits, whatever, but all intended for public consumption).
Does the presence of so many keywords related to weather and natural disasters indicate a belief that terrorists are able to influence these occurrences, perhaps by means of prayer to Allah?
That makes as much sense as most of the stuff the TSA does. (For that matter, are they watching all the deserved criticism of the agency that prides itself on being hated and feared by the people it claims to serve?)
@george - the weather is the key.
Remember it's the 200th anniversary of the war of 1812 - when a British army swept down from Canada, captured Washington and burned the White House.
The TSA are aware that the British continually talk about the weather and plan to use this to spot a secret army of redcoats infiltrating the US.
Hm, yes, it looks like a news clipping type of service. I used to wonder, though, why information collectors would bother looking for terrorists using keywords, since actual ones would use coded ones instead. My thought is that the NSA isn't so much searching for Islamic terrorists as for American citizens who discuss the war on terror. They would use the keywords: researchers, for instance. Maybe the NSA is actually looking for "trouble makers" who ask too many questions, not so much terrorists. Thanks.
I found the general instructions in the document "Analyst's desktop binder" much more interesting than the list of keywords which are being searched in news media etc.
The methods for grading sources and categorising collected material for Items of Interest whilst controlling Personally Identifiable Information (PII) are clearly described and I can think of a few news analysts who would benefit from a reading of this.
I do have some doubts about the wisdom of them classifying Fox News as a tier 1 news source though.
It seems whoever came up with this list isn't very well educated in computer security. In the Cyber Security section they have Mysql injection as opposed to SQL injection. (Sarcasm) From their list we can conclude that the only DBMS vulnerable to a bad client application is MySQL. So if you use SYBASE or Oracle you can continue to develop applications that don't check input for SQL injection.
Our society is not enriched by this new monitoring of social media by Homeland security & Law enforcement.
For example: I was going to get some exercise by using a hand drill to hang a picture, smart idea, but the bit got stuck in a pipe causing it to leak quite spectacularly, spraying water on my MP3 player dock, tripping the main power breaker and to add insult to injury the wall collapsing too. After that we got back to our plans to help Nicaraguans cook, import and distribute PCP in exchange for illegal guns to be used for mass genocide.
8 in one semi-plausible sentence and none in the one of actual interest!
I'm actually curious what percentage of the web gets identified with these watch words.
I think we are misinterpreting this a bit. To me it seems like a service to follow events in progress (earthquakes, floods, revolutions or similar) based on what media and people are saying (think the use of Twitter during the Arab spring) rather than using social media to capture terrorists. This is what is described in the document around procedures for distribution.
Or maybe I am too naive and read the document too literally...
Henning Makholm, Kris Mak, and "A foreigner" have it right. To think that a US intelligence agency would openly publish the list of keywords that they're using as a primary mechanism to try to catch terrorists is naive.
To think that they are also so thick that they are only monitoring English communications, and that a misspelling would throw them off beggars belief.
When the uninitiated discuss intelligence agencies, the opinions veer between "They're so ignorant, out of touch, and clueless they couldn't catch a cold!" and "They're super-evil, echelon-breathing, crypto-cracking, scheming, looking through my bedroom window, law-ignoring, gps-tracking, James Bondish secret agents that are watching all of us all of the time!"
I suspect the truth is firmly in the middle between those two points.
artistic assassins crash.
dirty bomb hurricane.
relief! social media
virus outbreak threat.
standoff: China busts radicals
San Diego airport:
They forgot to add, freedom and liberty
What if a "clever" terrorist (who also happens to use social media to communicate his plans) were to re-arrange the letters of these key words? Nuclear-Unclear? Leak-Kale? Sarin-Rains? Swine-Wines?
Using anagrams is a very poor way to attempt to create a code.
If this is the best a "clever" terrorist can come up with, the most difficult problem with any dumber ones is working out whether they should stand trial or be legally considered the equivalent of small children.
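To illustrate how little an anagram hides, here is a minimal Python sketch that undoes the rearrangement by comparing sorted letters. The four terms are the ones from the quoted comment; `signature` and `anagram_hits` are names invented for this example, and nothing suggests any agency does this.

```python
import re

# Terms from the quoted comment; any real list would be far longer.
TERMS = ["nuclear", "leak", "sarin", "swine"]

def signature(word):
    """Letters of word in sorted order; all anagrams of a word share this value."""
    return "".join(sorted(word.lower()))

# Map each signature back to the listed term it came from.
SIGNATURES = {signature(term): term for term in TERMS}

def anagram_hits(message):
    """Return (word, term) pairs where word is a rearrangement of a listed term."""
    hits = []
    for word in re.findall(r"[a-z]+", message.lower()):
        term = SIGNATURES.get(signature(word))
        if term is not None and word != term:
            hits.append((word, term))
    return hits
```

One dictionary lookup per word is all it takes, so rearranging letters costs the sender more effort than it costs the reader.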
@ emace: Curious: Could not go to the site as the word 'spook' is blocked by our school district web filter, for its most suspect meaning, as is "church", "breast cancer", crimes by names, male peafowl, etc. Yet the Google tunnel lets text out.
Wow, so, if terrorists use Arabic code, they're pretty much safe?
How ridiculously stupid.
I see a lot of comments mocking the idea that every tweet or post with these keywords is examined by hand. However, I think it is much more likely that these terms are used to gain situational awareness through trend analysis. Thus, the bad guys may not be posting their intentions, but as soon as something happens you can see the uptick in people discussing an event with certain terms.
From what I have seen of these publicly available details of DHS monitoring, it looks like they are used by the operations centers to monitor current conditions, not by the intelligence analysis centers.
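That kind of trend analysis can be sketched in Python as a simple spike test on per-window mention counts. The function name, the default threshold of 3 standard deviations, and the idea of fixed time windows are all assumptions for illustration; the document does not describe how, or whether, DHS computes anything like this.

```python
from statistics import mean, pstdev

def is_uptick(history, current, threshold=3.0):
    """Decide whether the latest count stands out from earlier windows.

    history: mention counts of a term in past time windows (e.g. per hour).
    current: mention count in the latest window.
    threshold: assumed cutoff, in standard deviations above the baseline.
    """
    if len(history) < 2:
        return False  # not enough data to establish a baseline
    baseline = mean(history)
    spread = pstdev(history)
    if spread == 0:
        # A perfectly flat history: any increase counts as an uptick.
        return current > baseline
    return (current - baseline) / spread > threshold
```

With an hourly history of `[4, 5, 6, 5, 4]` mentions, a jump to 40 is flagged while a count of 6 is not, which matches the point that no individual post needs to be read by hand.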
I don't like that whole spook command idea. It keeps people busier than they really need to be. I know it's popular to think of the staff of the NSA as being animatronic versions of their former human selves, but sadly, that's just something Hollywood ( I'm all about Hollywood lately btw ) refused to change in the sci-fi versions of 'spook scripts', you see?
FWIW, and you probably can tell that's not much, the 'CYA' aspect makes a lot of sense.
Schneier.com is a personal website. Opinions expressed are not necessarily those of Co3 Systems, Inc.