Schneier on Security
A blog covering security and security technology.
May 22, 2007
Good article on image spam:
A year ago, fewer than five out of 100 e-mails were image spam, according to Doug Bowers of Symantec. Today, up to 40 percent are. Meanwhile, image spam is the reason spam traffic overall doubled in 2006, according to antispam company Borderware. It is expected to keep rising.
The conceit behind image spam is graceful in its simplicity: Computers can't see.
Definitely look at the interactive graphics page.
Posted on May 22, 2007 at 6:46 AM
• 52 Comments
I've been auto-erasing any non-whitelist email with an image inside it for many years. It's the most trivial to implement, and must be the most accurate predictor of spam.
Computers can't see but image files are just lots of numbers. Presumably the same image has the same numbers (or hash). Why can't that be blocked?
"The conceit behind image spam is graceful in its simplicity: Computers can't see."
This also works the other way around. Man-in-the-middle (MITM) attacks against users of online banking systems via phishing etc. tend to rely on automated systems (often on other people's machines).
If the banks used a large number of human-only-readable images in two-way authentication of transactions (via a handheld token), most phishing would vanish overnight (assuming the weak link, the human user, used the system properly ;)
"Computers can't see but image files are just lots of numbers. Presumably the same image has the same numbers (or hash). Why can't that be blocked?"
Because it's trivially simple to change the image. Change one bit and you have a completely new hash.
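To make the point above concrete, here is a minimal sketch of why exact-hash blocking fails. It assumes SHA-256 as the hash and uses a made-up byte string as a stand-in for an image file; `flip_bit` is a hypothetical helper, not part of any spam filter.

```python
import hashlib


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of data with exactly one bit inverted."""
    out = bytearray(data)
    out[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(out)


# Stand-in for the bytes of an image attachment (assumption: any bytes will do).
image_bytes = bytes(range(256)) * 4

original_digest = hashlib.sha256(image_bytes).hexdigest()
altered_digest = hashlib.sha256(flip_bit(image_bytes, 0)).hexdigest()

# Count how many of the 64 hex digits differ between the two digests.
differing_digits = sum(a != b for a, b in zip(original_digest, altered_digest))
print(original_digest[:16], altered_digest[:16], differing_digits)
```

A blocklist keyed on `original_digest` would never match the altered copy, even though a human would see the same picture.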
"Presumably the same image has the same numbers (or hash). Why can't that be blocked?"
Well, I for one have seen images with "snow" spread out in it. They take the regular image and sprinkle small low contrast random pixels through it for each recipient, and that makes each one be unique. They probably won't be the same exact size either, since it's usually JPG, and even if it wasn't there's ways to inject junk data that changes the size without affecting the image.
It's an arms race, for sure.
@2nd Anonymous: The linked article deals with nothing else than the answer to your question.
All these measures may keep the mail legible, but humans recognize the mail as irregular as well. It's as if someone were trying to sell a used car while sporting a badly cut hairdo and a weirdly colored, stained suit. Strange that people still fall for it.
There is no "arms race". The algorithm is simple:
if(!whitelist && contains_any_image)
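One way to read the rule above, as a sketch only: it assumes "image" means a MIME part whose main type is `image`, and that the whitelist is a set of sender addresses. The names `WHITELIST`, `should_discard`, and `build_message`, and the example addresses, are hypothetical.

```python
from email.message import EmailMessage
from email.utils import parseaddr

# Hypothetical whitelist of trusted sender addresses.
WHITELIST = {"friend@example.org"}


def contains_any_image(msg):
    """True if any MIME part of the message has main type 'image'."""
    return any(part.get_content_maintype() == "image" for part in msg.walk())


def should_discard(msg, whitelist=WHITELIST):
    """Apply the rule: not whitelisted AND contains an image."""
    sender = parseaddr(msg.get("From", ""))[1].lower()
    return sender not in whitelist and contains_any_image(msg)


def build_message(sender, with_image):
    """Build a small test message, optionally with a fake GIF attachment."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = "me@example.org"
    msg["Subject"] = "hello"
    msg.set_content("plain text body")
    if with_image:
        msg.add_attachment(b"GIF89a", maintype="image", subtype="gif",
                           filename="picture.gif")
    return msg
```

Note that this sketch does not look at HTML `<img>` tags that point at remote images; whether those count as "contains any image" is a policy decision the original rule leaves open.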
Statistical filters are the answer. We run DSpam on a 600 mailbox site and image spam is not a problem. More than 95% of spam gets caught. The filter just learns from the whole message (including html, headers and so forth). They are different.
As a sample, here are the "spamminess" factors of the last image spam I received:
This means that, for example, 99% of mails containing the string "face=Lucida+Console" *sent to me* are Spam. Getting "tokens" from all the messages and keeping statistical track of them is the answer.
Read "Ending Spam" by Jonathan A. Zdziarski (the guy that programmed "DSpam") if you want a good book on the matter. Or visit http://dspam.nuclearelephant.com/ to read the info on the website.
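The per-token bookkeeping described above can be sketched roughly as follows. This is a simplified illustration, not DSpam's actual algorithm: it only tracks, for each token, the share of messages containing it that were spam. The class name `TokenStats`, the tokenizer pattern, and the training counts in the usage note are assumptions.

```python
import re
from collections import Counter

# Assumed tokenizer: runs of letters, digits and a few punctuation characters,
# so that strings like "face=Lucida+Console" survive as a single token.
TOKEN_RE = re.compile(r"[A-Za-z0-9=+_.-]+")


def tokenize(message):
    """Return the set of distinct tokens in a raw message."""
    return set(TOKEN_RE.findall(message))


class TokenStats:
    """Track in how many spam and non-spam messages each token appeared."""

    def __init__(self):
        self.spam_counts = Counter()
        self.ham_counts = Counter()

    def learn(self, message, is_spam):
        target = self.spam_counts if is_spam else self.ham_counts
        target.update(tokenize(message))

    def spamminess(self, token):
        """Fraction of messages containing the token that were spam."""
        spam = self.spam_counts[token]
        ham = self.ham_counts[token]
        if spam + ham == 0:
            return 0.5  # never seen: no evidence either way
        return spam / (spam + ham)
```

With made-up training data, feeding 99 spam messages and 1 legitimate message that all contain `face=Lucida+Console` gives that token a spamminess of 0.99. A real filter would then combine the scores of many tokens per message.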
I see very little image spam. But then I block the major ISP's home user segments.
"if(!whitelist && contains_any_image)
I do basically that except I reject the message during the smtp transaction so a legitimate sender should get a bounce message and know I haven't received it. Also works great.
A quick note on my article. Individuals filtering out the spam does very little to solve this problem and may exacerbate it. Why? Image spam messages are large. At least twice as big and often much larger than a typical email. That means so long as someone has a way to distribute massive amounts of image spam, it will clog pipes, tax filters and servers and cost money to deal with, somewhere, whether at the end user or the ISP level. Video spam could increase the congestion exponentially. How would this exacerbate the problem? Well if the spammer maintains the ability to distribute and is frustrated by increasing number of filters or a particularly good filter, he can simply attempt to overwhelm the pipes. This appears to have happened after the SEC suspended trading on those penny stocks.
Techniques from digital photography might be an answer in the medium term. These build on the distribution of luminance and color gradients in "normal" images. Photographs have very specific distributions of gradients. The same goes for comics and scanned text. Images that deviate from such a distribution might be using some obfuscation technique (such as the ones described in the linked article). I conjecture that deviation from a "normal" gradient distribution is a good predictor of spamminess.
This is of course just another step in the inevitable arms race, but better stay ahead before too many Spammers read the SIGGRAPH proceedings. ;)
One of my financial institutions (Vanguard IIRC) already uses the image-for-authentication trick.
You select an image from a set presented to you, once, to setup. Their first login page asks for your username only. Then, the image you selected appears, and it asks for your password.
It would make life harder for phishers, but not eliminate them, as I assume the phisher could get your username, hit the legit server, grab the picture, and display it to you.
I wonder if, with appropriate copyright on the images, banks could use RIAA-like tactics to go after the phishers who illegally reproduce the images, though.
I can't see why any users aren't aware of phishing by now, though (and I've even worked in tech support). Surely everyone who has email has gotten a good number of phishes for accounts they don't have (e.g. I don't have Paypal or Bank of America accounts, so those phishes are instantly recognizable). Surely that itself would train users to suspect such emails.
There almost needs to be some sort of "spam opt-in list," or something. All of those people who are more than happy to buy randomly-advertised penny stocks, performance-enhancing drugs from sketchy overseas outfits or obviously counterfeit handbags should simply put their names on a list, and revel in the Random Acts of Shady Marketing that will be delivered to their inboxes by the terabyte.
(Personally, I'd sign up for the 419 list in a heartbeat. :) It's rare to have such high-quality entertainment delivered for free.)
But, in all seriousness, here's a question - To what degree does ISP-level spam filtering feed into the problem? (Not, mind you, that I'm advocating doing away with it, as it does make life easier.) Would spammers put so much effort into defeating filters if the people who put them there did so because they didn't want the mail as individuals? It seems that they must be getting responses from (l)users on the other sides of ISP filters, otherwise, they wouldn't bother to send the solicitations.
Or am I not understanding the economics of spamming? I understand that many marketers consider no-call lists and personal filters to be a form of external willpower, and getting through them might result in a sale - or are they simply hoping to catch the odd ignorant or mentally deficient, who doesn't understand that most spam, pretty much by definition, isn't on the up-and-up?
For an individual to prevent themselves being bothered by image spam is pretty trivial. You have suggested rejecting or deleting non-whitelist images. I have my own anti-spam approach, which has worked so far (basically, I don't publish long-term email addresses, and as a backup Thunderbird blocks image rendering from unknown senders, so image spam looks like normal spam or an empty message). I doubt that many people reading this blog get much spam.
So our problem is already solved. But that isn't the same thing as "the problem of spam" in general. These solutions don't work in general, because the average user only has one email address, wants to see images in email, and can't be trusted to keep their whitelist up to date. They don't want an image mailed to them by their friend to be silently deleted just because they're using a different address from usual, or aren't in the whitelist yet. Rejecting it (perhaps with a message saying "I don't trust you to send me images") might be acceptable, but it probably wouldn't be popular, and if any link in the mail delivery isn't authenticated, rejected email creates false backscatter, which is another problem in itself.
A scheme which works for one technically-minded person, making their own decisions what to block, won't work in general. A general spam solution has to work for other people, without them calling you up to complain that you're blocking their legitimate email...
"Techniques from digital photography might be an answer in the medium term."
The way I see it, the problem is not that we can't do image processing to distinguish spam from legitimate images. The problem is that any such processing must be relatively inexpensive computation-wise. The large amount of image spam that needs to be processed puts a practical limit on the sophistication of image processing that can be done.
@Brandioch Conner: how do you maintain that blacklist of home ISPs? Seems like a big job to me. And how do you recover from false positives?
Saw an article yesterday that mentioned an ability to scan and recognize text in image files, which will eliminate the issue.
I do not allow image attachments (GIF, JPG, PNG, etc.) to display in my emails.
One hundred percent of these spam messages are flagged as spam because of other telltale signs, and Gmail sorts them to the spam folder.
Not an issue for me.
Finding the ISPs is easy. My users dump the spam they receive into a spam folder.
I then pull out all the sending servers' names and IP addresses.
Then I reverse the order of the name so www.schneier.com could become com.schneier.www and sort them by newly reversed name.
A quick count identifies the worst offenders.
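The steps above (reverse each name, group, count) can be sketched like this. The function names, the `depth` parameter, and the sample hostnames in the usage note are hypothetical; the only detail taken from the comment is the reversal, e.g. `www.schneier.com` becoming `com.schneier.www`.

```python
from collections import Counter


def reverse_name(hostname):
    """Reverse the dot-separated labels: www.schneier.com -> com.schneier.www."""
    return ".".join(reversed(hostname.lower().split(".")))


def worst_offenders(hostnames, depth=2, top=3):
    """Count sending hosts by the first `depth` labels of the reversed name."""
    counts = Counter()
    for name in hostnames:
        labels = reverse_name(name).split(".")
        counts[".".join(labels[:depth])] += 1
    return counts.most_common(top)
```

For example, `worst_offenders(["a.dsl.example.net", "b.dsl.example.net", "mail.example.org"])` groups the first two names under `net.example`, so the network sending the most spam floats to the top. Sorting the reversed names has the same effect by hand, because hosts from one provider end up next to each other.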
I'm running Exim4 on Linux so I can reject the messages at SMTP time with a custom message containing my phone number should a person ever read it.
I'm seeing about 2,500 distinct addresses being rejected every day. I don't know how many spam messages that would actually be, but it's pretty much solved the spam problem here. With about 100 users, only about 30 spam messages get through a day. And almost all of those 30 go to the same 5 people.
I also reject messages that do not have an rDNS listing.
I get less than 1 false positive a month just with those two rules.
@corey:Surely everyone who has email has gotten a good number of phishes for accounts they don't have.
Unfortunately, the banks themselves seem to be trying _very_ hard to make their actual email look as much like a phish as possible. If I ever find a bank willing to use only plain text, PGP-signed, for email to me, they have a customer. Ain't gonna happen, because "Web designers" who have never even heard of Tufte or Norman, let alone Schneier, are a lot cheaper than the clueful sort.
@MikeA "If I ever find a bank willing to use only plain text, PGP-signed, for email to me, they have a customer."
Royal Bank of Canada.
It's not PGP-signed, but it's plain text and never contains links.
I just wish I could use them in the U.S.
It's not just the banks. Here's one that came in today.
As near as I can tell, it is a legit email from ebay.com
Of course, if I were phishing, I'd also use a name like "emailebay.com".
AND this goes back to at least 2003.
Wouldn't you expect eBay to be a little bit more INTELLIGENT instead of making the phisher's life easier?
My legitimate eBay mail (maybe one message per quarter, or one per year, it's hard to tell) generally includes one piece of information that the phishers don't have: my login ID. Similarly messages from the banks tend to include the last four digits of the relevant account number.
I don't have any problem with displaying images that come in the mail, so I do look at them. Judging by my experience, the CSO article Bruce referenced is a good description. I do _not_ download images from links in the spam; generally I consider that unsafe.
(I do get one breed of spam where the layer of speckles covers up the text; either the spammer screwed up the transparency of the layer, or my mail client doesn't render it quite right.)
For a short while I got a small spate of a slightly different image spam, which I assume disappeared because the return was so much smaller than the usual kind. It consisted primarily of a link to an image on a public free image site (like imageshack). The image itself was just the actual spam content; the hyperlink to the image, in the spam message, did not contain any identifying parameters or other "web bug"-like additions. Since it required the sucker, oops, spam recipient to click on the link just to see the image, and since the free image site probably deleted all such images upon receiving a complaint, I presume it turned out to be a losing spam technique.
The problem is that those messages are probably sent in clear text.
So it is possible to pick up the "last four digits of the relevant account number" and then send you a message with them in it. And it is very easy for someone who works at your ISP or the bank's ISP to do so.
And as shown in my eBay example above, you will NOT know whether the sending server is legit or phishing.
"if(!whitelist && contains_any_image)
I do this too, but I only reject 'img' tags.
I allow image attachments as long as they are not inlined via HTML.
Works great. Never seen a spam that wasn't using the 'img' tag, and this filter is trivial to implement.
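A filter along the lines described above could be sketched with Python's standard HTML parser. This is an assumption about how such a rule might be written, not the commenter's implementation; `ImgTagDetector` and `has_inline_image` are hypothetical names.

```python
from html.parser import HTMLParser


class ImgTagDetector(HTMLParser):
    """Remember whether an <img> tag was seen in the HTML body."""

    def __init__(self):
        super().__init__()
        self.found_img = False

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names, so <IMG> and <img> both match.
        # Self-closing <img ... /> also ends up here via the default
        # handle_startendtag implementation.
        if tag == "img":
            self.found_img = True


def has_inline_image(html_body):
    """True if the HTML part of a message contains any <img> tag."""
    detector = ImgTagDetector()
    detector.feed(html_body)
    detector.close()
    return detector.found_img
```

A message would then be rejected when `has_inline_image` is true for its HTML part, while plain image attachments that are not referenced from the HTML pass through.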
I think that non-techie users _can_ be trusted to maintain a whitelist, as long as it is baked into their mail agent and is intuitive.
What a mistake HTML-email was! Giving people the power to deliver rich unsolicited content this way seems crazy in hindsight.
ING Direct in Canada does something similar - when signing up, you select one of a set of predetermined images, but they go one further by also having you type in your own text verification.
Once set up, on the main page you are asked for your account #, and before being prompted for your password you are shown your pre-selected picture *and* your pre-selected text phrase.
No doubt there are ways around this, but it seems like a reasonable level of protection.
It seems to me like you're never going to find the ultimate level of security. But steps like the one TC mentions above are efforts at forcing people with bad intentions to work a little bit harder. In my opinion, that's all you can really do.
The problem with all of those 'image verification' anti-phishing things is that they only work with the type of people who read this blog.
Phisher sends an email to the average public with some blurb about how their account was compromised, and if they don't see their site key they should phone some number and verify their information to get their account restored: you don't think some would do it? Site key only works if you really understand what it's doing. The 'confirm your site key' text only exists on the bank (BoA in my case) site. Uh huh. That's useful. If the phisher site says "problem with site key- safe to proceed", many people will.
I think the best defense is user education, and that's why I think banks should stop sending links in email, and include some text describing WHY they don't put links in their email.
...I think banks should stop sending links in email, and include some text describing WHY they don't put links in their email.
And they'll start doing that when Phishing costs _them_, rather than _us_.
Hmm, looks like where I came in, ten or more years ago...
There are centrally managed blocklists for ISP users and other spam producers. Most of the spam I get on my (small, personal use only) domain is easily identifiable based on the sending IP's presence in these blocklists.
I have SpamAssassin configured to drop anything from senders listed by Spamhaus, NJABL, SORBS and/or RFC Ignorant lists. The small percentage of spam that escapes the blocklists gets caught by HTML_IMAGE_ONLY_xx rules and Bayesian filtering. It doesn't matter what the content of spam mails looks like when Spamhaus and friends identify their relays in short order.
The long term solution to spam, however, has to be legal. There are only a very small handful of people--a few dozen--who are responsible for 90% of all spam. Put these sacks of s--t behind bars for a very long time and the spam problem will be greatly reduced.
Spam should be legally defined as something akin to high seas piracy. Meaning that spammers can be tried in any jurisdiction and are potentially liable to being summarily executed.
Lest anybody think that I'm one of those who want to "punish those nasty banks", as might have been inferred from my previous comment, I want to clarify. The goal is to _reward_ banks who act as if they have a clue. Currently, security is pretty much pure expense. Those who don't skimp on it (and on customer service, IT competency in general, etc.) are at a competitive disadvantage, and are punished by their shareholders, hence ripe for acquisition.
If good behavior showed up on the bottom line, the current owners of my main bank would not be worse than the ones before who were worse than the ones before...
@Rich: "Royal Bank of Canada.
It's not PGP-signed, but it's plain text and never contains links."
Likewise the emails I get from Wells Fargo telling me that I can log in to see my online statement, or that they've paid the billpay transactions I entered yesterday.
No links to be hijacked, no "helpful" clues like "Your username is..." that a dumpster diver could correlate with the scrap of paper with "bank password" written on it, just instructions to log in and click the Statements tab.
> The large amount of image spam that needs to be processed puts a practical limit on the sophistication of image processing that can be done.
I fear that the "practical limit" is too low to be practical at all. Parsing plain text can be done in something around O(n); parsing images is a bit more complicated. The images are encoded in at least two ways: a binary-to-text encoding for 8-bit (or even 7-bit) transport, e.g. base64, and an (often lossy) compression, mostly JPEG or GIF (I haven't seen PNG yet). Both are stream encodings, so decoding can be done in small chunks. But nonetheless, you have to run that stream first through the base64 decoder, then through the image decompressor, and finally through the image parser itself, which does e.g. a fuzzy hash on small parts of the picture.
Both the image decompressor and the fuzzy hash (or luminance/color gradients as dlg suggested, or OCR) cost an enormous amount of computing time, orders of magnitude more than a simple text parser. It makes absolutely no economic sense to filter these mails based on the content of the image. There is no need (yet?) to parse the images; the information in the headers and text (if there is text at all) is sufficient for a good spam filter.
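For readers wondering what a "fuzzy hash" of a picture might look like, here is a minimal sketch using a block-average hash, which is one simple technique of that kind and not necessarily what any real filter uses. It works on an already decoded grayscale image given as a list of pixel rows; the synthetic image, the "snow" generator, and all function names are assumptions made for illustration.

```python
import hashlib
import random


def average_hash(pixels, grid=8):
    """Fuzzy hash: one bit per block, set if the block is brighter than average."""
    height, width = len(pixels), len(pixels[0])
    block_h, block_w = height // grid, width // grid
    means = []
    for by in range(grid):
        for bx in range(grid):
            total = 0
            for y in range(by * block_h, (by + 1) * block_h):
                for x in range(bx * block_w, (bx + 1) * block_w):
                    total += pixels[y][x]
            means.append(total / (block_h * block_w))
    overall = sum(means) / len(means)
    bits = 0
    for mean in means:
        bits = (bits << 1) | (1 if mean > overall else 0)
    return bits


def hamming(a, b):
    """Number of bit positions in which two hashes differ."""
    return bin(a ^ b).count("1")


def make_image(size=64):
    """Synthetic test image: dark left half, bright right half."""
    return [[40 if x < size // 2 else 200 for x in range(size)]
            for _ in range(size)]


def add_snow(pixels, rng, amplitude=3, density=0.05):
    """Sprinkle low-contrast random changes over a copy of the image."""
    noisy = [row[:] for row in pixels]
    for row in noisy:
        for x in range(len(row)):
            if rng.random() < density:
                delta = rng.randint(-amplitude, amplitude)
                row[x] = max(0, min(255, row[x] + delta))
    return noisy


def exact_hash(pixels):
    """Ordinary cryptographic hash over the raw pixel values."""
    return hashlib.sha256(bytes(v for row in pixels for v in row)).hexdigest()
```

On this synthetic image, a snowed copy gets a different `exact_hash` but the same `average_hash`, because small per-pixel noise barely moves the block averages. The triple loop over every pixel also hints at the cost argument above: this work comes on top of base64 decoding and image decompression.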
That does not solve the problem of the increased traffic, of course, but what can one do? Put a spam filter on every router on the net? What about the false positives? Who maintains them? Who will pay the bills?
This paper was presented yesterday at the IEEE Symposium on Security and Privacy. Its main point is that most users (including people who used their own accounts, although they performed better than the role-players) ignore most security indicators on banking sites, including https and website-auth images.
I agree with Rich. Who among us needs a filter at all? Can't you just tell from the sender and subject line that it's spam? Don't we have multiple email accounts, with the real ones given out only to a trusted few and never posted or used for registration? I used to use a catch-all account and a different email address for each registration, until I realized that I just don't care who sold my email to whom. Sometimes I open stuff that I know is spam just to see how good they're getting.
But I never, ever buy anything that I saw advertised in spam. I bet none of you do. And those of us who consider it their job to keep other people from doing stupid things - chill. We can't even get people to wear seat belts and quit smoking.
Why don't we have an internet Better Business Bureau? A one-stop shop where consumers can file complaints and businesses can tout their records? Sort of like an eBay seller's rating, but for everything. Then all we have to do is advise our less savvy brethren that, if they think an offer looks attractive, look the merchant up on iBBB, use an iBBB link to go to the merchant's site instead of any email-provided link, and don't buy from anyone who doesn't have a good track record.
Spam works because enough sheep trust it and get fleeced. Legitimate businesses aren't interested because of the stigma, and it's non-targeted. Spammers make their money from people who want morons for customers.
I wish there was a way to tell businesses what I'm interested in, and have them send their best deals to me. I'd be much more likely to buy, and I would open those emails with interest. **Especially** if it can be used to finance the content providers who are fighting to keep non-purchased or non-advertising-supported out of the hands of pirates, but now I'm ... drifting ... off to ... fantasy land!
@Brandioch: "So it is possible to pick up the "last four digits of the relevant account number" and then send you a message with them in it. And it is very easy for someone who works at your ISP or the bank's ISP to do so."
In my case it doesn't matter; I never click on links to banks (or eBay) in email, rather, I use my existing bookmarks or type in the URL. (I know there's supposed to be some sort of DNS spoofing that can break this; but there's nothing I can do about that.)
@altira: "But I never, ever buy anything that I saw advertised in spam. I bet none of you do. And those of us who consider it their job to keep other people from doing stupid things - chill. We can't even get people to wear seat belts and quit smoking."
I actually read, or at least glance over, a lot of the spam I get. I have not yet taken a spam or phish for a real deal. But you're right, until the spammers manage to empty the ocean, they'll keep going.
To your point on this audience being one of the few to have an awareness of these issues, the company I work for (approx. 300 people in an ASP for the financial industry) runs all new staff through a security awareness session, with emphasis on safety tips and threats such as phishing and indicators to watch for. Relevant items in the news are used to illustrate.
I'd like to hope that most ^H^H^H^H some other small and medium businesses are doing something similar. Gets the word spread around a bit, but it's a tough sell!
Image analysis is expensive, but not prohibitively so. As you said, it's basically O(n) in message length, so it's more or less the same as for text analysis. I see the complexity argument for webmail services, but for end users (ie Thunderbird plugin), it would be feasible already, if a text-based filter is used to pre-sort. How much (image) spam slips through your text-based filter? For me it's around 0.5%. I'm happy to spend a second of idle processor time per email on the rest.
Well, analysis is certainly doable, but as yet it is too expensive for server-side filters. I saw this:
presented at the VirusBulletin con last year so I have some idea of the methods involved.
We use the Bayesian feature of Spam Assassin. I looked through the spam bucket this morning and all of the image spams were flagged as BAYES_99, meaning 99-100% probability of being spam (they all had scores of 1.0). Despite attempts at obfuscation, the Bayesian approach caught them all.
Not that I'm advocating it, but I'm a little surprised there's been no mention so far of the "E-mail Tax" that used to be proposed as a solution to the spam problem.
If you ignore the significant technological and political problems with imposing such a thing, you can see that, if imposed by the byte, it does at least scale well with image spam and audio/video spam, if we ever go there.
I guess increased bandwidth costs, in this decade of YouTube and internet radio, are not a deterring factor at all for simple e-mail images.
The Bayesian spam filters work very well on image spam. They also are good at detecting email viruses. For MS Outlook try SpamBayes. It's the best spam filter I've found. It nails about 400 spams a day on my wife's system and is 100% accurate for weeks at a time. It's free on Sourceforge.net.
> Image analysis is expensive, but not prohibitively so. As you said, it's basically O(n) in message length, so it's more or less the same as for text analysis.
I said it the other way around, it's O(n) for decoding alone, you have to grab at least some of the bytes a second time which makes it at least O(n^m) with m>1. Some of the algorithms used for image processing are also O(n^m) with m>1.
It doesn't seem very much, but it all adds up.
But that isn't the main problem here; most of the algorithms for text spam work for image spam too. No, the main problem is that the average text-spam mail is about 5 kibibytes but the average image-spam mail is about 30 kibibytes (with an embedded/attached image; HTML mail can result in much more if not handled correctly), which makes it six times more expensive to transport. For us, not for the spammer or the sender!
We don't need new ways to keep the spam out of the inbox, we need ways to keep the spam off of "the tubes" in the first place!
I doubt it can be done technologically, but I like positive surprises.
It's funny to see that the same techniques that are used to prevent robot-posting of comments at blogs and major news sites are also used by the bad guys to send us tons of image spam.
There's a posting at /. (http://it.slashdot.org/it/07/05/24/2142206.shtml) regarding the new RFC http://www.ietf.org/rfc/rfc4871.txt
It uses a keyed SHA256 as the default hash-algorithm and SHA1 as the alternative.
A good idea, but I think it's too little too late.
I'm not sure I understand Christopher's problem here. Sure, it is better to fix the compromised PCs in the first place so that they never generate the traffic. But unless the spammers are just being actively malicious to the network, they are not going to generate larger messages (and thus slow down their own throughput) unless it gives them a better success rate.
And, as several others have already pointed out, it does not make sense that these techniques will give them a better success rate. Far from it.
In fact, while it has been correctly observed that all of these techniques make it harder for modern computer algorithms to parse the images, there is generally no need to do so, because mere use of the technique guarantees that the message is spam, and most of the techniques can be easily detected without parsing the image. For example, no one but spammers, absolutely no one, is sending layered GIF images. No doubt some marketroid will eventually try to, but if we put 100% blocks in place now, they will never even start.
> And, as several others have already pointed out, it does not make sense that these techniques will give them a better success rate.
It doesn't matter for the spammers:
They just tried something new. Trying something new has a cost involved. Included in the bill is the cost of changing the programs running on the botnets. So, as long as the new program doesn't decrease their income it won't be changed because changing would cost them something. Whatever this "something" is.
The average spammer's margin is thinner than a razorblade; he avoids any expenditures. Any! The best way to avoid inevitable cost is to let others pay the bills, thus the many and large botnets. Therefore any attempt to put some cost on spam (e.g., pay a fee for every email sent, be it real money or computing time) won't hit the spammer, because these costs are externalities: the botnet member pays, not the spammer. It might work if you can change globally the way email, and anything else that is able to deliver spam, is handled. This is not one of the problems that can be solved technologically, not even locally.
Another item on the bill of inevitable costs is the cost of transportation. The botnet-members, the senders, pay only one half of it, the recipients pay the other half. There are a lot of senders, even more recipients and a gigantic pile of bits.
The spam messages were once simple text messages; let's set the average size to 5 kibibytes (1 byte = 8 bits). A picture might be worth 10,000 words (only the 10,000 words needed to describe the picture, but I digress) but is also of corresponding size, so if we set the average size of such illustrated spam to a very conservative 30 kibibytes, it isn't very surprising that image spam is one of the main reasons that the sheer amount of bytes of spam has doubled in one year. All of these bytes have to get moved once they enter the net, and every movement costs money, even the move to /dev/null. A lot of money for us all, but not for the spammer; he is the only one with the really free lunch.
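The arithmetic behind that estimate can be worked through in a few lines. The sketch below takes the 5 KiB and 30 KiB averages from the comment as given and additionally assumes that the number of spam messages stays constant while a fraction of them switches from text to image; the function names are hypothetical.

```python
TEXT_SPAM_KIB = 5    # assumed average size of a text spam message
IMAGE_SPAM_KIB = 30  # assumed average size of an image spam message


def average_size_kib(image_fraction):
    """Average spam message size when the given fraction is image spam."""
    return (TEXT_SPAM_KIB * (1 - image_fraction)
            + IMAGE_SPAM_KIB * image_fraction)


def growth_factor(image_fraction):
    """Total spam bytes relative to an all-text baseline, same message count."""
    return average_size_kib(image_fraction) / TEXT_SPAM_KIB
```

Under these assumptions the average size is 5 + 25f KiB for an image fraction f, so the byte volume doubles once one message in five is image spam (f = 0.2), and at the 40 percent share quoted in the article it would be three times the all-text baseline.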
And the next stage will be spam with movies, which I want to name filmpjes-spam (filmpjes is Dutch for the plural of the diminutive of film, meaning a short movie; the English pronunciation of the word comes quite close to the original) to avoid ambiguities (movie-spam is something very different). These spams would have to be measured in mebibytes and they would add up to a physical problem: they could saturate the net.
And even with a temporarily saturated and therefore unusable net the spammer might make his margin.
So, the only way to get rid of most of the spam would be to prevent it from entering the net. That means to cut off every spammer. These spammers are not the real spammers but members of a botnet, so to cut them off would mean to persuade the individual ISPs to cut their paying clients off of the net. Have fun trying that.
No, the problem with spam is not solvable by any technological means except military technology, but casting cannons with "ultima ratio humanitatum" written on them is not a solution at all.
Can't we hunt these bastards down with dogs & kill them ???
The thing about these security awareness courses is that they don't work very well.
My employer's course tells users to avoid clicking on links in messages. Then the user gets messages about password expiration with a link to click. The course tells us to avoid sending or receiving attachments, but our workflow is based around attaching documents to messages. The course tells us about "secure" passwords and keeping them safe from exposure, but users that must enter these things several times each day have to write them down. These notes are found all around users' work areas.
Schneier.com is a personal website. Opinions expressed are not necessarily those of BT.