Phishing Studies

Two studies. The first one looks at social phishing:

Test subjects received an e-mail with headers spoofed so that it appeared to originate from a member of the subject’s social network. The message body was comprised of the phrase “hey, check this out!” along with a link to a site ostensibly at Indiana University. The link, however, would direct browsers to, where they were asked to enter their Indiana username and password. Control subjects were sent the same message originating from a fictitious individual at the university.

The results were striking: apparently, if the friends of a typical college student are jumping off a cliff, the student would too. Even though the spoofed link directed browsers to an unfamiliar .com address, having it sent by a familiar name sent the success rate up from 16 percent in controls to over 70 percent in the experimental group. The response was quick, with the majority of successful phishes coming within the first 12 hours. Victims were also persistent; all responses received a busy server message, but many individuals continued to visit and supply credentials for hours (one individual made 80 attempts).

Females were about 10 percent more likely to be victims in the study, but male students were suckers for their female friends, being 15 percent more likely to respond to phishes from women than men. Education majors had the smallest disparity between experimental and control members, but that’s in part because those majors fell for the control phish half the time. Science majors had the largest disparity—there were no control victims, but the phish had an 80 percent success rate in the experimental group.

Okay, so no surprise there. But this is interesting research into how who we trust can be exploited. If the phisher knows a little bit about you, he can more effectively target your friends.

And we all know that some men are suckers for what women tell them.

Another study looked at the practice of using the last four digits of a credit-card number as an authenticator. Seems that people also trust those who know the first four digits of their credit-card number:

Jakobsson also found a problem related to the practice of credit card companies identifying users by the last four digits of their account numbers, which are random. From his research, it turns out people are willing to respond to fraudulent e-mails if the attacker correctly identifies the first four digits of their account numbers, even though the first four are not random and are based on who issued the card.

“People think [the phrase] ‘starting with’ is just as good as ‘ending with,’ which of course is remarkable insight,” he said.

Another attack comes to mind. You can write a phishing e-mail that simply guesses the last four digits of someone’s credit-card number. You’ll only be right one in ten thousand times, but if you send enough e-mails that might be enough.

EDITED TO ADD (8/14): Math typo fixed.

Posted on August 14, 2007 at 11:45 AM • 37 Comments


DM August 14, 2007 11:53 AM

Nit: ‘You’ll only be right one in a thousand times’. Should be one in ten thousand times. The argument remains valid.

JustMatt August 14, 2007 11:53 AM

Re: guessing the last 4 digits. I don’t think the math is correct on that one, unless the same person is sent 1000 messages. Seems to me that 1000 recipients and 1000 guesses would result in a much lower rate of success.

Perhaps a math person can clarify this.

  • Matt

John K. August 14, 2007 12:24 PM

Most (if not all) checks have the checking account number and the bank’s routing number printed on them. But it seems like quite a few people don’t know the difference. Routing numbers are probably as easy to obtain as the first four of a credit card, if not easier.

I wonder how many people would trust a fraudulent email that knew their routing number, or that claimed their routing number was actually their unique account number?

Math Clarifier August 14, 2007 12:25 PM

4 digits, with 10 possible choices for each digit.

10^4 possible combinations.

For each person you send mail to, you have a 1:10^4 chance of guessing the random four digit string correctly.

Peter Stone August 14, 2007 12:29 PM

Re: Matt

Well, providing for the fact that it’s 10,000, not 1000…

Each message sent would have a 1:10000 chance of being correct, assuming perfectly random distribution. When you send it to more than one person, however, you have the added complication that they might share the last four.

If you send it to 10000 people, each of whom has a unique last four, you’ll get exactly one match. But since you’re not guaranteed uniqueness, it’s an application of the birthday problem – how likely is it that any two victims share the last four? I don’t have the data to crunch the numbers on this one, but in a large enough group you’ll still have a decent chance.

Of course, if you send DIFFERENT guesses to each mark, that changes things….

Nicholas Weaver August 14, 2007 12:30 PM

Don’t guess randomly. You use the 4 digit prefixes which correspond to the bank site you are phishing!

anon August 14, 2007 12:34 PM

The first four digits of a credit card number identify the card network (e.g. 45xx for Visa) and the issuing bank. Because not all combinations are used, the odds of getting the right numbers for a phish are far better than 1 in 10000.

Darth Paradox August 14, 2007 12:42 PM

How many unique numbers there are among the population of people you’re spamming has no bearing on your chance of success if you choose the number randomly for each email sent out – easily enough done as part of a spamming script. The chance that a four-digit number between 0000 and 9999 matches the recipient’s last four digits is 1 in 10000 (unless there are numbers that are actually never assigned by the company, or assigned less frequently). But two of my targets having the same last-four has nothing to do with the probability, because I’m not comparing the numbers to each other; I’m comparing each to a randomly-selected number.

If you send out the same last-four to everyone, then you’ll still have a 1-in-10000 probability per email, over the long term. But that method is more sensitive to the distribution of numbers in a small sample.

MyCat August 14, 2007 12:46 PM

I suspect that sending 10000 phishes to a single person, all identical except for the last four digits of the card number, would suggest to the mark that it’s a scam!

DBH August 14, 2007 12:51 PM

It’s not a birthday problem because one number is ‘known’ and you are trying to match it. BP is any match in a population of numbers from a fixed set.

However, there are subtleties to this: for instance, the algorithm used to generate valid charge numbers might generate them in an order. Since not all numbers have been generated yet, a better chance of a last-four match would happen earlier in the sequence, although this sequence might be different for each ‘first four’.

There is a finite chance that there will be no matches, or more than one match, for a four-digit pick. The odds of at least one match out of 10000 emails, given a uniform distribution of issued ‘last four’, are greater than the probability of one and only one match (which is 1:10000).

derf August 14, 2007 12:54 PM

Point being – no matter how secure you make the PC, operating system, and network, your users still have to have access to the data. These users can and will misuse that data.

dragonfrog August 14, 2007 1:01 PM

Now, we can at least get a reasonable indication of whether phishers are following the academic research on phishing.

If we start seeing phish mails supposedly from a particular bank’s credit card branch, which references the appropriate first four digits of the card, and if we hadn’t seen such tactics before, that’s a good sign someone is doing their homework…

Rob August 14, 2007 1:11 PM


you send one email each to 10,000 people. only one in 10,000 responds, but if you sent out enough emails it could be profitable.

honeypot August 14, 2007 1:16 PM

There is so much phishing going on that without gMail being so good at sorting this out and placing this type of email into the spam folder it would be so easy to actually fool some people some of the time.

Each spam email I receive is sent directly to spamcops and phishing email reported to castlecops.

Tesla August 14, 2007 1:25 PM

Given the number of credit cards the average American keeps, the odds are probably more like 6 in 10,000.

Nyhm August 14, 2007 1:37 PM

Unfortunately, there are many legitimate online services that forward to third-party systems for certain functions. Sometimes the URL will be of the form:, which most folks think is OK.

The worst case I’ve encountered was a corporate human resources management system, which appeared to “farm out” part of the management functions to another site. Moreover, the last four digits of my SSN were part of my USERNAME!

I had to complain several times to get my username changed, but no one seemed to understand my concern about the URL.

Amused August 14, 2007 2:07 PM

I’m amused that so many people are focusing on the math and not the obvious: if you give many people 4 digits and tell them it’s the last 4 of the credit card, a large percentage will believe it. They’re simply too lazy to look it up, perhaps because they have several cards to check, and the path of least resistance is simply to believe that if you represent that you know the last 4, you must be legit.

It’s like that Harvard/MIT study where most Internet banking users paid no attention when the space for the authentication image said “server down.”

All that matters is that the representation is believable by the target. To be believable, it must fit the target’s needs for belief.

Brandioch Conner August 14, 2007 2:28 PM

I’ve said it before and I’ll say it again, once the bad guys discover “databases” our economy will be in serious trouble.

Start with Social Security Numbers. They should be unique.

Then match as much information as you can to them. Name, sex, age, address, phone numbers, any account information.

Then match those items to other items on other people. As in the article, if the phishing message APPEARS to come from someone you know, you are likely to follow the link.

And all it takes is one exploit on your computer and they can sort through all your financial data and email.

They then use that information to tune their phishing for your friends and such.

Eventually the bad guys will have more information about you than you do.

N017734 August 14, 2007 2:30 PM

I think you’d get more than 1 response per 10^4 numbers.

If a target received an email with 3 out of 4 numbers correct, I bet they’d think it was just a typo or a “computer error”. Especially if the incorrect digit was first or last in the 4-digit sequence.

Michael McDougall August 14, 2007 2:41 PM

For those who want to know the real odds:

If you pick 4 digits at random, 10,000 times, you aren’t guaranteed to get 1 hit (assuming you email someone different each time). There’s a 37% chance of getting 0 hits, a 37% chance of getting 1 hit, an 18% chance of getting 2 hits, a 6% chance of getting 3 hits, and a 1.5% chance of getting 4 hits. There’s a .37% chance of getting 5 or more hits.

If you send out 10,000 of these emails every day, you will average 1 hit per day, but the exact number of hits per day will follow the odds listed above.

(I’m assuming the last 4 digits of credit card numbers are set randomly.)
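The figures above follow from a binomial distribution. As a rough check, here is a minimal Python sketch that assumes 10,000 independent e-mails, each with a 1-in-10,000 chance of a correct guess. The function name `hit_probability` and its defaults are illustrative, not taken from either study.

```python
from math import comb


def hit_probability(k: int, n: int = 10_000, p: float = 1 / 10_000) -> float:
    """Binomial probability of exactly k correct guesses in n independent e-mails.

    Assumes every last-four value is equally likely (the commenter's assumption).
    """
    return comb(n, k) * p**k * (1 - p) ** (n - k)


if __name__ == "__main__":
    for k in range(5):
        print(f"{k} hits: {hit_probability(k):.2%}")
    # Whatever probability is left over belongs to "5 or more hits".
    five_or_more = 1 - sum(hit_probability(k) for k in range(5))
    print(f"5 or more hits: {five_or_more:.2%}")
```

Running it should give values close to the percentages listed above. Subtracting `hit_probability(0)` from 1 gives the chance of at least one hit, which comes out at roughly 63 percent.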


Trichinosis USA August 14, 2007 2:59 PM

“And we all know that some men are suckers for what women tell them.”

So, how many people directly involved in the phishing study are women?

Anonymous August 14, 2007 3:17 PM

hmm.. off topic but does somebody already have a rainbow list of ‘valid’ CC numbers online? basically run the full list of potential numbers through the CC validation function (that’s valid numbers, possible to use, not active numbers).
That would be useful for test/audit data.

DarkFlib August 14, 2007 3:40 PM

At least for credit cards, the last few digits include a checksum of the rest of the number, so if the initial prefix is known then the number space is probably a lot less than 10,000 combinations.

Even if this only cuts it down by 1 digit, that’s from 10,000 down to 1,000, giving a possible 10x increase in response rate. Certainly worth doing if you are playing the odds.

Sean August 14, 2007 6:24 PM

The last digit of a credit card issued in the US is a checksum digit using the Luhn algorithm.

The last four digits are often not random, especially for large companies. I cannot tell you the number of times I’ve seen consecutive cards in a sort of sequential order. i.e. xx10 is one while the next is xx28 followed by xx36, xx42, and onwards.
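To illustrate the checksum point, here is a minimal Python sketch of the Luhn check, plus a count of how many four-digit endings pass it for one fixed prefix. The 12-digit prefix is a made-up value for illustration, and the helper names `luhn_valid` and `valid_last_fours` are assumptions of this sketch. Note that the count only shrinks when all 12 leading digits are known; if the middle digits are unknown, every last-four value remains possible.

```python
def luhn_valid(number: str) -> bool:
    """Return True if the digit string passes the Luhn check."""
    total = 0
    # Walk right to left; double every second digit, starting with the
    # digit immediately to the left of the check digit.
    for i, ch in enumerate(reversed(number)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9  # same as adding the two digits of the doubled value
        total += d
    return total % 10 == 0


def valid_last_fours(prefix: str) -> list[str]:
    """All four-digit endings that make prefix + ending pass the Luhn check."""
    endings = (f"{n:04d}" for n in range(10_000))
    return [e for e in endings if luhn_valid(prefix + e)]


if __name__ == "__main__":
    prefix = "411111111111"  # hypothetical 12-digit prefix, illustration only
    print(len(valid_last_fours(prefix)))
```

For any fixed 15 leading digits exactly one check digit passes, so a fully known 12-digit prefix leaves 1,000 candidate endings rather than 10,000.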

Ian August 14, 2007 9:33 PM

One in ten thousand times? It’s much, much easier to do it by just picking a common prefix. I actually first noticed that the first 4 digits were the same when I went through a few Citizens Bank debit cards when I was 18 due to repeatedly losing them. They all started with, I think, 5449. All Mastercard too, natch.

So all someone needs to do is pick a common credit card number, which is especially easy if you target a specific area and have access to a bank card from that area.

I remember a while back I got a phishing email purporting to be Bank of America, which is my current email. It struck me as odd, and I had to examine it for a while before I determined that it was indeed fake. I still followed the link just out of curiosity (literally never gotten a phishing email), but didn’t input my credentials. It was pretty funny seeing all the typos in the page, which looked very much like a BoA page.

RonK August 14, 2007 11:28 PM

@ Ian

I got a phishing email purporting to be Bank of America, which is my current email. ….
It was pretty funny seeing all the typos in the page

Didn’t you mean “my current bank”? You thought you were investigating a phish, while in actuality you’ve now caught the “infective typo meme”. Everyone redding your posts now will start two make …

Never mind.

Kanly August 15, 2007 12:37 AM

Those warnings so often used: “Do not click on this link if you do not trust the (site/sender/etc)” always irk me for this reason. They are a quick way for vendors to dump a problem onto users who, as the vendors know, have no chance of making an informed decision.

Daedala August 15, 2007 9:02 AM

It’s very easy to find out the starting sequences for various banks. That phishers have used this, and that people fall for it, is pretty old news (at least 2004…).

Simon August 15, 2007 9:03 AM

There’s an old, pre-computer story involving a horse-racing scam.

Assume (for the purposes of my re-telling this story) that a given horse race has ten horses in it.

So the scammer sends to a thousand people a letter saying he has a system for knowing the race winner, and to prove it, he’ll tell them the winner of a particular upcoming race.

One hundred of the letters name the first horse, one hundred name the second horse, etc.

After the race, he writes again to the hundred people who got the winning horse, and makes the same offer. Ten letters name the first horse, ten the second, etc.

Then he writes to the ten people whose horse won the second race, and says, “If you want any more, you’ll have to pay.”

They’ve just gotten the winners on two races. They might be inclined to believe him.
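The narrowing in this story can be sketched in a few lines of Python. This is only an illustration under the story's own numbers (1,000 letters, ten horses, two races); the function name `run_scam` and its parameters are hypothetical.

```python
import random


def run_scam(recipients: int = 1000, horses: int = 10, races: int = 2, seed=None):
    """Return the recipients who have seen only correct predictions."""
    rng = random.Random(seed)
    believers = list(range(recipients))
    for _ in range(races):
        # Spread the letters evenly: each horse is named in an equal share.
        predictions = {person: pos % horses for pos, person in enumerate(believers)}
        # The scammer has no idea who will win; any winner works equally well.
        winner = rng.randrange(horses)
        believers = [p for p in believers if predictions[p] == winner]
    return believers


if __name__ == "__main__":
    print(len(run_scam()))
```

Whichever horses win, the number of people left after two races is the same, because every possible outcome was mailed to an equal share of the list.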

Paul August 15, 2007 10:23 AM

Well, I find it extraordinarily interesting. Good luck to all of you. And I’m sure you’ll do fine. Really. Just fine.

bob August 15, 2007 10:26 AM

Considering that people will cheerfully electrocute a stranger in the next room as long as they are told to do so by someone standing next to them wearing a labcoat (google or wiki “Milgram experiment”), this is non-news.

Aidan A. O'Brien August 15, 2007 1:17 PM

Why guess the last 4 digits of someone’s credit card number when you can guess the 4 digit PIN of their ATM card? This might possibly provide a more convincing reason to connect to a (bogus) web site, in order to change the PIN.

Maybe there could be some research on any correlation between the “quality” of the data “revealed” and the likelihood of successful phishing. Or would that be helping the bad guys too much?

Andy August 15, 2007 2:56 PM

The math of the last four is worse than 10^4 if you also have to guess the bank (issuer). With the first four you already know that.

Snickers August 19, 2007 10:44 AM

OK, If we agree with Amused’s comments and Bob’s comments, as I do, should we not discourage banks from using the last four digits (or anything similar) as a means of authenticating their legitimate emails? For example, Citibank now includes a “Security Zone” with last 4 digits at the top of its emails, and First National Bank of Omaha includes a “For Your Security” section on its emails. I find these features laughable, as they can so easily be spoofed by the means discussed above, or by anyone viewing a credit card receipt, etc., assuming people bother to double-check the numbers at all. The inclusion of these features has the effect of training users to uncritically rely on something that is unreliable – that is, it makes us less secure and not more secure. Why don’t the banks sign their emails with x509 certificates? If you get legitimate emails from your banks that contain features like the “Security Zone,” send them a message to educate them on why this practice is worse than useless.
