The Security of Multi-Word Passphrases

Interesting research on the security of passphrases. From a blog post on the work:

We found about 8,000 phrases using a 20,000 phrase dictionary. Using a very rough estimate for the total number of phrases and some probability calculations, this produced an estimate that passphrase distribution provides only about 20 bits of security against an attacker trying to compromise 1% of available accounts. This is far better than passwords, which are usually under 10 bits by this same metric, but not high enough to make online guessing impractical without proper rate-limiting. Curiously, it’s close to estimates made using Kuo et al.’s published numbers on mnemonic phrases. It also shows that significant numbers of people will blatantly ignore security advice about choosing nonsense phrases and choose things like “Manchester United” or “Harry Potter.”

[…]

This led us to ask, if in the worst case users chose multi-word passphrases with a distribution identical to English speech, how secure would this be? Using the large Google n-gram corpus we can answer this question for phrases of up to 5 words. The results are discouraging: by our metrics, even 5-word phrases would be highly insecure against offline attacks, with fewer than 30 bits of work compromising over half of users. The returns appear to rapidly diminish as more words are required. This has potentially serious implications for applications like PGP private keys, which are often encrypted using a passphrase. Users are clearly more random in “passphrase English” than in actual English, but unless it’s dramatically more random the underlying natural language simply isn’t random enough.

Posted on March 13, 2012 at 6:22 AM · 81 Comments

Comments

bob March 13, 2012 6:34 AM

I’m never sure what this “20 bits” means.

If you encode your passphrase, are you reducing or increasing your security?

E.g., instead of the common phrase “Now is the time for all good men to come to the aid of the party,” you use “Nitt4agm2c2taotp.”

Erlend March 13, 2012 6:41 AM

I’m not an expert, but I fail to see the advantage of a passphrase over a random set of characters. Sure, it’s longer, but it will also have patterns that can be predicted: for example, consonant-vowel-consonant sequences, and constructs like “on”, “syn”, “sym”, and so on.

AughtSix March 13, 2012 6:43 AM

20 bits, in this context, refers to log (base 2) of the number of states of the password.

Said another way, there are 2^20 likely* passwords, under a given set of constraints.

And, yes, you could do the obfuscation method, but the point of a passphrase is to be easy(er) to remember, while still giving good security. If you make it gibberish, it’s now hard to remember again. (Which letter did I capitalize? Did I use “2” for “to” every time, or just once?)

*using “likely” rather than “possible” since the whole point of the research is demonstrating that the actual password usage is much, much less than the maximum possible “any character of the password can be any possible letter/number/symbol” number of passwords.
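To make the bits-to-guesses conversion concrete, here is a minimal Python sketch. It assumes every candidate is equally likely, which is exactly the idealization the research says real users don’t meet, so treat the numbers as a ceiling rather than a measurement.

```python
import math


def bits_of_security(candidates: int) -> float:
    """Bits of security for a uniform choice among `candidates` options: log2(candidates)."""
    return math.log2(candidates)


def worst_case_guesses(bits: int) -> int:
    """Guesses needed to exhaust a space of the given (whole-number) bit strength."""
    return 2 ** bits


if __name__ == "__main__":
    print(worst_case_guesses(20))     # 1048576: "20 bits" is about a million candidates
    print(bits_of_security(1048576))  # 20.0
    # Each extra bit doubles the work, so 20 bits is 2**10 = 1024 times as much as 10 bits.
    print(worst_case_guesses(20) // worst_case_guesses(10))  # 1024
```

On average, an attacker finds a uniformly random secret after searching about half the space.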

Ryan March 13, 2012 6:45 AM

@bob,

I think what it’s saying is that word choices (for standard English in particular, but theoretically any language) for pass phrases, while better than pass words, are still poor enough that a great many can be compromised by a quality dictionary attack with an equivalent difficulty to a brute-force attack on a truly random key of the length specified (in this case, 20 to 30 bits).

AughtSix March 13, 2012 6:48 AM

The point of the passphrase is that you can get the theoretical security you’re after (though the research indicates you might not get quite as much as you think) while still being able to remember what on earth your password is. #T$n29Sou9H* (my Dvorak-keyboard-mashing pseudo-random password generator) may offer a lot of theoretical security (or require a very long passphrase to match), but it would be much easier to remember “O pardon me thou bleeding piece of earth that I am meek and gentle with these butchers”, even if you get a heck of a lot more entropy per character from the gibberish than from the phrase.

Cerebus March 13, 2012 6:59 AM

OK, so users choose crap pass phrases. This surprises anyone exactly how? We already knew people–left to themselves–choose crap passwords.

The problem isn’t passwords or phrases, it’s a weak PRNG involved in the choice. 🙂

I was distressed by this comment: “I’ve been advocating through my research though that authentication schemes can only be evaluated by studying large user-chosen distributions in the wild and not the theoretical space of choices.” This comes immediately after lumping schemes like Diceware into the same bin with natural-language pass phrases. Do the authors not realize that the whole point of Diceware is to make sure the theoretical distribution is identical with the “user-chosen” distribution?

— C

Ryan March 13, 2012 7:04 AM

@0-Six,

I had a friend who once told me (ca. 2005) that his pass phrase for a system was one particular well-known 8-10 word sentence from a similarly well-known work of 19th-century English Lit. The twist was that the pass phrase was actually a variation on that sentence, based on how he initially mis-remembered the sentence when he was attending college in the early 1990s, an aspect that adds a really large amount of complexity, given that he alleges that he has never divulged it.

Ryan March 13, 2012 7:15 AM

Here’s a silly question to ponder: If your pass-phrase is in German, is it technically a pass-word? 😉

It’s a meaningless distinction, perhaps, when employing words like:
Donaudampfschiffahrtselektrizitätenhauptbetriebswerkbauunterbeamtengesellschaft
“association of subordinate officials of the head office management of the Danube steamboat electrical services” (the name of a pre-war club in Vienna)

RobertW March 13, 2012 7:26 AM

Very glad to see research is being done into this area, and is coming up with some hard numbers. All too often I see casual claims such as this one that random passwords are a waste of time. It really isn’t that hard to remember a random password generated for you by a program like Password Safe. It’s just a question of investing a little bit of time in learning it.

Paeniteo March 13, 2012 7:40 AM

@Ryan: “Here’s a silly question to ponder: If your pass-phrase is in German, is it technically a pass-word? ;)”

Not necessarily, no. (NB: I love silly questions 😉)

One property of German is that you can create a practically unlimited number of words by chaining nouns through a genitive construction, as in your example (leading to cumbersome English translations like “A of the B of C of the D of E…”).
That doesn’t mean that any arbitrary sequence of words (a.k.a. a phrase) forms a valid single German word when you leave out the whitespace and punctuation. Just throw in some verbs or adjectives and things get difficult.

For example, the phrase “Frische Brezeln essen Assessoren des Gesangs.” (“Singing assessors eat fresh pretzels.”) cannot be re-written as a single word, without distorting the semantics.
You might go with “GesangsAssessorenFrischBrezelEssen” that would denote the specific event of assessors eating their fresh pretzels.

In case anyone wonders, the phrase is a mnemonic for the “minor scale” in music (hope that’s the right word) and, to get back on topic, is as such unsuitable for use as a passphrase. 🙂

r721 March 13, 2012 7:46 AM

But one can insert a word from a rare language/place name/weak 4-symbol password into a pass-phrase, and those estimations would be totally wrong.

Steve K March 13, 2012 7:48 AM

The goal isn’t to reduce risk to zero. It is to make your asset harder to breach than all of your neighbors’ sites.

Moving from 10 bits of strength to 20 by using passphrases means I will still teach and recommend them in security awareness sessions.

The fact that they’re not as good as thought/hoped is of interest to readers here but until a better answer is proposed (while still using knowledge as one of the authentication factors) I have no plans to change.

Ryan March 13, 2012 8:09 AM

@Paeniteo,
I knew that not all words could be compounded in German, but the specific rules were always a bit too far out of my domain of knowledge. You came perilously close to completely killing the joke, but it’s cool information so I suppose you earn a pass. 😉

(I’ve heard that the common claim that Inuit tribes have 100+ words for snow is based on a similar phenomenon, where the Inuit languages compound words much like German does so that the 100+ words are really just 1 word and 100+ adjectives, and people’s lack of knowledge and cultural biases prevent them from realizing that the compound phenomenon isn’t only done to the word for “snow.”)

Brian March 13, 2012 8:21 AM

I only skimmed, but my takeaway was that they felt it’s better for a user to choose a 10-bit password than a 20-30-bit pass phrase. They also seem to assume that an attacker knows whether a pass phrase or a password is used. If that information is unknown, then isn’t a 5-word pass phrase ridiculously secure against a brute force attack?

Also, what online service won’t rate-limit a brute force attack?

Randall Munroe March 13, 2012 8:39 AM

One thing that I think is difficult about this subject is that most of the time, almost by definition, people tend not to talk about their passwords. It’s a place where everyone has come up with their ideas and theories in relative isolation. Few people have ever spent time learning about or working on practical password/passphrase cracking. (Which is probably good—the idea of a world where most people spend a lot of time cracking passwords is a little alarming!)

But it means that any conversation about password/passphrase security inevitably devolves into a tremendous number of uninformed statements about what types of passwords and passphrases are or are not secure, plus anecdotes about how individual commenters construct their secure passwords. Misapplication of information theory can get pretty ubiquitous, and people seem to have some pretty ridiculous and misguided ideas about how an attacker goes about attacking a password or passphrase. There are a ton of misconceptions in the first few comments here already, and unpacking them all would be more work than I’m up for.

Instead, I’ll just share what I think is the most effective approach I’ve ever heard of to password security—a trick a number of sysadmins use:

Rather than yelling at your users to come up with better passwords, you run a password cracker on your users’ passwords (assuming you are in a scenario where you consider this ethical). Whenever it breaks one, it sends them a password change notice along with a note saying their password was nowhere near good enough. Until they’ve learned that lesson the hard way, you’re going to keep hearing things like, “nobody could guess that I took the initial letters of a line from ‘Romeo and Juliet’! Think how many plays there are!” or, “but I included a symbol, which should expand the space of possible characters into the bazillions!”

(I really wish http://howsecureismypassword.net/ weren’t so inaccurate—it’s a great general idea, but it says “secret password” and “abcdefg12345” are both far more secure than “k4dU8x7”, which is laughable. I’d like to try making a better version of that site someday…)

Randall Munroe March 13, 2012 8:45 AM

“If that information is unknown, then isn’t a 5-word pass phrase ridiculously secure vs a brute force attack?”

No, because you have to assume they’re attacking both in parallel. So the uncertainty adds one bit of entropy.

Really, you have to assume they’re attacking each of hundreds of the most common password formats in parallel. That seems like a lot, but in a local attack of the kind people here seem to be focusing on, it becomes negligible next to the actual password entropy. Terms in exponents swamp everything.
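A minimal Python sketch of the arithmetic behind the “one bit” remark, under the simplifying assumption that the attacker exhaustively searches every format. The total work is the sum of the search-space sizes, so uncertainty about the format adds at most log2(number of formats) bits.

```python
import math


def combined_bits(*format_bits: int) -> float:
    """Bits of work to exhaust several password formats in parallel.

    The attacker's total work is the sum of the format sizes, 2**b each,
    so the combined strength is log2(sum(2**b)), which never exceeds
    max(format_bits) + log2(len(format_bits)).
    """
    return math.log2(sum(2 ** b for b in format_bits))


if __name__ == "__main__":
    print(combined_bits(30, 30))                 # 31.0: two equal formats add exactly one bit
    print(round(combined_bits(78, 20), 6))       # 78.0: the small space is swamped
    print(round(combined_bits(*[40] * 256), 6))  # 48.0: 256 formats add only 8 bits
```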

graybeard March 13, 2012 9:23 AM

There are some hints at a basic problem related to “password generation”. First of all, most of us seem to be generating passwords with some algorithm (pick the first letters of a sentence, pick four words, use the xxx out of the http://www.xxx.com, prefix with pw and postfix with 99, etc.). Any advice for good passwords suggests an algorithm (sort of). If I have an idea of your algorithm, it had better be driven by a secure PRNG. Often it takes only one password from one site to get an idea of your algorithm.

Natanael L March 13, 2012 9:28 AM

@bob: That shorter version is LESS secure, because now there are MORE phrases that will generate the same string! The attacker will now not have to figure out the precise words, because YOU REMOVED DATA!

Looking random has NOTHING to do with security, it’s ALL about being UNGUESSABLE!

A short random string with 2^12 possibilities is MORE INSECURE than a long strange phrase with 2^20 possibilities!

Dave Sill March 13, 2012 9:30 AM

“””
(I really wish http://howsecureismypassword.net/ weren’t so inaccurate—it’s a great general idea, but it says “secret password” and “abcdefg12345” are both far more secure than “k4dU8x7”, which is laughable. I’d like to try making a better version of that site someday…)
“””

Try http://daleswanson.org/things/password.htm; it bats .500 with your examples.

And there’s https://www.grc.com/haystack.htm which has another naive entropy checker, but recommends padding shorter, easier to remember passwords with junk characters to make brute force guessing harder. Of course, if padding were common, password guessers could take advantage of it.

For me, the answer is to use long pseudo-randomly generated passwords with upper/lower/digit/special characters and store them using something like Password Gorilla or LastPass.

PracticalMatt March 13, 2012 9:52 AM

I think one note on this is that pass phrases are almost unusable for most people’s current mobile use. If you are vigilant and never store passwords, etc, then having to type a long pass phrase is far more bothersome than having to remember 8 or 10 random characters.

Bob T March 13, 2012 9:53 AM

They are talking about a dictionary attack, not a brute force attack.

All the study says is that 8,000 people chose easy pass phrases that can be easily brute forced with a relatively small pass phrase dictionary. I don’t see that as particularly astounding. A relatively small password dictionary will crack a disproportionately large percentage of weak passwords that are out there, as well. The same people who choose “password” for a password, will choose something like “pass phrase” for a pass phrase.

And of course choosing a portion of a well-known sentence isn’t going to give a larger return when both “It was the best of times” and “It was the best of times, it was the worst of times” are likely to be in the same pass phrase dictionary.

Bob T March 13, 2012 9:55 AM

In fact, I just made the same mistake in mixing the terms brute force and dictionary attacks. 🙂

Fred P March 13, 2012 9:57 AM

The shortest pass-phrase I’ve ever used is well over 5 words long; I’ve never even thought of using that short of a pass phrase.

At least in the rare cases I use them, they are very low entropy for their length; the point is that I consider remembering them far more important than their absolute security. When it’s the other way around (which is usually the case), I use “random” passwords. (Where “random” may be a PRNG or a physical RNG, like dice).

In any case, 20 bits is over 1,000 times as good as 10 bits. That still seems to be an improvement to me.

NobodySpecial March 13, 2012 10:07 AM

It depends what the password is for.

If you are in a regular office network with no outside access, then insisting on long complex passwords for your domain users is silly. Their job is to do their work and the sysadmin’s job is to allow that. If you don’t have armed guards by the filing cabinet, it’s silly to insist on 30-character, monthly-aged passwords to connect to the printer.

TS March 13, 2012 10:29 AM

@NobodySpecial

“If you are in a regular office network with no outside access then insisting on long complex passwords for your domain users is silly. ”

It’s silly to think your “regular office network” has “no outside access”.

QnJ1Y2U March 13, 2012 10:42 AM

One issue I run into with passphrases: I’m a terrible typist. I actually have better luck typing in FeBr453R$ than horsebatteryjupiterdisorganization (and don’t get me started about trying to type on an iPhone 🙂).

Roger Moore March 13, 2012 10:42 AM

Amazing; nobody has mentioned “correct horse battery staple” yet, especially since it really proves the point: if you want good security, your password or pass phrase needs to be genuinely random (i.e., not a word or phrase from a known corpus) to be really hard to brute force.

AughtSix March 13, 2012 10:47 AM

Roger,

It’s referred to in the blog post quoted, Diceware was mentioned, and the guy drew the strip commented in the thread… but other than that, not mentioned. 🙂

Harry Payne March 13, 2012 10:58 AM

@phred14 re “Twain English”:

The authorship of Mark Twain for this is disputed: http://www.ojohaven.com/fun/spelling.html cites an origin in a letter by MJ Shields to the Economist.

Where it almost certainly comes from is Dolton Edwards’ short story, “Meihem in ce Klasrum”, which was published in Astounding Magazine in 1946, and which is recycled by lazy journalists in the UK on a regular basis as “New educational madness” or “EU official spelling to kill off English”.

CBarn March 13, 2012 11:50 AM

@roger & @davidr: One thing to consider with brute force attacks (not dictionary attacks) is that the strength of the password or passphrase may be undermined by the implementation of the authentication mechanism.

For instance, many systems store a salted hash of the password. These systems are limited in strength to the number of bits in the hash, no matter how strong the password is – to brute force it, the attacker doesn’t need to guess YOUR password, they just need to guess ANY password that computes the same hash value.
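A minimal Python sketch of that ceiling. It deliberately truncates SHA-256 to a hypothetical 16-bit stored value so the search finishes in a moment; real password hashes keep 128 bits or more, so in practice this ceiling sits far above what any memorable password reaches. The salt and password values are made up for the demo.

```python
import hashlib
import itertools

BITS = 16  # hypothetical, deliberately tiny stored-hash size


def truncated_hash(data: bytes, bits: int = BITS) -> int:
    """Top `bits` bits of SHA-256(data), standing in for a short stored hash."""
    digest = int.from_bytes(hashlib.sha256(data).digest(), "big")
    return digest >> (256 - bits)


def find_any_match(salt: bytes, target: int, bits: int = BITS) -> tuple[bytes, int]:
    """Try candidates until ANY of them reproduces the stored value.

    The salt is stored next to the hash, so the attacker knows it.
    Expected cost is about 2**bits tries, however strong the real password is.
    """
    for tries in itertools.count(1):
        candidate = str(tries).encode()
        if truncated_hash(salt + candidate, bits) == target:
            return candidate, tries


if __name__ == "__main__":
    salt = b"example-salt:"  # hypothetical salt
    password = b"a long, genuinely random passphrase would not help here"
    stored = truncated_hash(salt + password)
    found, tries = find_any_match(salt, stored)
    print(found, tries)  # some other input the system would accept
```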

ChoppedBroccoli March 13, 2012 11:56 AM

Here’s something I like to do:

Instead of using pass phrases to compose a long and memorable text password, use pass-‘jibberish’. We all have sounds or silly phrases we said (or still say) when we acted silly as kids/adults. Jibberish that is still phonetically pronounceable gives a nice mix of memorability and strength.

Example:

ficantpellerdootweego

my head breaks this down as ” ficant – peller – doot – weego ”

Now I don’t claim this is as easy to remember as a true pass phrase, but it’s so MUCH easier to remember than a randomly generated password, and there is no dictionary of words/nouns that will generate these phrases. Granted, there is probably some non-random vowel usage and human-induced pattern here, but not something I’ve seen a computer mimic yet.

Thoughts, anyone?

Hugh No March 13, 2012 11:56 AM

The security risks here are greatly exaggerated. They are talking about offline attacks which means that the system has already been compromised and the attacker is now trying to crack passwords in order to use them on another system. If you don’t reuse your password/passphrase then you don’t need a very strong one; it just has to be strong enough to prevent online guessing attacks.

Similarly, to attack a person’s PGP private key the attacker needs the person’s private key file; i.e., the system that the key file is on needs to have been compromised already.

Andrew Gronosky March 13, 2012 12:00 PM

@Erlend, the advantage of a passphrase over a random set of characters is that the passphrase is possible to remember. This cartoon says it better than I can:

http://xkcd.com/936/

Two Cents March 13, 2012 12:09 PM

Using five 2Gig dictionaries simultaneously, after 60 million tries over 8 hours, the WPA2 passwd “pinecones” was revealed.

dual-core @ 2.4 GHz

NobodySpecial March 13, 2012 1:23 PM

@TS: The office has outgoing access, but VPNs aren’t secured by the regular Windows login to their machines.

If the attack is external then that is the place you should secure. Otherwise it’s like saying – we don’t have a front door so everyone’s office now has to have a high security lock and CCTV.

ChoppedBroccoli March 13, 2012 1:39 PM

In reply to ‘Hugh No’:

“””
The security risks here are greatly exaggerated. They are talking about offline attacks which means that the system has already been compromised and the attacker is now trying to crack passwords in order to use them on another system. If you don’t reuse your password/passphrase then you don’t need a very strong one; it just has to be strong enough to prevent online guessing attacks.
“””

I grant you that a physically compromised system means that the victim’s information is almost guaranteed to be cracked, but I do not share the opinion that means that this type of research is futile or exaggerated – it just needs to be put in the right context.

What if the victim’s computer/cell phone/etc. has some sort of password database software that stores all their online passwords? Surely, having a password for this database that takes longer to crack is better, right? It buys the victim more time to use remote tracking software or inform the police, lock down/change personal bank accounts/passwords…

Essentially by using a complex enough password, the victim is buying themselves more time in the common case to avoid catastrophic identity theft, which I believe is extremely valuable.

Mark March 13, 2012 1:49 PM

More than once I’ve looked for a good reference of “password strategies”. Everyone seems to have a favorite, often bad: (word+numbers, g33ksp34k, mnemonic, scrambled words, passphrase, wo+number+rd, dogsname+kidsbday, cat-on-the-keyboard, etc.)

Nowhere can I find a good list of strategies and their relative worth so that I could give out the real answer: pick the strategy that works for you from a list that doesn’t suck. Different users need different answers. I had two admins just violently protest memorizing 5 word phrases… who weren’t opposed to the nightmare (for me) that is 16-character full random.

Is there a good reference somewhere?

ChoppedBroccoli March 13, 2012 2:19 PM

In reply to ‘Mark’:

“””
More than once I’ve looked for a good reference of “password strategies”. Everyone seems to have a favorite, often bad: (word+numbers, g33ksp34k, mnemonic, scrambled words, passphrase, wo+number+rd, dogsname+kidsbday, cat-on-the-keyboard, etc.)

Nowhere can I find a good list of strategies and their relative worth so that I could give out the real answer: pick the strategy that works for you from a list that doesn’t suck. Different users need different answers. I had two admins just violently protest memorizing 5 word phrases… who weren’t opposed to the nightmare (for me) that is 16-character full random.

Is there a good reference somewhere?
“””

My take on it? NO – there isn’t a good reference and the reason is 2-fold in my opinion:
* text passwords are inherently broken as a single factor login mechanism (they don’t balance memorability and security well at all)
* the type of password strategy an admin employs will be somewhat tied to the environment he/she works in. Are the computers physically locked or mobile devices? Do the computers have encrypted data stores? Are the computers connected to cloud services? How well versed in security are the employees? What is the risk of losing data from a compromised machine (financial, IP, nothing?). All these questions will influence whether the password needs to be complex, change every 6 months, be part of a multi-factor system, be randomly generated (and it’s ok if employees write it down somewhere), etc.

I am not disregarding your point, however. It would be really useful to compare the pros and cons of every text password scheme against a matrix of the requirements of the install environment. It would still take someone with a really good understanding of the security vs. usability trade-off to digest it and implement it, however.

Vles March 13, 2012 2:33 PM

@Ryan
I’ll share one of my old pass sentences (your English Lit friend example)

Quod licet bovi non licet Iovi

The twist being Iovi and bovi should be swapped in this Latin proverb…
I could decide to change the e to ë, wrap it in brackets [text] and place a 0 in front and a 9 behind. These 4 iterations I might use, one for every season in, say, 2003. In this case, recalling the password/sentence means recalling the original plus the changes required, like remembering a recipe. If it’s no fun making up a recipe, it’s not worth remembering.

I suppose you can also put in algebraic functions (The equation of a circle is x^2+y^2=1), blend languages (paard battery nietje correct) or use URLs such as http://xkcd.com/936/ as passwords / sentences, which are probably not included in dictionaries/rainbow tables.

On the grc haystack site I do like the explanation of the concept:
Which of the following two passwords is stronger, more secure, and more difficult to crack?

D0g…………………
and
PrXyc.N(n4k77#L!eVdAfp9

With the quod licet Iovi original, I might just as well add ten dots before the start of the sentence (in plaintext).

Trouble is, not many people like making up passwords :(. Monday mornings are notoriously shitty for calls to service desk re password resets and I can’t blame anyone for changing their temporary London1 password to hunter2 or fiona03.

As for ethical cracking and mailing the results, I would like to imagine the following, though I have certainly never heard of it happening:

CIO: John, it’s 2012. You used a weak password three times in a row now and despite our repeated warnings you continue to use it. You’re the CEO damnit. You’ve signed the ICT policy. I’m sorry but you’re fired.

RH March 13, 2012 3:15 PM

@Roger Moore: I’m amazed I ran through that many comments before someone brought up XKCD!

Passphrases are preferable to passwords simply because the human brain finds it easier to remember N bits’ worth of entropy in words than N bits’ worth of entropy in symbols. The brain’s chunking mechanism stores entropy in a complicated way. Of course, this is the basis for mnemonics.

I recommend http://www.diceware.com for passwords you care about. People in these comments have been debating the virtues of 10 vs 20 bit entropy passphrases, when diceware is highly effective at generating memorable passphrases with 70 or 80 bits of provable entropy.

Dave Sill March 13, 2012 3:15 PM

“””
On the grc haystack site I do like the explanation of the concept:
Which of the following two passwords is stronger, more secure, and more difficult to crack?
D0g…………………
and
PrXyc.N(n4k77#L!eVdAfp9
“””

That’s only true until padding becomes widespread and cracking software starts checking for it.

Seriously, what’s wrong with the Password Gorilla or LastPass approach? I’ve got long pseudo-random passwords that I don’t even try to remember because I keep them all locked up with one key.

Dave Sill March 13, 2012 3:23 PM

“I’m amazed I ran through that many comments before someone brought up XKCD!”

Maybe that’s because Randall Munroe posted in the comments fairly early on, so mentioning the passphrase strip after that was kind of obvious.

“Passphrases are preferable to passwords simply because the human brain finds it easier to remember N bits worth of entropy in words than it is to remember N bits worth of entropy in symbols.”

You know what’s good at remembering? Computers. Why remember a bunch of passwords when your phone/laptop/desktop can remember them for you?

LarsW March 13, 2012 3:33 PM

Everything about passwords and passphrases assumes that (US) English is the only possible and usable language.

We Spekers of Foreign Tounges rest somewhat assured.

David March 13, 2012 3:43 PM

egrep '^[a-z]{1,10}$' /usr/share/dict/words | wc -l

51532

lg(50,000^3) ~ 46
lg(50,000^4) ~ 62
lg(50,000^5) ~ 78
lg(50,000^6) ~ 93

So,

egrep '^[a-z]{1,10}$' /usr/share/dict/words | rl -c 5

should give pretty good security, assuming rl uses a decent source of randomness (and it’s surely much better than your head), while still being memorable. Raise or lower the number of words as needed. Can be a problem when a site limits password length (including, ironically, the XKCD forum…)
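For anyone without rl, here is a rough Python equivalent of that pipeline. It is a sketch under a few assumptions: the word list lives at /usr/share/dict/words as in the command above, words are drawn with Python’s secrets module (a CSPRNG), and draws are independent with repeats allowed, which is what makes the count × log2(list size) estimate exact. rl -c 5 instead picks five distinct lines, which costs a negligible fraction of a bit.

```python
import math
import re
import secrets

WORD_RE = re.compile(r"[a-z]{1,10}")  # same filter as egrep '^[a-z]{1,10}$'


def load_words(path: str = "/usr/share/dict/words") -> list[str]:
    """Read the word list, keeping only lowercase words of 1 to 10 letters."""
    with open(path, encoding="utf-8", errors="ignore") as f:
        return sorted({w for w in (line.rstrip("\n") for line in f) if WORD_RE.fullmatch(w)})


def passphrase(words: list[str], count: int = 5) -> str:
    """Join `count` independent, uniformly random picks from `words`."""
    return " ".join(secrets.choice(words) for _ in range(count))


def entropy_bits(list_size: int, count: int) -> float:
    """Entropy of `count` independent uniform picks from `list_size` words."""
    return count * math.log2(list_size)


if __name__ == "__main__":
    words = load_words()
    print(passphrase(words))
    print(f"{entropy_bits(len(words), 5):.1f} bits")  # about 78 for a ~51,000-word list
```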

Alex March 13, 2012 3:55 PM

http://www.adel.nursat.kz/apg/

This decade-old program can be used to generate ‘random semi-pronounceable passwords’ (NIST FIPS 181) using easily remembered small chunks of text with occasional numbers and letters interspersed. I have been recommending it for years, and I had no problem remembering any of the passwords I generated with it. It also has a highly configurable random generation mode.

Much easier to remember than the passwords generated by 1Password and other such random generators, while seeming to provide sufficient entropy to resist John the Ripper dictionary attacks. (I was formerly a security auditor for a Fortune 500 tech company.)

Carl 'SAI' Mitchell March 13, 2012 4:00 PM

I, too, like diceware.
Choosing a passphrase manually is just as bad as choosing a password manually. Choosing a passphrase randomly is better than choosing a random password.

For a random password using all 95 printable ASCII characters, you get log_2(95) = about 6.57 bits of entropy per memorization unit (character).

For a random passphrase using all 7776 words in the diceware list, you get log_2(7776)= about 12.92 bits of entropy per memorization unit (word, character, or common symbol).

Thus, you can memorize less information and have a password of the same strength when using diceware.

Of course, if you just pick words you like from the list the security goes out the window. The whole point of a random passphrase is to get true random word choices.

The biggest problem with passphrases is that many places limit password length.
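The per-unit comparison above can be turned into a quick count of how many units each scheme needs for a given target. A minimal Python sketch; the 80-bit target is only an illustration (the upper figure mentioned for Diceware in the thread), and both counts hold only if every unit is chosen uniformly at random.

```python
import math


def bits_per_unit(alphabet_size: int) -> float:
    """Entropy of one uniformly random pick from `alphabet_size` options."""
    return math.log2(alphabet_size)


def units_needed(target_bits: float, alphabet_size: int) -> int:
    """Smallest number of random picks whose combined entropy reaches target_bits."""
    return math.ceil(target_bits / bits_per_unit(alphabet_size))


if __name__ == "__main__":
    for name, size in [("printable ASCII character", 95), ("Diceware word", 7776)]:
        print(f"{name}: {bits_per_unit(size):.2f} bits each, "
              f"{units_needed(80, size)} needed for 80 bits")
```

This prints 13 characters versus 7 words for the same 80-bit target.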

really March 13, 2012 4:39 PM

@Vles, et al.

All other things being equal, the strength of a password does not rest simply on length, complexity, keyspace, or entropy. The most important factor in password management is how the password is stored. It is similar to encryption: most common attacks against encryption target the implementation rather than the encryption itself. The reason “secret password” is stronger is simply its length versus the LanMan/NTLM weaknesses in storage, not keyspace or entropy.

Hugh No March 13, 2012 4:59 PM

ChoppedBroccoli said: “What if the victim’s computer/cell phone/etc has some sort of password database software that stores all their online passwords? Surely, having a password for this database that takes longer to crack is better right?”

That would seem to be the case but if an attacker is on your computer or cell phone and finds a password file there, all the attacker would need to do is install a keystroke logger or similar program to capture your password when you type it in. This is another case of once they’re in your system you’re hosed.

NobodySpecial March 13, 2012 5:33 PM

@Carl – “can memorize less information”
That depends on whether you find “correct horse battery staple” easier to memorise than “$r~Tp=*0”

Godel March 13, 2012 5:50 PM

No one has mentioned the application in which the password is used and stored.

Passwords are SUPPOSED to be stored as randomly salted, chained hashes, with ideally thousands of passes before you get to the final answer.

In that situation, even “password” might be sufficient. Instead we get passwords and credit cards stored in plain text or, at most, in unsalted single-pass MD5 hashes.

For example: “Yet another [commercial] porn site was hacked this week, losing 73,000 e-mail addresses, user names, and passwords, and some 40,000 plain-text credit card numbers, including CCV numbers and expiration dates, according to SC Magazine.”

http://arstechnica.com/business/news/2012/03/porn-site-digital-playground-hacked-hackers-say-too-enticing-to-resist.ars

What’s the point of a super-complex password when the hackers can just break in and export the lot in plain text?

Jonadab March 13, 2012 6:19 PM

This is far better than passwords, which are usually under 10 bits by this same metric

It also benefits FAR more from judicious selection.

When constructing a password out of characters, deliberately adding unusual characters adds a relatively small amount of security to the end product, because the total number of characters barely doubles (unless you start putting in multi-byte characters, which is still not as good as increasing the total password length by the relevant number of bytes).

When constructing a passphrase out of words, deliberately putting in unusual words (i.e., words that are not listed in small dictionaries, including the ones commonly used for password cracking) makes a MUCH larger difference to the security of the resulting password, because the number of possible words becomes thousands of times as large as it would be using only common words.

(Of course, when I need a passphrase to be particularly secure, I make at least one of the words up out of whole cloth. And all of this does not negate the need for rate limiting, which is so fundamental I cannot imagine why the authors are bothering to mention password systems that don’t have it. Frankly, secure systems use mandatory delays that grow geometrically with the number of failed attempts — although it’s a good idea to exempt repeated failures using the same wrong password from said increase.)
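The throttling rule described above (delays that grow geometrically with failed attempts, but no increase for repeats of the same wrong password) might look like the sketch below. The class name, doubling factor, and one-hour cap are hypothetical choices. A plain digest of each wrong guess is kept only to keep the sketch short; a real system would also need persistence, locking, and a short retention period, since wrong guesses are often near-misses of the real password.

```python
import hashlib
from typing import Dict, Set


class FailedLoginThrottle:
    """Hypothetical per-account throttle: the delay doubles with each distinct wrong guess."""

    def __init__(self, base_delay: float = 1.0, factor: float = 2.0, max_delay: float = 3600.0):
        self.base_delay = base_delay
        self.factor = factor
        self.max_delay = max_delay
        self.distinct_failures: Dict[str, Set[str]] = {}

    @staticmethod
    def _fingerprint(guess: str) -> str:
        # Keep a digest rather than the wrong guess itself.
        return hashlib.sha256(guess.encode("utf-8")).hexdigest()

    def record_failure(self, account: str, guess: str) -> None:
        # A repeated identical guess adds nothing to the set, so it does not raise the delay.
        self.distinct_failures.setdefault(account, set()).add(self._fingerprint(guess))

    def record_success(self, account: str) -> None:
        self.distinct_failures.pop(account, None)

    def required_delay(self, account: str) -> float:
        """Seconds to wait before the next attempt is accepted."""
        n = len(self.distinct_failures.get(account, set()))
        if n == 0:
            return 0.0
        return min(self.base_delay * self.factor ** (n - 1), self.max_delay)


t = FailedLoginThrottle()
for guess in ["hunter2", "hunter2", "hunter3", "letmein"]:
    t.record_failure("alice", guess)
print(t.required_delay("alice"))  # 4.0: three distinct failures -> 1.0 * 2**2
```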

Daniel March 13, 2012 6:31 PM

Says Steven K: “The goal isn’t to reduce risk to zero. It is to make your asset harder to breach than all of your neighbors’ sites.”

No, that’s an error. The goal is to make your password stronger than your attacker’s ability/resources to hack it. Your attacker may have zero interest in what your neighbor possesses if what you have is desirable enough.

It’s fundamentally the difference between odds and risk. Risk is odds times loss. Passwords are a risk mitigation strategy, not an odds mitigation strategy.

Jonadab March 13, 2012 7:20 PM

I’m never sure what this “20 bits” means.

I think they’re referring to a password built from an amount of entropy equivalent to twenty bits independently selected at random in such a way that they are each equally likely to be 0 or 1.

The easy way to get that number is to take the log base 2 of the total number of possible distinct passwords that could have been generated. This easy calculation, unfortunately, only tells you the actual entropy in the password if the method for generating the passwords is sufficiently random that all of the possible outcomes are equally likely. This is seldom the case when a human chooses.

Even with a human choosing, though, the easy formula is still useful, because it gives you a quick upper bound — the password can never have more entropy in it than this calculation tells you. It can have less, if the person choosing the password does something at least partially predictable (like excluding all the words they don’t know — this allows the attacker to use statistics for how common various words are to decide which words to try first, greatly reducing the difficulty of guessing the password).
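The upper-bound argument can be checked numerically: log2 of the number of outcomes equals the Shannon entropy only when every outcome is equally likely, and any skew pushes the entropy below it. The probabilities below are invented for illustration. Shannon entropy is itself only one yardstick; guessing-based measures like the one in the quoted research can differ substantially for skewed distributions.

```python
import math
from typing import Sequence


def uniform_bits(n_outcomes: int) -> float:
    """The 'easy' figure: log2 of how many distinct passwords could have been generated."""
    return math.log2(n_outcomes)


def shannon_bits(probabilities: Sequence[float]) -> float:
    """Shannon entropy of the actual choice; equals uniform_bits only for a uniform choice."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


# Hypothetical chooser: four candidate words, but the first is picked half the time.
skewed = [0.5, 0.25, 0.125, 0.125]
print(uniform_bits(len(skewed)))  # 2.0  (upper bound)
print(shannon_bits(skewed))       # 1.75 (lower, because the choice is partly predictable)
```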

For example, if I have a spelling dictionary with 98569 entries (/usr/share/dict/words on Debian stable) and select ONE word from it using a cryptographically secure random process, the result could in theory have at most log2(98569), or about 16.6 bits of entropy. In fact it’s worse than that, because some of the entries (e.g., the word “bed”) have considerably less entropy than that when considered as a string of random characters. If you cull such words from the dictionary list before selecting from it, the total size of the list is reduced…
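The single-word example above, as a minimal Python sketch: `secrets.choice` supplies the cryptographically secure selection, and log2 of the list length gives the upper bound. Loading the real list (for example from /usr/share/dict/words) is left out so the sketch stays self-contained; the three-word list is a placeholder.

```python
import math
import secrets
from typing import Sequence


def pick_word(words: Sequence[str]) -> str:
    """Choose one entry using the operating system's cryptographically secure generator."""
    return secrets.choice(words)


def bits_for_list(n_entries: int) -> float:
    """Upper bound on the entropy of one uniformly chosen entry."""
    return math.log2(n_entries)


print(round(bits_for_list(98_569), 2))  # 16.59
print(pick_word(["quixotry", "household", "apple"]))
```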

Using a larger dictionary helps a lot. Consider, for example, an international Scrabble dictionary called SOWPODS, which contains over 245 thousand entries. If you cull from it everything that’s in the aforementioned smaller dictionary, that leaves more than 180 thousand entries (more than you would figure by simple subtraction, because the spelling dictionary has a lot of proper nouns in it). If you further remove everything shorter than eight letters long, that would still leave more than 135 thousand words. Naively, that comes to seventeen bits of entropy from one word; in practice it may be a little lower than that (some of the eight-letter words for example might have fewer bits than that when constructed randomly based on statistics of which letters are common), but it’s MUCH closer to genuinely having that amount of entropy than what you’d get using the 98 thousand words from the spelling dictionary. String several of these higher-entropy words together into a passphrase, and now you’re getting somewhere. Even if each word has only 15 bits of entropy, a string of three gives you 45, which is reasonably good for a password that protects anything worth less than seven figures to an attacker — assuming your OTHER security ducks are in a row as well (among other things I’m assuming that stored passwords are hashed and salted and that retries are rate limited, because it’s not 1992 any more).
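The culling step and the per-word arithmetic in the comment above can be sketched as follows. The toy word lists are placeholders, not SOWPODS or the Debian dictionary, and `2 ** 15` stands in for the conservative 15 bits per word.

```python
import math
from typing import Iterable, List, Set


def cull(big_list: Iterable[str], common: Set[str], min_len: int = 8) -> List[str]:
    """Drop entries found in a smaller, common dictionary and entries shorter than min_len."""
    return [w for w in big_list if w not in common and len(w) >= min_len]


def passphrase_bits(n_words: int, list_size: int) -> float:
    """Upper bound for n_words chosen uniformly and independently from list_size entries."""
    return n_words * math.log2(list_size)


print(cull(["bed", "apple", "quixotry", "household"], {"household"}))  # ['quixotry']
print(round(passphrase_bits(1, 135_000), 1))  # 17.0 bits for one word
print(passphrase_bits(3, 2 ** 15))            # 45.0: three words at 15 bits each
```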

Jonadab March 13, 2012 7:33 PM

I fail to see the advantage of a passphrase over a random set of characters.

Mostly, it’s much easier to remember — for the same level of entropy.

Note that picking a common phrase, like “apples and oranges” or “I’ll be Bach”, is the same mistake as picking a common word, like “apple” or “awesome”. It has a bit more entropy (because there are more common phrases than there are common words), but still not very much. If you really want to see the difference, compare a password constructed of random characters versus a passphrase constructed of random words. According to my calculations, it only takes three random words to contain the same amount of entropy as eight random alphanumeric characters. A string of three (or even five) random words is significantly easier to remember than a string of eight random mixed-alphanumeric characters.
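The calculation isn’t shown in the comment; assuming “alphanumeric” means the 62 characters a-z, A-Z and 0-9, and that the words come uniformly from the 98,569-entry list mentioned earlier, the comparison works out roughly as stated:

```python
import math


def random_string_bits(length: int, alphabet_size: int) -> float:
    """Entropy of `length` symbols drawn uniformly and independently from an alphabet."""
    return length * math.log2(alphabet_size)


eight_alnum = random_string_bits(8, 62)      # a-z, A-Z, 0-9
three_words = random_string_bits(3, 98_569)  # each word treated as one "symbol"
print(f"{eight_alnum:.1f} vs {three_words:.1f} bits")  # 47.6 vs 49.8 bits
```

With a case-insensitive alphabet of 36 characters, eight characters would give only about 41 bits, so three random words compare even more favorably.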

Jonadab March 13, 2012 7:49 PM

They also seem to assume that an attacker knows whether a pass phrase or a password is used. If that information is unknown, then isn’t a 5-word pass phrase ridiculously secure vs a brute force attack?

If the attacker doesn’t know whether you’re using a password or a passphrase, your passphrase is VERY SLIGHTLY more secure than it would be if they knew you were using a passphrase, but the difference provides only about one bit of additional entropy, so in practice it is not important.
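The “about one bit” figure follows from adding the two search spaces: when they are the same size the union is exactly twice as large (one bit), and when one dominates, the gain shrinks toward zero. The space sizes below are illustrative assumptions.

```python
import math


def bits_if_format_known(space: int) -> float:
    return math.log2(space)


def bits_if_format_unknown(password_space: int, phrase_space: int) -> float:
    """The attacker must cover both spaces; their union is at most twice the larger one."""
    return math.log2(password_space + phrase_space)


passwords = 62 ** 8    # hypothetical: 8 random alphanumeric characters
phrases = 98_569 ** 3  # hypothetical: 3 random words from a 98,569-entry list
gain = bits_if_format_unknown(passwords, phrases) - bits_if_format_known(phrases)
print(f"extra bits from hiding the format: {gain:.2f}")
```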

However, a five-word passphrase generally contains MUCH more entropy than a five-letter password, because there are a LOT more than 26 words in the dictionary. In fact, a mere three-word passphrase contains about as much entropy as an eight-character password.

I’m assuming here that the eight characters are chosen at random and similarly that the three words are selected at random from a medium-sized dictionary. If your password is “stupid” or “password” and your phrase is “I hate passwords” or “This is stupid”, the amount of entropy is much smaller.

Magnum March 13, 2012 9:01 PM

I use a Dvorak keyboard and have a lot of trouble typing a password in qwerty if I have to log on to another machine.

I could in theory (;-)) make my life easier by forming passwords from numbers, punctuation (shifted numbers), and the letters a, A, m, and M. These letters are the only ones whose position is the same on both a Dvorak and Qwerty keyboard.
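The claim about shared keys can be checked by comparing the two layouts key by key, and the smaller alphabet’s cost in entropy computed directly. This sketch assumes the standard US QWERTY and US Dvorak layouts, and that the number row, with its shifted symbols, is the same in both.

```python
import math

# Letter rows (top, home, bottom), left to right, for US QWERTY and US Dvorak.
QWERTY = ["qwertyuiop", "asdfghjkl;", "zxcvbnm,./"]
DVORAK = ["',.pyfgcrl", "aoeuidhtns", ";qjkxbmwvz"]

# Letters that sit on the same physical key in both layouts.
shared = sorted(
    q for q_row, d_row in zip(QWERTY, DVORAK)
    for q, d in zip(q_row, d_row)
    if q == d and q.isalpha()
)

# Assumes the number row (digits and their shifted symbols) is identical in both layouts.
symbols = set("0123456789") | set("!@#$%^&*()") | {c for ch in shared for c in (ch, ch.upper())}

print(shared)        # ['a', 'm']
print(len(symbols))  # 24
print(f"{math.log2(len(symbols)):.2f} bits per character")
```

With 24 symbols, each random character carries about 4.6 bits instead of just under 6 for mixed-case alphanumerics, so a password drawn this way needs to be correspondingly longer.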

I wonder if any automated password crackers have ever tried to exploit something like this.

Mark Currie March 14, 2012 2:17 AM

Like Godel, I have always wondered about the same thing – A much slower password hashing algorithm would go a VERY long way to compensate for low entropy passwords, so why are we not doing this?

I know that high-volume servers will take strain, but they have had to deal with slow digital signature calculations so why not passwords? Besides, there are plenty of low-volume servers that can afford the resources.

As for your own PC login and local passwords used to derive keys, etc., it is surely inexcusable. Here we are working at human-time so we could surely afford a second or two for the calculation.
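The benefit of a slower hash can be put as arithmetic: if every guess requires N iterations of the underlying hash, an offline attacker’s work grows by a factor of N, or log2 N bits. The iteration count below is an assumption, and the extra work only helps against stolen hashes; online guessing has to be limited by rate limiting instead.

```python
import math


def effective_work_bits(guess_bits: float, hash_iterations: int) -> float:
    """Offline attack cost, in units of one basic hash call, expressed in bits."""
    return guess_bits + math.log2(hash_iterations)


# The post's ~20-bit figure, behind a hypothetical 100,000-iteration key-derivation function:
print(f"{effective_work_bits(20, 100_000):.1f} bits")  # 36.6 bits
```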

Porlock Junior March 14, 2012 3:26 AM

@SB –

Another, and entertaining, source on the Eskimo snow business is the title essay of “The Great Eskimo Vocabulary Hoax and Other Irreverent Essays on the Study of Language” by Geoffrey K. Pullum. It goes into the whole history of the idea and of the way it grew from something not well thought out into a monster in which the number of words is whatever number a writer happens to feel like at the moment. It seems to go as high as 400.

Winter March 14, 2012 3:48 AM

I’m never sure what this “20 bits” means.

I always understood this as the log of the effective search space. However, there is another way to look at it.

Order all possible passwords/phrases by their likelihood, e.g., 1, a, 2, … password, … etc. This is an infinite list. Estimate how many passwords/phrases are more likely, or easier to guess, than yours.

Without knowing your password, this will be half of the effective size of the search space. So double this number to get the effective size of the search space.

The base-2 logarithm of this number is this infamous bit number.
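This recipe corresponds to guesswork: sort candidates from most to least likely and count guesses. The sketch computes expected guesswork G and converts it to bits as log2(2G - 1), so that a uniform distribution over N items scores exactly log2 N. It also counts the guesses needed to cover a fraction alpha of users, a simplified relative of the partial-guessing metrics behind the post’s “1% of accounts” figure, not the research’s exact formula. Both distributions are invented for illustration.

```python
import math
from typing import Sequence


def guesswork(probabilities: Sequence[float]) -> float:
    """Expected number of guesses when candidates are tried from most to least likely."""
    ordered = sorted(probabilities, reverse=True)
    return sum(i * p for i, p in enumerate(ordered, start=1))


def guesswork_bits(probabilities: Sequence[float]) -> float:
    """Size, in bits, of a uniform space with the same expected guesswork: log2(2G - 1)."""
    return math.log2(2 * guesswork(probabilities) - 1)


def guesses_for_fraction(probabilities: Sequence[float], alpha: float) -> int:
    """Fewest guesses that cover at least a fraction alpha of users."""
    covered = 0.0
    for n, p in enumerate(sorted(probabilities, reverse=True), start=1):
        covered += p
        if covered >= alpha:
            return n
    return len(probabilities)


uniform = [1 / 1024] * 1024
skewed = [0.5] + [0.5 / 1023] * 1023  # hypothetical: half of all users share one phrase
print(guesswork_bits(uniform))            # 10.0, i.e. log2(1024)
print(round(guesswork_bits(skewed), 2))   # still about 9 bits on average...
print(guesses_for_fraction(skewed, 0.5))  # ...yet 1 guess compromises half the users
```

The skewed case shows why the research reports work for a given fraction of accounts instead of a single average: the mean barely moves while half of all users fall to one guess.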

Winter March 14, 2012 3:59 AM

Inuit and words for snow.

I understood that Franz Boas studied the languages of the Inuit. He determined that there were 4 (?) families of Inuit languages.

He illustrated these families by showing the differences between the words for “snow” in these languages. Just like the word for “one” can illustrate the differences between Indo-European, Altaic, and Sino-Tibetan languages.

So, originally, this was not about Inuits and snow, but about different people originating from different migration events out of Siberia.

Someone March 14, 2012 6:29 AM

I wonder if the length of the words chosen has a significant effect on the entropy of the passphrase / word.

If you chose words like Antidisestablishmentarianism, Floccinaucinihilipilification, etc.

or indeed non (pardon the pun) English words.

Remember multifactor authentication isn’t the answer:

Something you have, something you are and something you know.
or put another way
Something that can be stolen, something that can be cut off and something that can be tortured out of you…

A teacher March 14, 2012 8:06 AM

I also teach the use of passphrases and have been a big advocate of getting passWORD out of our security lexicon. The twist is that I teach the use of passphrases salted with a small handful of extra characters for each phrase used – thus making the resulting phrase not simply dictionary words. This seems to provide the right balance for most people.

Captain Obvious March 14, 2012 9:22 AM

@Someone

While longer words are less likely to be found in a given dictionary, they would accomplish the opposite of what you want, which is:

less effort for you, more effort for the machine

dictionary words are treated as a single unit in the attack, but you must type them all out.

A phrase with many small words is far more secure than a few really long words.
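The argument here is about effort per bit: an attacker working from a dictionary pays for the size of the list, not for the length of each word, while the user pays for every character typed. The list sizes and average word lengths below are illustrative assumptions, not measurements (7,776 is the size of a standard Diceware list; the long-word list is hypothetical).

```python
import math


def bits_per_keystroke(list_size: int, avg_word_len: float, separator_len: int = 1) -> float:
    """Entropy from one uniformly chosen word divided by the characters typed for it."""
    return math.log2(list_size) / (avg_word_len + separator_len)


# Illustrative figures only: word length appears in the denominator, never the numerator.
short_words = bits_per_keystroke(7_776, avg_word_len=4.5)
long_words = bits_per_keystroke(1_000, avg_word_len=20.0)
print(f"short: {short_words:.2f} bits/char, long: {long_words:.2f} bits/char")
```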

I also like to use personal (uncommon) misspellings and nonsensical jargon that would not likely be in anyone’s dictionary.

John P March 14, 2012 10:31 AM

The system used to run the test is weak. It is leaking information about the stored passwords and making them easier to guess than they should be.

The real story may be that Amazon should up the security on their PayPhrase system. So the researchers now know that “obtuse leaf Core muffle” is in use by some random amazon.com user to check out. Can this information be used for nefarious purposes? I don’t think so.

vedaal March 14, 2012 10:35 AM

For those with a geeky sense of humor, try a memorable command line in the syntax of Perl, Python, PGP, etc. (it doesn’t have to be error-free, just memorable …), e.g.:

pgp +BATCHMODE -waste c:*.* -u boo -r foo | format c

John P March 14, 2012 10:35 AM

I want to note here that Amazon PayPhrase has been shut down (https://www.amazon.com/gp/payphrase/claim/whats-this.html). Perhaps this is a result of the research mentioned in this post.

jacob March 14, 2012 3:00 PM

Pi XOR’ed with e? I do like the syntax of a language other than English; Perl and COBOL come to mind. Tolkien, Klingon, Proto-Indo-European, Sumerian. OK, I’m just having fun now. 😉

martinr March 15, 2012 12:15 AM

That (previously mentioned) XKCD http://xkcd.com/936/ seems realistic to me.

Assuming a vocabulary of 2048 words and a choice of 4 random words from that vocabulary, you’ll get 44 bits of estimated security. For the human brain, it is only 4 items to remember, and each item can be remembered with redundancy.

The real issue is whether the attacker can really come up with an optimized vocabulary (2^11 = 2048 words) or may have to use a conservative approach and a dictionary of 2^14 = 16384 or more words.
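The 44-bit figure is 4 × log2(2048) = 4 × 11. A minimal XKCD-style generator might look like the sketch below, assuming the attacker knows the exact list (the conservative assumption). The placeholder list stands in for a real curated one, and a 16,384-word (2^14) list is included for comparison.

```python
import math
import secrets
from typing import Sequence


def make_passphrase(wordlist: Sequence[str], n_words: int = 4, sep: str = " ") -> str:
    """Join n_words entries chosen independently with the OS's secure generator."""
    return sep.join(secrets.choice(wordlist) for _ in range(n_words))


def passphrase_bits(list_size: int, n_words: int) -> float:
    return n_words * math.log2(list_size)


print(passphrase_bits(2048, 4))   # 44.0
print(passphrase_bits(16384, 4))  # 56.0

# Placeholder list for the demo; a real generator would use a curated list of 2,048 words.
demo_list = [f"word{i}" for i in range(2048)]
print(make_passphrase(demo_list))
```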

Per Thorsheim March 15, 2012 7:40 AM

Any attacker without any prior knowledge of its target will most likely attempt “simple” passwords, not passphrases.

If a security policy explicitly demands and enforces passphrases, and the policy is known to the attacker… then we can start talking about keyspace reduction and lowered entropy, as many comments are talking about.

Adding language variations (dialects), and writing your words as you pronounce them may aid in lowering the risk of successful cracking, while maintaining the usability of actually remembering your passphrases.

alice pretending to be bob March 15, 2012 2:00 PM

Why use English or some other common language for the password or phrase? At various times I’ve used passphrases made mostly of dialect words from one language, combined with dialect words from another language, joined together or delimited in ways that make sense only to me.

Rudyx March 20, 2012 1:22 PM

Interesting … Since NT 4.0 I have used 16-character passwords; I change them every 30 days, do not write them down, and do not even know what they are (imagine that for security)! It was simple: I reached back to the old dot matrix printer days … I simply use my keyboard as a dot matrix.

Tristan Harward October 1, 2016 6:27 AM

I came here wondering about the security of passphrases, after the fleeting thought that perhaps they can be more easily attacked due to being only three to five dictionary words.

To recap:

  1. The research on Amazon is bunk. “PayPhrase” was a user-chosen phrase for which they actually asked users to think of common, related words. I can’t imagine something less random or less secure. It was also not designed to be a full password. It was a phrase you’d type to make a fast purchase after already being logged in. I’m not sure of the point of it, and I’m glad they discontinued it. In any case it has zero relevance to passphrases chosen randomly with diceware or a good generator.
  2. In short, using the comments here I’ve convinced myself that passphrases are secure. The search space of the dictionary is larger than the search space of each character; you could almost think of it like an extremely complex alphabet that has a hundred thousand letters. If you picked three letters randomly from that extremely complex alphabet, you would get the same entropy as picking three words at random from a dictionary of a hundred thousand words, because the number of possibilities is the same.
  3. Things like adding numbers between words, or adding a symbol before and after the phrase, don’t do much to increase entropy, and therefore security. The best way to increase the security of the passphrase is to increase the search space. Pick one word from a foreign language dictionary you know, for example. I’m not convinced making one of the words 1337 speak even helps much, because in effect it’s only one or a handful of character swap possibilities and therefore doesn’t add much to the search space; however, in practice it may be very effective if the two search spaces are never attempted together.
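The third point can be quantified under simple, assumed rules: one random digit inserted at a random position in a four-word phrase, leetspeak swaps decided at random on two letters, or one extra random word from a 7,776-entry list. These are upper bounds; if the tweak is predictable (always the same symbol, always the same swap), it adds close to nothing.

```python
import math


def bits_random_digit_inserted(n_words: int) -> float:
    """One random digit (10 choices) placed in one of the n_words + 1 gaps of the phrase."""
    return math.log2(10 * (n_words + 1))


def bits_leet_substitutions(n_substitutable: int) -> float:
    """At most 1 bit per letter, and only if each swap is decided by a fair coin flip."""
    return float(n_substitutable)


def bits_extra_word(list_size: int) -> float:
    """One more word chosen uniformly from the list."""
    return math.log2(list_size)


print(f"{bits_random_digit_inserted(4):.1f}")  # 5.6
print(bits_leet_substitutions(2))              # 2.0
print(f"{bits_extra_word(7_776):.1f}")         # 12.9
```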

However at the point of arguing over character replacements in random passphrases, you’re likely already dealing with a good passphrase that you should just keep simple and use. The real problems are, as always, systems that don’t accept long passphrases, systems that enforce character rules that make passwords difficult to remember, and passwords shared across systems with varying levels of secure hashing and storage. Simply enough, password managers seem to be the ticket to manage this complex world of systems each with their own ideas of what security means.

Thanks for the great thread!
