Friday Squid Blogging: Draw-a-Squid Contest

Draw a squid, win Jeff VanderMeer's Ambergris novels.

Posted on October 23, 2009 at 4:03 PM • 8 Comments

Comments

pdf23ds • October 25, 2009 12:52 AM

I'm starting to use TrueCrypt, and I want to use a good strong password. According to Wikipedia, the NIST recommends using a password with 80 bits of entropy for the strongest applications, so I'm going with that. Bruce, what's your recommendation for entropy?

Now, a lot of people would just go with a password generated from random upper/lowercase letters, numbers, and symbols typeable on a regular keyboard. Assuming a uniform chance of each character being in a password, you would need 12.1 characters to reach 80 bits. (With 96 symbols, that's 80 / log_2 96.) If that's a little hard to type/remember for you, you could always use only lowercase letters, for 17 characters.
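
A minimal sketch of that calculation in Python (the 96-symbol count and 80-bit target are the figures from the paragraph above):

```python
import math

def chars_needed(target_bits, alphabet_size):
    """Characters needed when each character is drawn uniformly
    at random from an alphabet of the given size."""
    bits_per_char = math.log2(alphabet_size)
    return target_bits / bits_per_char

print(chars_needed(80, 96))  # ~12.1 for the full typeable set
print(chars_needed(80, 26))  # ~17.0 for lowercase letters only
```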

But I've looked into different ways to generate passwords. According to the Wikipedia article on entropy, you can calculate the entropy of a random string where each symbol has a given probability of occurring. (It's the sum, over all symbols, of each symbol's probability times the base-2 logarithm of one over that probability.) The article on letter frequency gives the probability of each letter in various languages. Using that data, I've calculated that a random string of lowercase letters drawn from a distribution matching English's letter distribution will have 4.18 bits of entropy per character.

This gives a grand total of 19.1 characters to reach 80 bits of entropy. Surprisingly, using letter frequencies didn't really change the required length much at all. These are probably a little easier to type (especially on the Dvorak layout), since they use common keys more often.

Here are a few examples.

mbbtfxbsbjdptjjfjct
tpueujueuoccbtqvbfp
mopsjsdsluudojzoeji
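
Both steps can be sketched as follows. The frequency table here is an approximate standard English letter distribution, not necessarily the exact figures from the Wikipedia article, so the entropy comes out near, rather than exactly at, 4.18 bits:

```python
import math
import random

# Approximate relative frequencies of English letters (percent);
# figures like these appear in the letter-frequency article.
FREQ = {
    'e': 12.70, 't': 9.06, 'a': 8.17, 'o': 7.51, 'i': 6.97, 'n': 6.75,
    's': 6.33, 'h': 6.09, 'r': 5.99, 'd': 4.25, 'l': 4.03, 'c': 2.78,
    'u': 2.76, 'm': 2.41, 'w': 2.36, 'f': 2.23, 'g': 2.02, 'y': 1.97,
    'p': 1.93, 'b': 1.49, 'v': 0.98, 'k': 0.77, 'j': 0.15, 'x': 0.15,
    'q': 0.10, 'z': 0.07,
}

def entropy_per_char(freq):
    """Shannon entropy in bits: -sum(p * log2(p)) over the distribution."""
    total = sum(freq.values())
    return -sum((f / total) * math.log2(f / total) for f in freq.values())

def sample_password(freq, length, rng=random.SystemRandom()):
    """Draw each character independently from the letter distribution."""
    letters = list(freq)
    weights = list(freq.values())
    return ''.join(rng.choices(letters, weights=weights, k=length))

h = entropy_per_char(FREQ)
print(round(h, 2))                           # close to 4.18 bits/char
print(sample_password(FREQ, math.ceil(80 / h)))
```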

An even more advanced method is to use a table of common English trigram frequencies. This generates often-pronounceable, quite memorizable passwords. They have to be, on average, 37 characters long to be 80 bits strong (35 characters if you generate without spaces), so maybe not worth it to everyone. A few examples:

con lins not numbeen thernite yount
proulace of thouriesecals ris arth
rs the steraind and the mand new has wh
ble s justaintichand cerive of win

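The trigram table linked below isn't reproduced here, but as a sketch, a trigram model can be built from any sample of English text and then walked to generate strings like the examples above (the seed text here is just a stand-in):

```python
import collections
import random

def build_trigrams(text):
    """Count trigram continuations: (two-char context) -> next-char counts."""
    counts = collections.defaultdict(collections.Counter)
    for i in range(len(text) - 2):
        counts[text[i:i+2]][text[i+2]] += 1
    return counts

def generate(counts, length, rng=random.SystemRandom()):
    """Walk the trigram model, restarting from a random context if stuck."""
    contexts = list(counts)
    out = list(rng.choice(contexts))
    while len(out) < length:
        ctx = ''.join(out[-2:])
        if ctx not in counts:
            out.extend(rng.choice(contexts))
            continue
        nexts = counts[ctx]
        out.append(rng.choices(list(nexts), weights=list(nexts.values()))[0])
    return ''.join(out[:length])

# Any chunk of ordinary English works as seed text for the sketch;
# a proper table of trigram frequencies would give better statistics.
SEED = ("the quick brown fox jumps over the lazy dog and then the dog "
        "wanders over the green hill looking for another fox to chase")
model = build_trigrams(SEED.lower())
print(generate(model, 37))
```
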
[Standard disclaimer about PRNGs.]

http://en.wikipedia.org/wiki/...
http://en.wikipedia.org/wiki/Letter_frequency
http://home.ccil.org/~cowan/trigrams

pdf23ds • October 25, 2009 1:26 AM

Diceware, in comparison, manages about 3.1 bits of entropy per character, for an average of 26 characters per 80-bit password. So it would seem to be clearly superior to the trigram method.
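
A sketch of the Diceware arithmetic, using the real list's parameters (7776 words, so about 12.9 bits per word) but a toy stand-in wordlist for the sampling step:

```python
import math
import secrets

DICEWARE_LIST_SIZE = 7776                      # 6^5 entries in the real list
BITS_PER_WORD = math.log2(DICEWARE_LIST_SIZE)  # ~12.92 bits

words_needed = math.ceil(80 / BITS_PER_WORD)
print(words_needed)                            # 7 words clears 80 bits

# Toy stand-in wordlist -- a real passphrase must draw from the full
# 7776-word Diceware list to actually get 12.9 bits per word.
TOY_WORDS = ["apple", "brine", "cadet", "dingo", "ember", "flume"]
print(" ".join(secrets.choice(TOY_WORDS) for _ in range(words_needed)))
```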

pdf23ds • October 25, 2009 1:41 AM

A sometimes-recommended way of generating passwords is to pick a whole phrase and use the first letter of each word. By my calculations, you need between 12 and 26 characters to reach 80 bits by this method. Shannon's estimates of the entropy of English are between 0.6 and 1.3 bits per letter, and the average English word is 5.1 letters long[1]; 80 / (5.1 * 0.6) is about 26.

[1] http://blogamundo.net/lab/wordlengths/
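
That arithmetic, spelled out (the 5.1-letter average and Shannon's bounds are the figures cited above):

```python
AVG_WORD_LEN = 5.1        # average English word length, per the linked page
TARGET_BITS = 80

# Shannon's estimated bounds on English entropy, in bits per letter of
# running text; each first letter stands in for a whole word's entropy.
for bits_per_letter in (0.6, 1.3):
    bits_per_initial = AVG_WORD_LEN * bits_per_letter
    print(round(TARGET_BITS / bits_per_initial, 1))   # ~26.1, then ~12.1
```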

And of course, all these estimates are rather conservative. The most obvious thing they don't take into account is the attacker's uncertainty about which method you used to choose your password. Since that is very difficult to estimate, it's conservative to assume they know exactly how you picked, but it would be interesting to see estimates.

Clive Robinson • October 25, 2009 1:48 AM

@ pdf23ds,

As a very rough rule of thumb, "memorable" passwords have around 1.5 bits of entropy per character (less for well-known sayings or phrases).

This is not just due to letter, bigram, and trigram frequencies but to other things such as consonant-vowel (CV) patterns, rules of spelling, and word type and usage.

There is also the assumption that an attacker "knows the system", which again lowers the entropy.

So at 1.5 bits of entropy per character you would be looking at at least 54 characters of memorable natural language...

pdf23ds • October 25, 2009 1:55 AM

It's interesting to compare Shannon's estimate of around 1 bit/character to the current state of the art in English language compression, found at Marcus Hutter's compression challenge site:

http://prize.hutter1.net/

The best compressor as of this writing compresses a 100 MB extract from Wikipedia at a rate of 1.3 bits per character. This implies 1.3 bits/char as a *maximum* for the entropy of Wikipedia text, so the true figure is likely considerably lower. 1.0? 0.8? Who knows?
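
The arithmetic behind that bound, with the compressed size as an illustrative round figure rather than the exact prize record:

```python
ORIGINAL_CHARS = 100_000_000    # the 100 MB extract, roughly 1 byte per char
COMPRESSED_BYTES = 16_000_000   # illustrative; the record was around this size

# Any lossless encoding of N characters into B bytes shows the source's
# entropy is at most 8*B/N bits per character.
bits_per_char = COMPRESSED_BYTES * 8 / ORIGINAL_CHARS
print(round(bits_per_char, 2))  # ~1.28 -- an upper bound on the entropy
```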

pdf23ds • October 25, 2009 4:28 AM

BTW, the state of the art at the start of the Hutter prize was just under 1.5 bits/char, for a total improvement so far of 0.19 bits/char.

Clive Robinson • October 25, 2009 1:01 PM

@ pdf23ds,

"The best compressor as of this writing compresses a 100 MB extract from Wikipedia at a rate of 1.3 bits per character."

Yup, the entropy per char does drop the larger the block of "natural language" you give it (as repeated long strings become more likely).

And the likes of a newspaper or encyclopedia are going to be worse than similar-sized aggregates of unrelated plain texts.

In the case of a newspaper, most of the articles will relate to only a few stories, so you might well have a front-page item and two or more analysis pieces on the same story further in.
Each article will have entropy/char similar to natural language. However, articles on the same story will obviously share a lot more than unrelated plain text, so the entropy/char will be lower.

You would expect a similar effect in a "stylised" collection of articles such as an encyclopedia.

However, this effect can be seen in as little as two sentences,

"If the cat can sit on the mat? Can the cat lie on the mat?"

Which I remember from one of my son's early reading books.

slaybalj • October 26, 2009 8:58 AM

*sweet* I've actually read City of Saints and Madmen, and it's on record as the most bizarre book I've ever read. It's actually a bunch of short stories.

Interestingly, in the first edition of the book, the final short story was encrypted with a simple page-paragraph-word code using the rest of the book. Later editions printed this text decrypted rather than re-encrypting it after the page breaks changed - though the last paragraph or three is still encrypted and has to be manually decrypted by the reader. (Or Google-searched.)
