Schneier on Security
A blog covering security and security technology.
November 5, 2009
The Problems with Unscientific Security
From the Open Access Journal of Forensic Psychology, by a whole list of authors: "A Call for Evidence-Based Security Tools":
Abstract: Since the 2001 attacks on the twin towers, policies on security have changed drastically, bringing about an increased need for tools that allow for the detection of deception. Many of the solutions offered today, however, lack scientific underpinning.
We recommend two important changes to improve the (cost) effectiveness of security policy. To begin with, the emphasis of deception research should shift from technological to behavioural sciences. Secondly, the burden of proof should lie with the manufacturers of the security tools. Governments should not rely on security tools that have not passed scientific scrutiny, and should only employ those methods that have been proven effective. After all, the use of tools that do not work will only get us further from the truth.
In absence of systematic research, users will base their evaluation on data generated by field use. Because people tend to follow heuristics rather than the rules of probability theory, perceived effectiveness can substantially differ from true effectiveness (Tversky & Kahneman, 1973). For example, one well-known problem associated with field studies is that of selective feedback. Investigative authorities are unlikely to receive feedback from liars who are erroneously considered truthful. They will occasionally receive feedback when correctly detecting deception, for example through confessions (Patrick & Iacono, 1991; Vrij, 2008). The perceived effectiveness that follows from this can be further reinforced through confirmation bias: Evidence confirming one's preconception is weighted more heavily than evidence contradicting it (Lord, Ross, & Lepper, 1979). As a result, even techniques that perform at chance level may be perceived as highly effective (Iacono, 1991). This unwarranted confidence can have profound effects on citizens' safety and civil liberty: Criminals may escape detection while innocents may be falsely accused. The Innocence Project (Unvalidated or improper science, no date) demonstrates that unvalidated or improper forensic science can indeed lead to wrongful convictions (see also Saks & Koehler, 2005).
Article on the paper.
Posted on November 5, 2009 at 6:11 AM
Replace "behavioural" with "behavioural and social" and you have a winner.
Apparently, historically the worst has been arson forensics. It appears that Texas killed a guy (Cameron Willingham) 6 years back on a combination of witch-doctor style arson investigations and police "intuition" and witness leading.
So, damn right, it does lead to a loss of security for citizens -- all the way up to murder.
Of course, no one will ever be prosecuted for murdering Willingham -- even though, if it happened with this guy, it's reasonable to suspect that we have a pack of serial killers on the loose in Texas.
Psychologists doing research to suggest that we need more psychology-based science in our security systems? In other news, computer scientists are doing research that suggests we need more computer-based science in our security systems, and biologists are doing research to suggest we need more biology-based science in our computer systems.
Where are the electrical engineers suggesting we need more electronic control over our security systems, and the mechanical engineers offering mechanical solutions for our security systems?
I used dowsing to find water and, hey, it worked! *
* I was on a boat at the time.
It's nice to see the authors cite the 2003 NRC study of polygraph effectiveness, which was a tour-de-force, a thorough debunking of the nimbus of pseudo-science that enfolds polygraph machines, vendors, techniques, practices, and policies.
Unfortunately, the fate of that report allows us to easily predict the impact that this paper will have: zero. The problem with the NRC report was that too many officials in the securocracy are thoroughly invested in the sort of magical thinking required to believe that polygraph screening is an effective defense against treason/spying/disloyalty/corruption. The report was too disruptive to their belief system, so they tossed it in the trash can and went right on with polygraph screening as a standard procedure.
The same mindset drives the adoption of the magical deception-detection tech decried in this paper. It's too entrenched to root out. As with the polygraph report, law-enforcement, security, and intelligence officials will feel their world-view threatened by whatever extent this paper is taken seriously, and will retort that academics in ivory towers and propeller-headed quants can't possibly understand the nuances of interrogation and screening that give them confidence in these systems.
It's bullshit, of course. A romantic, anti-science view by group-thinking bureaucrats collectively afflicted by a colossal deficit of intellectual integrity. But one that looks ineradicable.
> Since the 2001 attacks on the twin towers,
> policies on security have changed drastically,
> bringing about an increased need for tools that
> allow for the detection of deception.
What would happen if the obscureaucrats and secureaucrats turned those deception-detecting tools inward?
> historically the worst has been arson forensics.
CSI Myths: The Shaky Science Behind Forensics
Forensic science was not developed by scientists. It was mostly created by cops, who were guided by little more than common sense. And as hundreds of criminal cases begin to unravel, many established forensic practices are coming under fire. PM takes an in-depth look at the shaky science that has put innocent people behind bars.
By Brad Reagan
Published in the August 2009 issue.
A polygraph is a diagnostic tool. It doesn't register "LIES" or "TRUTH"; it registers various stress factors on your body that an analyst can use to identify anomalous behavior. If your palms start sweating and your heart beats faster when we touch a specific subject, you're obviously anxious about something there; why? Old girlfriend giving you painful memories? Got fired over something strongly related? Got something to hide?
“electrostatic magnetic ion attraction,” ... That's a funny way to spell snake oil.
This paper talks about physical security and law enforcement tools etc, but it sounds like it all applies just as well to InfoSec.
We have no shortage of magical thinking a lot closer to home. If we want to talk about non-evidence-based policies we could start with all the password-related nonsense that we've never had the guts to address.
Do we in InfoSec do any better than govt securocrats on the evidence front?
Hi Bruce, I'm so impressed by this phrase:
"In absence of systematic research, users will base their evaluation on data generated by field use. Because people tend to follow heuristics rather than the rules of probability theory, perceived effectiveness can substantially differ from true effectiveness (Tversky & Kahneman, 1973)"
It is exactly what I'm trying to teach my colleagues. And recently we had a bitter discussion on password strength, and I'm wondering if you know of some research about password strength metrics.
For example, they establish that www.passwordmeter.com is a good metric, but I say that the quantity of good passwords (of 8 characters) is so small it could be computed quickly.
"Forensic science was not developed by scientists. It was mostly created by cops, who were guided by little more than common sense."
Err "common sense" had little to do with it, and it is still abscent in most forensics today.
For instance, fingerprints: there is little if any science involved (one of the main reasons it's been so hard to computerise). It is at best an 'art' in which a person uses their supposed 'best judgment' to 'offer opinion' to a court on whether the distorted ink print of the suspect matches what is usually a partial and likewise distorted pattern of protein and grease that has been "dusted" with aluminium powder or some chemical reagent.
Importantly, in most cases the tribunal of truth (the jury) does not see either image in a way that lets them judge whether they agree with the 'opinion' or not...
More importantly, the equipment in use these days is more sensitive than the normally expected background levels or "noise", so the chances are it will indicate (correctly) that the chemical is present.
But importantly, none of the equipment can say how a substance came to be at a scene or on a suspect or their clothes or other possessions.
Again a supposed expert offers an opinion which is usually little more than "X contains substance Y, and Y was found on the swabs of the suspect's hands". It is often expressed in such a way as to make the jury members think X is the most probable source of Y.
For instance a statement such as,
"The swabs taken from the suspect's hands showed traces of ammonium nitrate, which indicates the suspect had, prior to their arrest, handled ammonium nitrate. The home-made explosives used in the bomb were ammonium nitrate based."
Sounds convincing until somebody points out that cured meats such as bacon have fairly high levels of ammonium nitrate; that ordinary household cleaning agents, when mixed on a cleaning cloth, may well produce ammonium nitrate; that it is produced as part of the natural breakdown of organic materials such as would be found in a garden refuse / compost bin or in animal waste products; and that it is found in many brands of tobacco product and the papers used to roll smokes of various forms.
I suspect that if you tested 100 people at random and swabbed their hands, the majority would show positive for ammonium nitrate or cocaine or both at the levels of sensitivity the equipment can detect...
Even if all that can be ruled out (which is very doubtful) there is then the issues of cross contamination.
A forensic worker simply coughing may well be enough to cause samples of things they have handled (or smoked) to get into the air and onto the swabs or other items such as clothing.
The fact that something is present at the "scene of the crime" and on the "suspect" does not actually tell you anything other than that. It in no way indicates that the two were ever together. Even in the case of very rare substances, all it indicates is that there might be a causal link between the two, but not what it is. At low levels it is just as likely to have occurred due to a third party or item as it is to have happened by direct contact at the "scene of the crime".
That aside, there are some techniques that do stand up to scientific scrutiny, such as those involving mechanical marks from tools, teeth and, to a lesser extent, other objects.
@damian "research about password strength metrics."
You might check out "NIST Special Publication 800-63, Electronic Authentication Guideline" Appendix A has an interesting article on estimating password strength based on entropy.
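If it helps, the Appendix A scheme is simple enough to sketch; here is a small Python rendering of its rough per-position bit counts (my reading of the guideline's heuristic, not official code):

```python
import string

def nist_entropy_bits(password: str) -> float:
    """Rough entropy estimate for a user-chosen password, after the
    heuristic in NIST SP 800-63 Appendix A: 4 bits for the first
    character, 2 bits each for characters 2-8, 1.5 bits each for
    characters 9-20, 1 bit thereafter, plus a 6-bit bonus when the
    password mixes upper case and non-alphabetic characters."""
    bits = 0.0
    for i in range(1, len(password) + 1):
        if i == 1:
            bits += 4.0
        elif i <= 8:
            bits += 2.0
        elif i <= 20:
            bits += 1.5
        else:
            bits += 1.0
    if (any(c in string.ascii_uppercase for c in password)
            and any(c not in string.ascii_letters for c in password)):
        bits += 6.0
    return bits

print(nist_entropy_bits("password"))     # 18.0
print(nist_entropy_bits("Tr0ub4dor&3"))  # 28.5
```

Note how deliberately crude the model is: it scores by length and composition only, which is exactly why the guideline treats the result as a rough lower-bound heuristic rather than a measurement.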
"For example they stablish that www.passwordmeter.com is a good metric, but I say that the quantity of good passwords (of 8 characters) is so small and could be computed fastly."
Interesting site. But has serious problems.
I tried a few of my passwords on it. Initially they were rated as "very strong" (88%)... until I put in two identical characters. At which point they became "very weak". And they finally ended at 0%.
How can adding additional characters WEAKEN a "very strong" password?
The question of how to measure strength is an interesting one. But more fundamental is: where is the evidence showing that people with weak passwords are getting hacked?
Password strength and rules is probably the single biggest annoyance for users, and with a proper lockout policy it just doesn't seem necessary. Is there evidence to contradict the view that mediocre passwords are fine if there's no offline attack?
And adding 2 letters:
See, much weaker.
"See, much weaker."
No, I don't see that.
You're trying to make the case that adding characters can make a password easier to crack by getting it on a dictionary list.
That would only be the case where the variations in punctuation and capitalization of the dictionary list were FEWER than the variations available in the smaller password.
And on my keyboard I have 94 easily typed characters.
Several comments here reveal the authors' ignorance of polygraph testing. If a test is done properly (i.e. with a single-issue test - preferably the Utah Zone of Comparison Test) accuracy will be somewhere in the mid-90% range. The report of the National Academy of Sciences acknowledges this. Before attacking a useful and scientifically valid technique, people should do a little reading.
Louis Rovner, Ph.D.
@ Clive Robinson
"... there are some techniques that do stand up to scientific scruitiny such as those involving mechanical marks from tools, teeth ..."
Well, the PM article linked upstream notes that techniques involving mechanical marks from tools, teeth, etc are very poor.
@ Brandioch Conner,
"How can adding additional characters WEAKEN a "very strong" password?
Over and above Boba Fett's example.
It may be the way the measurement is made.
For instance it might be expressing it against the total length of the password.
So say you have a password that contains only what the system thinks are high-entropy characters; if you then add in two 'e' chars you have increased the potential password space quite considerably (say the equivalent of adding 13 bits) but you have only added maybe 1-1.5 bits of entropy.
After all, the most common method of estimating entropy is based on how frequently a character is used. It is therefore possible for a very rarely used char to have more bits of entropy than the number of bits required to represent it.
For instance the tilde (~) character almost never appears in ordinary plain text. It might appear, say, once in every 125,000 chars, so it effectively has a 1 in 2^17 probability, but actually needs only 7 bits to represent it.
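The arithmetic behind that claim is just the standard surprisal formula, -log2(p); a quick sketch using the 1-in-125,000 figure from above:

```python
import math

def surprisal_bits(prob: float) -> float:
    """Information content, in bits, of an event with probability prob."""
    return -math.log2(prob)

# A character seen once per 125,000 characters of sample text carries
# about 17 bits of "surprise"...
print(round(surprisal_bits(1 / 125000), 2))   # 16.93
# ...even though 7 bits suffice to encode any printable ASCII character.
# Compare a letter drawn uniformly from a 26-letter alphabet:
print(round(surprisal_bits(1 / 26), 2))       # 4.7
```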
Which raises the question of how do you measure entropy for passwords.
The first problem with using character frequency is the "context" of the sample texts used to collate it.
For instance the forward slash (/) is not common in ordinary English text, but in a well-commented C++ program it is very common, so it would have considerably lower entropy in that context.
In plain English text, without capitalisation issues, you would find the expression,
EAT ON IRISH LID
an easy way of remembering the order of frequency (some lists sometimes reverse AT or RS depending on the source material and age).
"notes that techniques involving mechanical marks from tools, teeth, etc are very poor."
That is true in that it depends a lot on the materials involved.
However the point I was making is that the testing methods do stand up to the basic requirements of the scientific method in such areas as repeatability etc. The fact that the output has a low level of accuracy is another issue.
In the not too distant past we used blood types. The method of determining them is a well-established procedure; the fact that the result tells you very little (unless they don't match) does not affect the scientific credibility of the tests.
There are however a number of forensic methods that have so little science behind them they tend to make the Victorian idea of "criminal type ear lobes" look more credible.
Importantly some destroy the evidence they are testing or render the evidence unavailable for other tests. This is highly undesirable, as the test results cannot be challenged in an appropriate manner.
The thing is that in many, many court cases the forensic evidence is used as a "show item" to lend unwarranted credence to the proceedings. The prosecution get away with this because judges generally do not like the word of "expert witnesses" to be called into doubt on "technical argument", as it supposedly confuses the "jury" (though they rarely mention the "judge").
We have seen too many supposed experts use their own or somebody else's "pet theory" to find innocent people guilty by suggestion of crimes that, if true, would be quite appalling and thus attract harsh sentencing (death of infants for instance).
"After all the most common method of estimating entropy is rated on how frequently a charecter is used. It is therefore possible for a very rarely used char to have more bits of entropy than the number of bits required to represent it."
Yeah. You might want to remember that we're talking about passwords that can be typed on a keyboard.
At which point it only comes down to the character set that is being used (all lower case, lower case and upper case, etc) and whether it can be guessed on a dictionary list or via other algorithm.
So far, you've been unable to demonstrate how adding characters to a password makes it easier to defeat.
And no, in this discussion the rate of slashes in a C++ program does not matter.
@ Brandioch Conner,
"So far, you've been unable to demonstrate how adding characters to a password makes it easier to defeat."
Hmmm, I thought you were asking the question with respect to the rest of your post about www.passwordmeter.com, not in the more general sense. Which is why I said at the top of my reply,
"It may be the way the measurement is made."
Which also, by the way, applies to cracking passwords, if you are not just using a brute-force search (in which simple case the longer the password the stronger it is).
Or to put it the other way around: how easily can it be determined in less than a brute-force search?
Which brings you around to saying "how predictable the password is".
This is a little like asking how big a cloud is, you first need to agree on how you are going to measure it.
When measuring a password's strength there are many ways you can do it. As it is supposed to be a "secret", one way is by how easy it is to remember.
As humans tend to remember by association, the measure runs from unmemorable (ie it has to be written down) to easily memorable, like the user's name.
So at one end: no patterns, thus not predictable (ie random); at the other: easily predictable (a known word).
Boba Fett pointed out how something that looks random becomes very non-random with the addition of a couple of letters.
I pointed out how it might be determined through the use of "letter frequency", but noted it was "context dependent".
You might decide that, as it is a short string, you should use "QWERTY keyboard layout with a right-handed user" as the context. Thus adding two 'g' characters would add little or no strength, whilst adding two 'q' characters would add considerably more.
However giving the user the result as a percentage means the "strength" has been "normalised".
Which was another point I was making: if it is normalised against the length of the password, then yes, you would expect to see the strength go down even though the length has gone up.
So to answer your question in anything other than general terms you need to say not just how you are measuring it but also how you are going to normalise the result. Otherwise this posting will ping-pong back and forth in an off-topic way which the moderator will quite rightly put a stop to.
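To make the normalisation point concrete, here is a toy sketch (all the per-character bit values below are invented for illustration) in which the total entropy estimate rises while a per-length "normalised" score falls:

```python
def total_entropy_bits(password: str, bits_per_char: dict) -> float:
    """Sum per-character entropy estimates; characters not in the table
    get an assumed 'rare character' value of 6.5 bits (all numbers
    here are hypothetical, chosen only to illustrate the effect)."""
    return sum(bits_per_char.get(c, 6.5) for c in password)

# Hypothetical estimates: common letters carry very little entropy.
est = {'e': 1.0, 'a': 1.2}

short = "zo3N9(*;"        # 8 "rare" chars x 6.5 bits = 52 bits total
longer = short + "ee"     # two more characters, but only 2 more bits

for pw in (short, longer):
    total = total_entropy_bits(pw, est)
    print(pw, total, round(total / len(pw), 2))
# The total rises 52 -> 54, yet the per-length "normalised"
# score falls 6.5 -> 5.4: longer, but "weaker" by this measure.
```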
It amuses me that the title of this thread is "The Problems with Unscientific Security" and yet the postings within it fail to address anything scientific.
"Bob Fett pointed out how something that looks random becomes very non random with the addition of a couple of letters."
No, he did not. There is no evidence that his second password would be cracked any faster than his first password.
Which is exactly what I pointed out with regards to that website.
@ Brandioch Conner,
"It amuses me that the title of this thread is "The Problems with Unscientific Security" and yet the postings within it fail to address anything scientific."
Yup it has a certain degree of wry wit.
The simple fact is that outside of "brute force", which is just as deterministic as a counter, password cracking is more "art" than "science". And thus the argument about password strength drops to philosophical musings.
It's why I keep going on about context.
The "art" is based around the recognition of human failings (to remember entirly random charecter strings). And the human brains way (association) of remembering things.
All the attacks that are less than "brut force" basicaly look for things that are meaningfull to humans such as dictionary words etc first. Then if that fails either hit random guessing or back to "brut force".
The question is what is meaningfull to any individual?
Which is where the context comes in. Some of it is general such as using common words in the likley language of the account holder, some using knowledge of the person (football fanatic, SciFi fan, etc).
If you believe that "psychology" can help you determin the likley way a user has "made their password" then you could argue there is science involved. But then there is the argument that psychology is a bit "touchy feely" at the best of times and subject to hugh dollops of interpretation by the observers...
Likewise if you have spent time "snooping" passwords in a certain group (say students) then you could have a statistical advantage (so mathmatics not science ;)
The only real change in the game of recent times (outside of using graphics cards) is "rainbow tables", where people have done the "time expensive" calculation to produce a lookup table and then made it publicly available.
The other way to short cut is to "guess the salt" or "padding / nonce".
There is a very real possibility that standard methods of generating a "salt" may not be really random at all, and may be too deterministic or have bias (and this has been found to be the case a number of times).
Further, "generated passwords" that are supposedly "memorable" because they use word patterns (such as CVCCVCCC etc) are really not that good an idea, because they are a very limited subset of the available password space for any given length, and thus subject to dictionary attacks as well as random-number-generator attacks.
However there is another angle of attack which could be exploited, which is collisions and other problems with the storage of the encoded password...
There are a very large number of "assumptions" behind "one way functions" such as "hashes". We have seen various hashes fall to various types of attack in recent years, and to be honest I see no reason why these attacks will not continue against newer hashes etc (have a look at the EU NESSIE project and the "stream ciphers").
The problem with passwords is that some are effectively "forever", with things like password safes: a user thinks up one good password and uses it as a master to encipher all the account passwords.
If I have managed to get a copy of your old password safe file that used an obfuscation function that is now known to have exploitable weaknesses, I might be able to find your master password. What are the odds that "Joe Average" still uses the same password but with a more up-to-date password safe?
Taken en masse, humans tend to have similar failings, one of which is that we are "creatures of habit", and this will nearly always give rise to the possibility of a short-cut method paying dividends on any given multiuser system with sufficient users.
On the flip side I heard a comment about what is or is not "art" the other day. Somebody asked if something they had done was art or not. The answer that came back was, "well if it serves no functional purpose then it's probably art"...
"The simple fact is outside of "brut force" which is just as determanistic as a counter, password cracking is more "art" than "science". And thus the argument about password strength drops to philisophical musings."
Calling it an "art" is admitting that you do not understand it enough to describe it mathematically.
And if you cannot describe it mathematically, then you cannot show that it is more or less secure.
Regardless of any theoretical cases in which adding characters could reduce the strength of a password, the rules that passwordmeter.com uses are printed on the page, and they're blatantly stupid.
You get a linear bonus for password length, and a quadratic penalty for ANY repeated character, no matter what it is, even if the instances of that character are not consecutive. But even more importantly, it looks like the implementation is bugged, and penalizes this even harsher than they claim.
For example, they rate:
"zo3N9(*;na38" at 100 points (their maximum)
"zo3N9(*;na38888" at 0 points (thier minimum), even though if you add up the individual bonuses and penalties listed on the page, they still sum to well over 100.
You can rearrange those 8's anywhere in that string (e.g. "8zo83N98(*;na38") with essentially the same result.
I'd say that's a pretty bad rule even if it weren't bugged, since adding random characters from the pool of characters already used may not increase the entropy as much as other additions, but it certainly won't *reduce* the entropy unless it subsumes the entire string into some larger pattern. Making sure you have no duplicate characters anywhere in your password actually makes it weaker, since you reduce the keyspace.
But even if the rule were theoretically sound, the penalty actually being applied is wildly in excess of what the page claims it's doing. It looks like it's applying that quadratic formula to the number of identical characters in the string, then applying that same formula again to its own result, giving a O(n^4) penalty for any repetition, which is just ludicrous.
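A toy reimplementation of the rule as described above (a guess at the behaviour, not the site's actual code; the point values are made up) shows how a quadratic repetition penalty swamps a linear length bonus:

```python
def toy_score(password: str) -> int:
    """Toy model of the rule described above: +4 points per character,
    minus a quadratic penalty on the count of repeated characters
    anywhere in the string, clamped to the 0-100 range."""
    bonus = 4 * len(password)
    # Each extra occurrence of a character counts as one "repeat",
    # whether or not the occurrences are consecutive.
    repeats = sum(password.count(c) - 1 for c in set(password))
    penalty = 4 * repeats ** 2
    return max(0, min(100, bonus - penalty))

print(toy_score("zo3N9(*;na38"))     # 44: one repeated '3', small penalty
print(toy_score("zo3N9(*;na38888"))  # 0: three more '8's, quadratic collapse
```

The longer string has a larger length bonus, yet scores lower, because the penalty grows as the square of the repeat count while the bonus grows only linearly: the same qualitative collapse described above.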
@ Brandioch Conner,
"Calling it an "art" is admitting that you do not understand it enough to describe it mathematically."
Err no, not at all.
Mathematics is not a science; it has no physical embodiment in any form. It is based on assumptions that are thought, but unproven, to be true. It is used for modelling the behaviour of physical objects, and a subset of it deals with probability.
Cracking a password can be achieved in one of two ways. Firstly, use the process in the forwards direction with trial inputs, checking each processed trial against the stored result for the password. Secondly, take the stored result for the password and work it backwards through the process to get the original password.
If, and only if, the assumption about the "one way"-ness of the processing function in use is valid, then the second method is not realisable.
Which gives you only the first method of trial inputs.
Now it needs to be noted that finding a trial input that matches the processed password is not finding the password.
That is more than one trial input may after being processed match the processed password.
The strength of a password is not a mathematical concept; it is a human emotional concept based on the brain's ability to make sense of equiprobable strings of characters.
The "brute force" search simply starts at a point and works its way through trial inputs until a match is found (or the search is halted).
Where you start your search, and in what order you proceed, determines how long the search will take.
If you start at a "random" position then there is a small probability your first or last trial will match the processed password. If you take the mean search time over a large number of "random" processed passwords, you would expect the result to approach about half of the total search time.
However passwords that are not written down are usually not random, as the human brain usually cannot remember even a moderate string of unrelated characters.
That is, the actual password will be one of a very small subset of all possible inputs: those that make sense to a particular individual's mind.
You cannot mathematically model an individual human mind, for various reasons.
However, by observing many minds you can determine certain similarities, which give rise to some passwords being more likely than others.
Thus most passwords fall into a group that has some pattern that can be recognised.
However what pattern defines the most likely subset of all possible inputs which the password might be in?
If you "guess correctly" then the number of trials you have to make will be a lot lot less than the number of possible inputs.
If you "guess incorectly" then you will need to go through those inputs that are not in your chosen input subset.
It is chosing the right "subset" that decreases the number of trials required.
Which is why a password can be said to be weak if it falls into a known subset such as a dictionary word.
But not being in that subset does not make it strong (it could be just a 'z' on it's own for instance).
The subset is defined by the context you use such as "English dictionary", "French dictionary". Thus two different context subsets may have overlaps.
So the definition of a weak password could be "one that falls in one or more subsets of known pattern contexts"
The reason it is weak is because a "known pattern context" is more likley to be tried before a sequential or random search is tried to crack a password. Thus it is more likley to be found more quickly.
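That ordering argument can be sketched directly: a cracker that tries a known-pattern subset (here a tiny made-up word list) before the exhaustive sweep finds in-subset passwords almost immediately:

```python
import string
from itertools import product

def candidates(wordlist, alphabet=string.ascii_lowercase, max_len=4):
    """Yield guesses 'most likely first': a known-pattern subset
    (here just a word list) before the exhaustive brute-force sweep."""
    seen = set()
    for word in wordlist:               # cheap, high-probability subset
        seen.add(word)
        yield word
    for n in range(1, max_len + 1):     # then the full search space
        for combo in product(alphabet, repeat=n):
            guess = ''.join(combo)
            if guess not in seen:
                yield guess

# A password inside the known subset falls within a handful of guesses:
for i, g in enumerate(candidates(["password", "letmein", "qwerty"]), 1):
    if g == "letmein":
        print("found on guess", i)      # guess 2
        break
```

A password outside every subset the attacker thought to try gets no such shortcut; it is only reached in the brute-force tail, which is the sense in which "not in a known subset" is the floor of strength, not a guarantee of it.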
"Err no, not at all."
You are wrong.
"Mathematics is not a science it has no physical embodyment in any form. It is based on assumptions that are thought but unproven to be true. It is used for modeling the behaviour of physical objects. And a subset of it deals with probability."
Try sticking to the subject at hand, okay?
I can tell you exactly how many attempts it would take to brute force any particular key space.
And I can demonstrate that mathematically. Whether you agree with that or not.
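That much is indeed straightforward arithmetic; for the 94 easily typed characters mentioned earlier, the number of candidates up to a given length is:

```python
def keyspace(alphabet_size: int, max_length: int) -> int:
    """Count candidate passwords of 1..max_length characters."""
    return sum(alphabet_size ** n for n in range(1, max_length + 1))

# 94 printable characters, passwords up to 8 characters long:
total = keyspace(94, 8)
print(f"{total:.2e} candidates")                 # 6.16e+15
print(f"{total // 2:.2e} expected attempts")     # 3.08e+15
```

For a password chosen uniformly at random, the expected number of attempts is roughly half the total; the catch, as the rest of the thread argues, is that human-chosen passwords are not uniform, so this figure is only an upper bound on the attacker's work.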
@ Brandioch Conner,
"I can tell you exactly how many attempts it would take to brute force any particular key space."
That is not the subject at hand, as you are well aware.
All you can show with that bit of mathematics is an upper search bound for an assumed password length, nothing more.
It in no way describes why a dictionary attack would succeed far faster than a brute-force search if the user used a dictionary word.
The "art" I was referring to was the selection of a search method that minimises the search space to below brute force. As it relies on "guessing how" a password was constructed, mathematics can only give you a probability of success.
The answer to your original question about the web site has been answered above if you care to re-read this page.
HJhon has told you he is not going to have further "nit picking" with you.
And I said further up this page,
"Otherwise this posting will ping pong back and forwards in an off topic way which the moderator will quite rightly put a stop to."
Based on your previous behaviour to myself and others.
It appears that you enjoy finding disagreement to provoke "nit picking" argument, continuously move the goal posts, and fail to answer questions asked of you in a reasonable way.
I will therefore cease to respond to you on this blog page before the moderator closes this page or issues a warning.
@Brandioch Conner and @Clive Robinson,
I've read all the conversation about the philosophical problems of defining a mathematical criterion for password strength. All I want to say to you is that there is some mathematical research on this topic using a Hidden Markov Model.
There is an implementation in the John The Ripper project about this:
I hope that both of you can read the paper and try the tool. Tell me about your impressions.
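For readers unfamiliar with the approach, the core idea — scoring a password by how probable it is under a character-level Markov model trained on sample passwords — can be sketched as follows (a first-order model with add-one smoothing; the training samples below are made up, where real tools train on large leaked-password corpora):

```python
import math
from collections import Counter

def train(samples):
    """Build a first-order character Markov model from sample passwords."""
    starts = Counter(p[0] for p in samples)
    pairs = Counter((a, b) for p in samples for a, b in zip(p, p[1:]))
    ctx = Counter(a for p in samples for a in p[:-1])
    return starts, pairs, ctx

def cost_bits(password, model, k=26, alpha=1.0):
    """-log2 probability of the password under the model, with add-one
    smoothing so unseen transitions are expensive, not impossible
    (k is a nominal alphabet size used in the smoothing term)."""
    starts, pairs, ctx = model
    n = sum(starts.values())
    total = -math.log2((starts[password[0]] + alpha) / (n + alpha * k))
    for a, b in zip(password, password[1:]):
        total += -math.log2((pairs[(a, b)] + alpha) / (ctx[a] + alpha * k))
    return total

model = train(["password", "passport", "letmein"])
print(round(cost_bits("passw", model), 1))  # cheap: follows trained patterns
print(round(cost_bits("xqzvk", model), 1))  # expensive: unlikely transitions
```

A low bit cost means the password sits in a high-probability region of the model, exactly the "known subset" that a smart cracker enumerates first, which is what makes this a more principled strength metric than rule-based scores.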
Schneier.com is a personal website. Opinions expressed are not necessarily those of BT.