https://www.xkcd.com/936/
I've heard arguments against it. So is it still legit?
However, if someone implements a dictionary attack, doesn't that reduce the entropy of "correct horse battery staple" to effectively four?
No. The comic was already assuming a dictionary attack. You have to multiply the number of words by the number of bits of entropy per word. This is assumed to be 11 in the comic, which is what you'd get if you chose each word uniformly at random from a list of 2000 words.
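As a rough sketch of that multiplication (the function name is made up for illustration, and the formula assumes every word is picked uniformly and independently):

```python
import math

def passphrase_entropy_bits(num_words: int, wordlist_size: int) -> float:
    """Bits of entropy for num_words drawn uniformly and independently from a list."""
    return num_words * math.log2(wordlist_size)

# A 2000-word list gives slightly under 11 bits per word;
# a 2048-word list gives exactly 11.
bits_2000 = passphrase_entropy_bits(4, 2000)
bits_2048 = passphrase_entropy_bits(4, 2048)
print(f"{bits_2000:.1f} bits with 2000 words, {bits_2048:.1f} bits with 2048 words")
```

An attacker who already knows the list still faces every combination of four words, so the dictionary attack is already priced into the 44 bits.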
The passwords generated by VeraCrypt are not the ones the comic is mocking. They're perfectly fine from an entropy standpoint, but problematic if you have to memorize them. It's a subtle but important distinction: the ones the comic is mocking are human-generated passwords made by manipulating words into looking more like VeraCrypt-style strings of random characters, without actually using a random number generator.
There are 26 letters in the English alphabet. There are more than 170,000 words in the English language.
That's why the entropy is way greater with words, and thus fewer words are needed.
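To put rough numbers on this, here is a small sketch using the two figures above. It assumes each letter or word is chosen uniformly at random, which is not true of passwords people make up by hand.

```python
import math

ALPHABET_SIZE = 26          # letters, figure from the text above
VOCABULARY_SIZE = 170_000   # words, figure from the text above

bits_per_letter = math.log2(ALPHABET_SIZE)
bits_per_word = math.log2(VOCABULARY_SIZE)

# How many random letters carry as much entropy as one random word.
letters_per_word = bits_per_word / bits_per_letter

print(f"{bits_per_letter:.2f} bits per random letter")
print(f"{bits_per_word:.2f} bits per random word")
print(f"one random word is worth about {letters_per_word:.1f} random letters")
```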
"The computer knows all the words"..? ...so? The computer doesn't know the letters?
Calculating entropy within xkcd 936: Password Strength - Cryptography Stack Exchange
theory - Password strength (XKCD) - Stack Overflow
Could someone explain the xkcd comic that's about entropy and cracking passwords?
passwords - Is "the oft-cited XKCD scheme [...] no longer good advice"? - Information Security Stack Exchange
I don't get nearly the amount of entropy stated in the comic.
Interestingly enough, the entropy ratings are actually justified in the comic itself by the little boxes, each of which represents 1 bit of uncertainty.
This means that for Tr0ub4dor&3:
- It's estimated that the word itself, "Troubador", comes from a dictionary containing about 2^16 words
- It adds one bit for each of o, a, o of the word to encode whether the letter was replaced or not
- It adds one bit to decide whether the word was capitalized or not
- It adds one bit for the ordering of the trailing numeral and special character
- It adds 3 bits for the unknown numeral, approximating 10 with 2^3 rather than 2^4, since 2^3 is the closer fit
- It adds 4 bits for the unknown punctuation, i.e. which of the approximately 16 standard ones it is
This sums up to 16+3+1+1+3+4=28
For correct horse battery staple the reasoning is that each of the four words is drawn from a dictionary of size 2^11, which means 4×11=44 bits of entropy.
In both cases it is assumed that the attacker knows the scheme, i.e. the set of possible choices that feeds into the entropy estimate, and that each word or option is actually picked uniformly at random.
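The two tallies can be written out as a short sketch. The dictionary keys are just labels for the comic's boxes, not anything from the comic's text.

```python
# Bits contributed by each choice in the "Tr0ub4dor&3" scheme.
tr0ub4dor_bits = {
    "base word from a ~2^16-word dictionary": 16,
    "letter substitutions (o, a, o)": 3,
    "capitalization": 1,
    "order of numeral and punctuation": 1,
    "numeral": 3,
    "punctuation": 4,
}
total_tr0ub4dor = sum(tr0ub4dor_bits.values())   # 28 bits

# Four words, 11 bits each.
total_passphrase = 4 * 11                        # 44 bits

# Every extra bit doubles the number of guesses an attacker needs.
ratio = 2 ** (total_passphrase - total_tr0ub4dor)
print(total_tr0ub4dor, total_passphrase, ratio)
```

The 16-bit gap means the four-word passphrase takes 2^16 = 65,536 times as many guesses to exhaust.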
If you want an even more thorough explanation of this comic, I can only recommend you read the bear's answer on this over on InfoSec.SE.
One official way to estimate the strength of a user selected password such as "Tr0ub4dor&3" is to look at NIST recommendations. Granted that this is now deprecated, but the relevant publication was NIST Special Publication 800-63 Version 1.0.2, Electronic Authentication Guideline.
Table A.1 (reproduced below in case of link rot):-
The reasoning behind this table is within the document at § A.2.1 Guessing Entropy Estimate. NIST therefore estimates that the entropy is 33 bits if we interpolate for 11 characters and use dictionary and composition rules.
The takeaway from this question is how difficult it is to assess the entropy of short sequences, particularly human-produced ones. The two current answers diverge in strength by a factor of 32. If we compare NIST's estimate to Blafasel's original query on 50 bits, the estimates diverge by a factor of 131,072. NIST says of the above, "Readers are cautioned against interpreting the following rules as anything more than a very rough rule of thumb method". True.
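Those factors follow from the differences in bits. This sketch assumes the two answers being compared are the comic's 28-bit count and the 33-bit NIST figure; the helper name is made up.

```python
def guess_factor(bits_a: int, bits_b: int) -> int:
    """How many times more guesses the larger estimate implies than the smaller."""
    return 2 ** abs(bits_a - bits_b)

print(guess_factor(33, 28))   # NIST estimate vs. the comic's count: 32
print(guess_factor(50, 33))   # 50-bit figure vs. NIST estimate: 131072
```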
Another takeaway is that very few sites will allow the stronger and easier-to-remember technique of choosing from a word list, such as "correcthorsebatterystaple". The online version of the UK government doesn't, no bank I'm aware of does, and stackexchange.com doesn't.
There is a whole wiki dedicated to explaining xkcd comics: http://www.explainxkcd.com
The explanation for this particular comic can be found here: http://www.explainxkcd.com/wiki/index.php/936:_Password_Strength
First off (FYI), computer guesses are based on how common and how long a phrase is, but human memory is based on how complex and long something is.
Basically, what this comic is making fun of is that common 'good password' criteria have made passwords more complicated, but a complicated password has to be short to be remembered. Those same criteria have also made complicated sequences more common. This means that four common medium-length words are less common and longer than the norm, which makes for a better password. Basically, what was rare (short complex patterns) is now common, making what was common (medium-to-long phrases) rare.
http://xkcd.com/936/ <-- That's the comic I'm talking about.
Where are the extra bits coming from in the first password? Are they coming from the special characters?
What does entropy mean in this case?
I googled the definition: "Lack of order or predictability; gradual decline into disorder." Is that kinda what it means in this case as well?
If you could explain it like I'm a stupid person, I'd appreciate it.
The Holy War
I think you will find that the correct way to generate passwords could start a holy war where each group thinks the other is making a very simple mathematical mistake or missing the point. If you get 10 computer security professionals in a room and ask them how to come up with good passwords, you will get 11 different answers.
The Misunderstanding
One of the many reasons there is no consistent advice about passwords is it all comes down to an issue of threat modeling. What exactly are you trying to defend against?
For example: are you trying to protect against an attacker who is specifically targeting you and knows your system for generating passwords? Or are you just one of millions of users in some leaked database? Are you defending against GPU based password cracking or just a weak web server? Are you on a host infected with malware[1]?
I think you should assume the attacker knows your exact method of generating passwords and is just targeting you.[2] The xkcd comic assumes in both examples that all the details of the generation are known.
The Math
The mathematics in the xkcd comic is correct, and it's not going to change.
For passwords I need to type and remember, I use a Python script that generates xkcd-style passwords that are truly random. I have a dictionary of 2^11 (2048) common, easy-to-spell English words. Even if I gave the full source code and a copy of my word list to an attacker, there would still be 2^44 possible passwords.
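A minimal sketch of what such a script might look like follows. This is not the actual script described above: the function names and the tiny demo list are placeholders, and a real list would hold 2048 words.

```python
import math
import secrets

def generate_passphrase(wordlist, num_words=4, separator=" "):
    """Pick num_words uniformly at random using the OS cryptographic RNG."""
    if len(set(wordlist)) != len(wordlist):
        raise ValueError("word list must not contain duplicates")
    return separator.join(secrets.choice(wordlist) for _ in range(num_words))

def passphrase_bits(wordlist, num_words=4):
    """Entropy in bits, assuming the attacker knows the list and the method."""
    return num_words * math.log2(len(wordlist))

# Placeholder list for demonstration only; far too small for real use.
DEMO_WORDS = ["correct", "horse", "battery", "staple",
              "orange", "window", "river", "candle"]

print(generate_passphrase(DEMO_WORDS))
print(f"{passphrase_bits(DEMO_WORDS):.0f} bits with this demo list")
```

The `secrets` module is used rather than `random` because `random` is not designed for security purposes.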
As the comic says:
1000 Guesses / Sec Plausible attack on a weak remote web service. Yes, cracking a stolen hash is faster, but it's not what the average user should worry about.
That strikes a nice balance between easy to remember and difficult to crack.
What if we tried more power?
Sure 2^44 is ok, but GPU cracking is fast, and it's only going to get faster. Hashcat could crack a weak hash[3] of that size in a number of days, not years. Also, I have hundreds of passwords to remember. Even xkcd style it gets hard after a few.
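To see how much the guess rate matters, here is a back-of-the-envelope sketch. The 1000 guesses per second comes from the comic; the GPU rate is a purely hypothetical placeholder, not a Hashcat benchmark, so substitute a measured figure for the hash in question.

```python
SECONDS_PER_DAY = 86_400
SECONDS_PER_YEAR = 365.25 * SECONDS_PER_DAY

def seconds_to_exhaust(bits: int, guesses_per_second: float) -> float:
    """Worst-case time to try every candidate; the average case is half of this."""
    return 2 ** bits / guesses_per_second

WEB_RATE = 1_000                 # guesses/sec against a weak remote service (from the comic)
ASSUMED_GPU_RATE = 100_000_000   # guesses/sec, hypothetical offline rate for illustration

web_years = seconds_to_exhaust(44, WEB_RATE) / SECONDS_PER_YEAR
gpu_days = seconds_to_exhaust(44, ASSUMED_GPU_RATE) / SECONDS_PER_DAY

print(f"online:  about {web_years:.0f} years")
print(f"offline: about {gpu_days:.1f} days at the assumed rate")
```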
This is where password managers come in. I like KeePass, but there are many others that are basically the same. With one, you generate just one longer xkcd-style passphrase that you can memorize (say 10 words), then create a unique 128-bit truly random password for each account (hex or base 64 are good). 128 bits is going to be strong enough for a long time. If you want to be paranoid, go larger: it's no extra work to generate 256-bit hex passwords.
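Most password managers will generate these for you. As an illustration of how little is involved, here is a sketch using Python's standard `secrets` module (the wrapper function names are made up):

```python
import secrets

def random_hex_password(bits: int = 128) -> str:
    """Random password with the given number of bits, encoded as hex."""
    if bits % 8 != 0:
        raise ValueError("bits must be a multiple of 8")
    return secrets.token_hex(bits // 8)

def random_base64_password(bits: int = 128) -> str:
    """Same amount of randomness, encoded as URL-safe base 64 without padding."""
    if bits % 8 != 0:
        raise ValueError("bits must be a multiple of 8")
    return secrets.token_urlsafe(bits // 8)

print(random_hex_password())        # 32 hex characters
print(random_hex_password(256))     # 64 hex characters
print(random_base64_password())     # 22 base 64 characters
```

Hex is the longer encoding but uses only digits and the letters a to f, which helps on sites that reject symbols.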
[1] This is where the memory thing comes in, if you're on a compromised host you have lost. It doesn't matter if you type it or use a program like KeePass to copy and paste it if an attacker can key-log / read memory.
[2] Rather than the weaker (but more likely) assumption that the attacker has just torrented a dictionary called "Top Passw0rdz 4realz 111!".
[3] Sure we should all be using PBKDF2, etc... but lots of sites are still on SHA1. (and they are the good ones)
Schneier writes this:
This is why the oft-cited XKCD scheme for generating passwords -- string together individual words like "correcthorsebatterystaple" -- is no longer good advice. The password crackers are on to this trick.
but the key to understanding what he is really after is a little further in his essay:
There's still one scheme that works. Back in 2008, I described the "Schneier scheme"
so that's it. Ole' Bruce wants to assert that his scheme is the One and Only, the best, the winner, the ultimate scheme. Therefore, he needs to say disparaging things about the "competitors", regardless of whether such assertions are scientifically sound or not.
In this case, it has always been assumed that the password generation method is known to the attacker. That's the whole point of entropy computations; see the analysis. That attackers are "on to this trick" changes nothing at all (when an attacker knows the password generation method, the entropy computation describes exactly the password strength; when the attacker is incompetent and does not know the password generation method, the password strength is only higher, by an amount which is nigh impossible to quantify).
The quip about "passwords in memory" is just more incoherent rambling. Passwords necessarily go to RAM at some point, whether you type them or copy-paste them from a password safe, or anything similar.
My guess is that Bruce was drunk.
Update: Schneier was specifically asked to comment about his passphrase condemnation in a Reddit AMA (via archive.org, original link) that took place August 2, 2016. He continued to advocate for his password creation system as a superior alternative to random word passphrases. Schneier did say his scheme "gives you more entropy per memorizable character than other methods", which is true when compared to the characters making up words. But this is irrelevant when a system relies on memorizing words rather than characters, and you're allowed to combine enough words to generate adequate 'entropy' for your passphrase as a whole.