However, if someone implements a dictionary attack, doesn't that reduce the entropy of "correct horse battery staple" to effectively four? No. The comic was already assuming a dictionary attack. You have to multiply the number of words by the number of bits of entropy per word. This is assumed to be 11 in the comic, which is what you'd get if you chose each word uniformly at random from a list of 2000 words. The passwords generated by VeraCrypt are not the ones the comic is mocking. They're perfectly fine from an entropy standpoint, but problematic if you have to memorize them. It's a subtle but important distinction: the ones the comic is mocking are human-generated passwords made by manipulating words into looking more like VeraCrypt-style strings of random characters, without actually using a random number generator. Answer from Cosmologicon on reddit.com
🌐
xkcd
xkcd.com › 936
xkcd: Password Strength
~28 bits of entropy 2^28 = 3 days at 1000 guesses/sec (Plausible attack on a weak remote web service. Yes, cracking a stolen hash is faster, but it's not what the average user should worry about.) Difficulty to guess: Easy. [[A person stands scratching their head trying to remember the password.]] Person: Was it trombone?
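
The "3 days" figure can be checked with a quick calculation. This is a minimal sketch, assuming the attacker has to walk the whole 2^28 space at the comic's 1000 guesses/sec; the function name is illustrative and not taken from the comic.

```python
SECONDS_PER_DAY = 86_400

def days_to_exhaust(bits: int, guesses_per_second: float) -> float:
    """Days needed to try every password in a space of 2**bits candidates."""
    return 2 ** bits / guesses_per_second / SECONDS_PER_DAY

print(f"{days_to_exhaust(28, 1000):.1f} days")  # 3.1 days
```

On average the attacker would hit the right password after about half that time.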
Standards
This work is licensed under a Creative Commons Attribution-NonCommercial 2.5 License · This means you're free to copy and share these comics (but not to sell them). More details
🌐
explain xkcd
explainxkcd.com › wiki › index.php › 936:_Password_Strength
936: Password Strength - explain xkcd
Example of an actual crack for this type of password: https://github.com/koshippy/xkcd_password/blob/master/password_crack.py My computer gets 10,000,000 guesses in ~16 seconds (non-hashed takes ~2 seconds), meaning it would take almost a year to try every combination (2048^4 total password space). Even optimizing by using C++/Java or JtR, you wouldn't see a huge improvement, since most of the time is spent on the SHA hashing.
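
The "almost a year" estimate follows directly from the numbers quoted. A rough sketch, assuming the measured rate stays constant and the whole 2048^4 space has to be tried; the variable names are illustrative.

```python
measured_guesses = 10_000_000   # guesses timed in the quoted run
measured_seconds = 16           # approximate wall-clock time with hashing
keyspace = 2048 ** 4            # four words from a 2048-word list

rate = measured_guesses / measured_seconds   # guesses per second
days = keyspace / rate / 86_400              # time to try every combination
print(f"{rate:,.0f} guesses/sec, about {days:.0f} days")
# 625,000 guesses/sec, about 326 days
```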
🌐
Unix-ninja
unix-ninja.com › p › your_xkcd_passwords_are_pwned
Your xkcd passwords are pwned
To put this in perspective, the ... up of random ASCII characters. Although the concept is fair, this comic's implementation is flawed for achieving its goal. Cracking xkcd passwords is easier than you think....
Top answer
1 of 10
236

The Holy War

I think you will find that the correct way to generate passwords could start a holy war, where each group thinks the other is making very simple mathematical mistakes or missing the point. If you get 10 computer security professionals in a room and ask them how to come up with good passwords, you will get 11 different answers.

The Misunderstanding

One of the many reasons there is no consistent advice about passwords is it all comes down to an issue of threat modeling. What exactly are you trying to defend against?

For example: are you trying to protect against an attacker who is specifically targeting you and knows your system for generating passwords? Or are you just one of millions of users in some leaked database? Are you defending against GPU-based password cracking or just a weak web server? Are you on a host infected with malware[1]?

I think you should assume the attacker knows your exact method of generating passwords and is just targeting you.[2] The xkcd comic assumes in both examples that all the details of the generation are known.

The Math

The mathematics in the xkcd comic is correct, and it's not going to change.

For passwords I need to type and remember, I use a Python script that generates xkcd-style passwords that are truly random. I have a dictionary of 2^11 (2048) common, easy-to-spell English words. I could give the full source code and a copy of my word list to an attacker, and there would still be 2^44 possible passwords.
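
A generator of this kind could look roughly like the sketch below. It is not the script described above: the function names are made up for illustration, and `demo_words` is a placeholder standing in for a real list of 2048 distinct, easy-to-spell words. The one essential point is that the words come from a cryptographic random source (`secrets`) rather than from a human or from the `random` module.

```python
import math
import secrets

def generate_passphrase(words: list[str], count: int = 4, sep: str = " ") -> str:
    """Pick `count` words uniformly and independently (repeats allowed)."""
    return sep.join(secrets.choice(words) for _ in range(count))

def entropy_bits(list_size: int, count: int = 4) -> float:
    """Entropy of the scheme when the attacker knows the list and the method."""
    return count * math.log2(list_size)

# Placeholder list (assumption): a real one would hold 2048 common words.
demo_words = [f"word{i}" for i in range(2048)]

print(generate_passphrase(demo_words))
print(entropy_bits(len(demo_words)))  # 44.0
```

Publishing the code and the word list does not change the 44-bit figure, because the entropy comes only from the random choices.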

As the comic says:

1000 Guesses / Sec Plausible attack on a weak remote web service. Yes, cracking a stolen hash is faster, but it's not what the average user should worry about.

That strikes a nice balance between easy to remember and difficult to crack.

What if we tried more power?

Sure 2^44 is ok, but GPU cracking is fast, and it's only going to get faster. Hashcat could crack a weak hash[3] of that size in a number of days, not years. Also, I have hundreds of passwords to remember. Even xkcd style it gets hard after a few.

This is where password managers come in. I like KeePass, but there are many others that are basically the same. You generate just one longer xkcd pass-phrase that you can memorize (say 10 words), then create a unique 128-bit truly random password for each account (hex or base 64 are good). 128 bits is going to be strong enough for a long time. If you want to be paranoid go larger; it's no extra work to generate 256-bit hex passwords.
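
For the per-account passwords, a password manager will normally do the generation for you. As a minimal sketch of what that amounts to, assuming Python's standard `secrets` module (the two helper names are illustrative):

```python
import secrets

def random_hex_password(bits: int = 128) -> str:
    """Hex password carrying `bits` of entropy (4 bits per character)."""
    return secrets.token_hex(bits // 8)

def random_base64_password(bits: int = 128) -> str:
    """URL-safe base64 password (about 6 bits per character, no padding)."""
    return secrets.token_urlsafe(bits // 8)

print(random_hex_password())      # 32 hex characters
print(random_base64_password())   # 22 characters
print(random_hex_password(256))   # 64 hex characters
```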


[1] This is where the memory thing comes in, if you're on a compromised host you have lost. It doesn't matter if you type it or use a program like KeePass to copy and paste it if an attacker can key-log / read memory.

[2] Rather than the weaker (but more likely) assumption that the attacker has just torrented a dictionary called "Top Passw0rdz 4realz 111!".

[3] Sure we should all be using PBKDF2, etc... but lots of sites are still on SHA1. (and they are the good ones)

2 of 10
167

Schneier writes this:

This is why the oft-cited XKCD scheme for generating passwords -- string together individual words like "correcthorsebatterystaple" -- is no longer good advice. The password crackers are on to this trick.

but the key to understanding what he is really after is a little further in his essay:

There's still one scheme that works. Back in 2008, I described the "Schneier scheme"

so that's it. Ole' Bruce wants to assert that his scheme is the One and Only, the best, the winner, the ultimate scheme. Therefore, he needs to say disparaging things about the "competitors", regardless of whether such assertions are scientifically sound or not.

In this case, it has always been assumed that the password generation method is known to the attacker. That's the whole point of entropy computations; see the analysis. That attackers are "on to this trick" changes nothing at all (when an attacker knows the password generation method, the entropy computation describes exactly the password strength; when the attacker is incompetent and does not know the password generation method, the password strength is only higher, by an amount which is nigh impossible to quantify).

The quip about "passwords in memory" is just more incoherent ramblings. Passwords necessarily go to RAM at some point, whether you type them or copy-paste them from a password safe, or anything similar.

My guess is that Bruce was drunk.

Update: Schneier was specifically asked to comment about his passphrase condemnation in a Reddit AMA (via archive.org, original link) that took place August 2, 2016. He continued to advocate for his password creation system as a superior alternative to random word passphrases. Schneier did say his scheme "gives you more entropy per memorizable character than other methods" which is true when compared to characters making up words. But this is also irrelevant when a system relies on memorizing words rather than characters, and you're allowed to combine enough words to generate adequate 'entropy' for your passphrase as a whole.

Top answer
1 of 2
23

I don't get nearly the amount of entropy stated in the comic.

Interestingly enough, the entropy ratings are justified in the comic itself by the little boxes, each of which represents 1 bit of uncertainty.

This means for Tr0ub4dor&3

  • It's estimated that the word itself, "Troubador", appears in dictionaries which contain about 2^16 words
  • It adds one bit for each of the letters o, a, o in the word, to encode whether that letter was replaced or not
  • It adds one bit to decide whether the word was capitalized or not
  • It adds one bit for the ordering of the trailing numeral and special character
  • It adds 3 bits for the unknown numeral, approximating 10 by 2^3 rather than 2^4, since 2^3 is the closer power of two
  • It adds 4 bits for the unknown punctuation, i.e. which of the approximately 16 standard ones it is

This sums up to 16+3+1+1+3+4=28

For correct horse battery staple the reasoning is that each of the four words is drawn from a dictionary of size 2^11, which means 4 × 11 = 44 bits of entropy.
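
The two tallies can be written out as a short sketch. The dictionary keys are just labels for the boxes described above; nothing here goes beyond the comic's own numbers.

```python
import math

troubador_bits = {
    "base word (dictionary of 2**16 words)": 16,
    "letter substitutions (3 letters, yes/no each)": 3,
    "capitalization": 1,
    "order of numeral and punctuation": 1,
    "numeral": 3,
    "punctuation": 4,
}
total_troubador = sum(troubador_bits.values())

# Four independent words, each drawn uniformly from 2**11 candidates.
total_four_words = 4 * math.log2(2 ** 11)

print(total_troubador, total_four_words)  # 28 44.0
```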

In both cases it is assumed that the attacker knows the possible choices that feed into the entropy estimate, and that each choice (which word, which substitution, and so on) is made uniformly at random.


If you want an even more thorough explanation of this comic, I can only recommend you read the bear's answer on this over on InfoSec.SE.

2 of 2
2

One official way to estimate the strength of a user selected password such as "Tr0ub4dor&3" is to look at NIST recommendations. Granted that this is now deprecated, but the relevant publication was NIST Special Publication 800-63 Version 1.0.2, Electronic Authentication Guideline.

Table A.1 (reproduced below in case of link rot):-

The reasoning behind this table is within the document at § A.2.1 Guessing Entropy Estimate. NIST therefore estimates the entropy at 33 bits if we interpolate for 11 characters and use dictionary and composition rules.

The takeaway from this question is the difficulty of assessing the entropy of short sequences, particularly human-produced ones. The two current answers diverge in strength by a factor of 32. If we compare NIST's estimate to Blafasel's original query on 50 bits, the estimates differ by a factor of 131,072. NIST says of the above, "Readers are cautioned against interpreting the following rules as anything more than a very rough rule of thumb method". True.
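
Both factors come from differences in bit counts: every extra bit doubles the search space. A quick check, taking 28 bits for the comic-based answer, 33 bits for the NIST estimate and 50 bits for the original query:

```python
comic_bits, nist_bits, query_bits = 28, 33, 50

print(2 ** (nist_bits - comic_bits))   # 32
print(2 ** (query_bits - nist_bits))   # 131072
```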

Another takeaway is that very few sites will allow the stronger and easier-to-remember technique of choosing from a word list, such as "correcthorsebatterystaple". The on-line version of the UK government doesn't, no bank I'm aware of does, and stackexchange.com doesn't.

🌐
Reddit
reddit.com › r/askscience › is xkcd right about password strength?
r/askscience on Reddit: IS XKCD right about password strength?
March 19, 2012 -

I am sure many of you have seen this comic, and it seems to be a very convincing argument. Anyone have any counter arguments?

Top answer
1 of 24
804
First a little bit of information theory. The word bit in this context means something slightly different, although related, than what people usually think. Now it's a unit of information.

Suppose there's a normal coin and someone flips it but doesn't show you the result. Now the person who flipped the coin can give you information about the result. Assuming it's a fair coin (50/50 chance for each side) they need to give you exactly one bit of information to convey the result. Then consider the case of using a trick coin with heads on both sides. How much information does the person need to give you for you to know whether the coin ended up heads or tails? That will depend on whether you know beforehand that a trick coin was used. If you did then you will know it ends up heads always and you don't need any information to know the result. But if you don't know that a trick coin is used then you still need the same amount of information.

For a fair six-sided die, you need log(6) bits (base 2 logarithm), that is about 2.6 bits. Fractional bits are no more a problem here than having something weigh 2.6 kilos. If it's a loaded die with a greater chance ending up 6, then this will change.

So what does all this have to do with the comic? How many bits of information the passwords contain depend entirely on what you expect of the passwords. The first panel explains the assumptions for the common password format. A somewhat uncommon word (16 bits, or a 65-thousand-word vocabulary), one bit for capitalisation (of the first letter only), some common substitutions (would depend on the word but estimated to be 3 bits in the comic, seems reasonable), a punctuation character (four bits) and a number (3 bits) always at the end, but they can change order (one more bit). This gives the 28 bits for that format. If you know that the password you're trying to crack follows this format, then the calculations make sense. There's also that side note that you can add a few more bits to cover other common formats.

The other way to make a password, four common words, then gives 11 bits for each word, so a vocabulary of about 2000 words. And since there's four of them you get a total of 44 bits, much more than the other way to make your password. Again, if you know the password is this format, then I don't see anything wrong with the calculations. Note that this means that the attacker already knows that the password consists of four common words and would use a dictionary to crack it. The 44 bits are calculated with this in mind. If the cracker were to assume that all possible letter combinations, mostly non-sense words that is, are possible and equally likely, then the information content would be even higher.

How sensible is it then for a cracker to assume some specific format for the password? I would say that it is very sensible, at least to start the cracking with the common formats. If you get a hold of a whole database of passwords and start brute forcing them, then you might not care if you don't crack all of them, your goal is maybe to just crack some of them. It's pretty safe to assume that the majority of the passwords will follow the few most common password formats so why not try those first. And after that you may just give up on the rest of them or move on to more exotic password formats if you really want to.
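
The coin and die figures are base-2 logarithms of the number of equally likely outcomes. A small sketch to make that concrete (the function name is illustrative):

```python
import math

def information_bits(equally_likely_outcomes: int) -> float:
    """Bits needed to convey one result among equally likely outcomes."""
    return math.log2(equally_likely_outcomes)

print(information_bits(2))            # fair coin: 1.0
print(information_bits(1))            # coin known to be two-headed: 0.0
print(round(information_bits(6), 2))  # fair six-sided die: 2.58
print(information_bits(2048))         # one word from a 2048-word list: 11.0
```

A loaded die or a human-chosen word is not uniform, so this simple formula no longer applies and the information content is lower.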
2 of 24
140
I realize you've asked science here, but I just thought I'd point out that if you'd asked netsec the answer would be a resounding yes. Brute force password attacks are messy, lengthy, and almost never worth it. Steps can be taken server-side to prevent them that don't require such inconvenience to the user. The more complex the password, the more likely a user is to write it on a sticky-note and stick it to the monitor, or keep it in a text file for copy/pasting whenever it is needed. Those are far more likely to be a security risk than "weak" passwords.
🌐
Palant
palant.info › 2023 › 01 › 30 › password-strength-explained
Password strength explained | Almost Secure
However, this study is methodically flawed and wildly overestimates password strength. In 2007 neither XKCD comic 936 nor zxcvbn existed. So the researchers calculate password strength by looking at the character classes used.
🌐
Lobsters
lobste.rs › s › x6bt1h › xkcd_s_correcthorsebatterystaple
XKCD's "correcthorsebatterystaple" password can be cracked in less than 0.01 seconds | Lobsters
While researching this subject, ... correcthorsebatterystaple, an attacker would need to spend centuries cracking it. That is completely incorrect, and I'm disturbed how much confusion the XKCD comic has generated, that it has resulted ...
🌐
explain xkcd
explainxkcd.com › wiki › index.php › Talk:936:_Password_Strength
Talk:936: Password Strength - explain xkcd
In her TED talk, Lorrie Faith Cranor says "sorry all you xkcd fans", which could be interpreted as a judgement of #936, but there is no basis in the above article for that. It does however seem plausible that the report could be reworked to address #936. --Gnirre (talk) 10:42, 14 October 2014 (UTC) Password-changing frequency isn't about making passwords more secure, but instead it's about mitigating the damage of a successfully cracked password.
🌐
Hacker News
news.ycombinator.com › item
Anyone using this comic to imply that a passphrase is more secure than a short r... | Hacker News
March 3, 2013 - The example passphrase does have the equivalent of 44 bits of entropy: · log_2 (2048^4) = 4 * 11 = 44
🌐
Quora
quora.com › Why-is-correcthorsebatterystaple-considered-a-strong-password
Why is 'correcthorsebatterystaple' considered a strong password? - Quora
Answer (1 of 5): It's not. Never ever use "correcthorsebatterystaple" as your password. It is very widely known and password crackers are sure to include it among their first attempts because it was used as an example to illustrate a good way ...
🌐
Faraday
faradaysec.com › home › password strength
Password strength - Faraday
April 18, 2023 - Even if security teams and experts don’t usually fall for such attacks, there’s still a side of security that’s not always acknowledged. To illustrate it, passwords come in handy again (and as usual, there’s a relevant XKCD comic to illustrate the point):
🌐
Weberblog
weberblog.net › password-strengthentropy-characters-vs-words
Password Strength/Entropy: Characters vs. Words | Weberblog.net
But a passphrase with at least ... not simply cut it after the maximum input size ;)) Coming back to the xkcd comic: Yes, it is more secure to use a passphrase with 4 words than a password....
🌐
Correcthorse
correcthorse.pw
Correct Horse Battery Staple: xkcd-Style Password Generator
If you are paranoid and want to feel better, use five. I made this website because I wanted a good password generator. I know the xkcd-style password scheme is fairly secure and easy to remember, especially if you increase the length, and it's simple enough to understand and verify.
Top answer
1 of 15
1619

I think the most important part of this comic, even if it were to get the math wrong (which it didn't), is visually emphasizing that there are two equally important aspects to selecting a strong password (or actually, a password policy, in general):

  • Difficulty to guess
  • Difficulty to remember

Or, in other words:

  • The computer aspect
  • The human aspect

All too often, when discussing complex passwords, strong policies, expiration, etc (and, to generalize - all security), we tend to focus overly much on the computer aspects, and skip over the human aspects.

Especially when it comes to passwords, (and double especially for average users), the human aspect should often be the overriding concern.
For example, how often does strict password complexity policy enforced by IT (such as the one shown in the XKCD), result in the user writing down his password, and taping it to his screen? That is a direct result of focusing too much on the computer aspect, at the expense of the human aspect.

And I think that is the core message from the sage of XKCD - yes, Easy to Guess is bad, but Hard to Remember is equally so.
And that principle is a correct one. We should remember this more often, AKA AviD's Rule of Usability:

Security at the expense of usability comes at the expense of security.

2 of 15
566

Here is a thorough explanation of the mathematics in this comic:

The little boxes in the comic represent entropy in a logarithmic scale, i.e. "bits". Each box means one extra bit of entropy. Entropy is a measure of the average cost of hitting the right password in a brute force attack. We assume that the attacker knows the exact password generation method, including probability distributions for random choices in the method. An entropy of n bits means that, on average, the attacker will try 2^(n-1) passwords before finding the right one. When the random choices are equiprobable, you have n bits of entropy when there are 2^n possible passwords, which means that the attacker will, on average, try half of them. The definition with the average cost is more generic, in that it captures the cases where random choices taken during the password generation process (the one which usually occurs in the head of the human user) are not uniform. We'll see an example below.
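
The "half of them" claim for the uniform case can be checked exactly. A minimal sketch, assuming the attacker tries each candidate once in some fixed order and the target is equally likely to be any of them; the function name is illustrative.

```python
from fractions import Fraction

def expected_guesses(num_passwords: int) -> Fraction:
    """Mean number of guesses when the target is uniform over the candidates."""
    # The target is equally likely to sit at position 1, 2, ..., N in the
    # attacker's list, so the mean position is (N + 1) / 2.
    return Fraction(num_passwords + 1, 2)

n = 44
ratio = float(expected_guesses(2 ** n)) / 2 ** (n - 1)
print(ratio)  # very close to 1: the mean cost is essentially 2**(n-1)
```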

The point of using "bits" is that they add up. If you have two password halves that you generate independently of each other, one with 10 bits of entropy and the other with 12 bits, then the total entropy is 22 bits. If we were to use a non-logarithmic scale, we would have to multiply: 2^10 uniform choices for the first half and 2^12 uniform choices for the other half make up for 2^10 · 2^12 = 2^22 uniform choices. Additions are easier to convey graphically with little boxes, hence our using bits.
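
The same example in code, as a quick sanity check that adding bits and multiplying choices agree (variable names are illustrative):

```python
import math

first_half_choices = 2 ** 10
second_half_choices = 2 ** 12
combined_choices = first_half_choices * second_half_choices

bits_by_adding = math.log2(first_half_choices) + math.log2(second_half_choices)
bits_from_product = math.log2(combined_choices)

print(bits_by_adding, bits_from_product)  # 22.0 22.0
```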

That being said, let's see the two methods described in the comic. We'll begin with the second one, which is easier to analyze.

The "correct horse" method

The password generation process for this method is: take a given (public) list of 2048 words (supposedly common words, easy to remember). Choose four random words in this list, uniformly and independently of each other: select one word at random, then select again a word at random (which could be the same as the first word), and so on for a third and then a fourth word. Concatenate all four words together, and voila! you have your password.

Each random word selection is worth 11 bits, because 2^11 = 2048, and, crucially, each word is selected uniformly (all 2048 words have the same probability of 1/2048 of being selected) and independently of the other words (you don't choose a word so that it matches or non-matches the previous words, and, in particular, you do not reject a word if it happens to be the same choice as a previous word). Since humans are not good at all at making random choices in their head, we have to assume that the random word selection is done with a physical device (dice, coin flips, computers...).

The total entropy is then 44 bits, matching the 44 boxes in the comic.

The "troubador" method

For this one, the rules are more complex:

  1. Select a random word in a given big list of meaningful words.
  2. Decide randomly whether to capitalize the first letter, or not.
  3. For the letters which are eligible to "traditional substitutions", apply or not apply the substitution (decide randomly for each letter). These traditional substitutions can be, for instance: "o" -> "0", "a" -> "4", "i" -> "!", "e" -> "3", "l" -> "1" (the rules give a publicly known exhaustive list).
  4. Append a punctuation sign and a digit.

The random word is rated at 16 bits by the comic, meaning uniform selection in a list of 65536 words (or non-uniform in a longer list). There are more words than that in English, apparently about 228000, but some of them are very long or very short, others are so uncommon that people would not remember them at all. "16 bits" seems to be a plausible count.

Uppercasing or not uppercasing the first letter is, nominally, 1 bit of entropy (two choices). If the user makes that choice in his head, then this will be a balance between user's feeling of safety ("uppercase is obviously more secure !") and user's laziness ("lowercase is easier to type"). There again, "1 bit" is plausible.

"Traditional substitutions" are more complex because the number of eligible letters depends on the base word; here, three letters, hence 3 bits of entropy. Other words could have other counts, but it seems plausible that, on average, we'll find about 3 eligible letters. This depends on the list of "traditional substitutions", which are assumed to be a given convention.

For the extra punctuation sign and digit, the comic gives 1 bit for the choice of which comes first (the digit or the punctuation sign), then 4 bits for the sign and 3 bits for the digit. The count for digits deserves an explanation: this is because humans, when asked to choose a random digit, are not at all uniform; the digit "1" will have about 5 to 10 times more chances of being selected than "0". Among psychological factors, "0" has a bad connotation (void, dark, death), while "1" is viewed positively (winner, champion, top). In south China, "8" is very popular because the word for "eight" is pronounced the same way as the word for "luck"; and, similarly, "4" is shunned because of homophony with the word for "death". The attacker will first try passwords where the digit is a "1", allowing him to benefit from the non-uniformity of the user choices.

If the choice of digit is not made by a human brain but by an impartial device, then we get 3.32 bits of entropy, not 3 bits. But that's close enough for illustration purposes (I quite understand that Randall Munroe did not want to draw partial boxes).
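
To see how non-uniform choices eat into the 3.32 bits, the sketch below computes Shannon entropy for a uniform digit and for a skewed one. Two caveats: Shannon entropy is a related but not identical measure to the average-guessing-cost definition used above, and the skewed distribution is invented purely for illustration (it only mimics "1" being about ten times more popular than "0").

```python
import math

def shannon_entropy_bits(probabilities: list[float]) -> float:
    """Shannon entropy, in bits, of a discrete probability distribution."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

# Digits 0..9 chosen by an impartial device.
uniform_digits = [0.1] * 10

# Hypothetical human preferences for digits 0..9 (made-up numbers).
skewed_digits = [0.02, 0.20, 0.12, 0.12, 0.08, 0.10, 0.08, 0.12, 0.10, 0.06]

print(round(shannon_entropy_bits(uniform_digits), 2))  # 3.32
print(shannon_entropy_bits(skewed_digits) < shannon_entropy_bits(uniform_digits))  # True
```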

Four bits for punctuation are a bit understated; there are 32 punctuation signs in ASCII, all relatively easy to type on a common keyboard. This would mean 5 bits, not 4. There again, if the sign is chosen by a human, then some signs will be more common than others, because humans rarely think of '#' or '|' as "punctuation".

The grand total of 28 bits is then about right, although it depends on the precise details of some random selections, and the list of "traditional substitutions" (which impacts the average number of eligible letters). With a computer-generated password, we may hope for about 30 bits. That's still low with regards to the 44 bits of the "correct horse" method.

Applicability

The paragraphs above show that the maths in the comic are correct (at least with the precision that can be expected in these conditions -- that's a webcomic, not a research article). It still requires the following conditions:

  • The "password generation method" is known by the attacker. This is the part which @Jeff does not believe. But it makes sense. In big organizations, security officers publish such guidelines for password generation. Even when they don't, people have Google and colleagues, and will tend to use one of about a dozen or so sets of rules. The comic includes provisions for that: "You can add a few more bits to account for the fact that this is only one of a few common formats".

    Bottom-line: even if you keep your method "secret", it won't be that secret because you will more or less consciously follow a "classic" method, and there are not that many of those.

  • Random choices are random and uniform. This is hard to achieve with human users. You must convince them to use a device for good randomness (a coin, not a brain), and to accept the result. This is the gist of my original answer (reproduced below). If the users alter the choices, if only by generating another password if the one they got "does not please them", then they depart from random uniformity, and the entropy can only be lowered (maximum entropy is achieved with uniform randomness; you cannot get better, but you can get much worse).

The right answer is of course that of @AviD. The maths in the comic are correct, but the important point is that good passwords must be both hard to guess and easy to remember. The main message of the comic is to show that common "password generation rules" fail at both points: they produce passwords that are hard to remember and nonetheless not that hard to guess.

It also illustrates the failure of human minds at evaluating security. "Tr0ub4dor&3" looks more randomish than "correcthorsebatterystaple"; and the same minds will give good points to the latter only for the wrong reason, i.e. the widespread (but misguided) belief that password length makes strength. It does not. A password is not strong because it is long; it is strong because it includes a lot of randomness (all the entropy bits we have been discussing all along). Extra length just allows for more strength, by giving more room for randomness; in particular, by allowing "gentle" randomness that is easy to remember, like the electric horse thing. On the other hand, a very short password is necessarily weak, because there is only so much entropy you can fit in 5 characters.

Note that "hard to guess" and "easy to remember" do not cover all there is to say about password generation; there is also "easy to use", which usually means "easy to type". Long passwords are a problem on smartphones, but passwords with digits and punctuation signs and mixed casing are arguably even worse.


Original answer:

The comic assumes that the selection of a random "common" word yields an entropy of about 11 bits -- which means that there are about 2000 common words. This is a plausible count. The trick, of course, is to have a really random selection. For instance, the following activities:

  • select four words randomly, then remember them in the order which makes most sense;
  • if the four words look too hard to remember, scrap them and select four others;
  • replace one of the words with the name of a footballer (the attacker will never guess that !);

... all reduce the entropy. It is not easy to get your users to actually use true randomness and accept the result.
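
A rough way to put numbers on the first two activities is sketched below. These are simplified bounds under stated assumptions, not measurements: the reordering figure assumes four distinct words whose order ends up carrying no randomness at all, and the rerolling figure assumes the attacker can tell which outcomes a user would accept. The 50% acceptance rate is hypothetical.

```python
import math

base_bits = 4 * math.log2(2048)  # 44.0 bits for four uniform, ordered words

# Reordering: at worst, the 4! = 24 orderings of four distinct words
# collapse into a single predictable one.
max_reorder_loss = math.log2(math.factorial(4))

def reroll_loss_bits(keep: float) -> float:
    """Bits lost if only a fraction `keep` of outcomes is ever accepted."""
    return math.log2(1 / keep)

print(round(max_reorder_loss, 2))              # 4.58
print(reroll_loss_bits(0.5))                   # 1.0
print(round(base_bits - max_reorder_loss, 2))  # 39.42
```

The footballer substitution is harder to quantify, since it replaces a uniform choice with a human one.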

The same users will probably complain about the hassle of typing a long password (if the typing involves a smartphone, I must say that I quite understand them). An unhappy user is never a good thing, because he will begin to look for countermeasures which will make his life easier, such as keeping the password in a file and "typing" it with a copy&paste. Users can often be surprisingly creative that way. Therefore long passwords have a tendency to backfire, security-wise.

🌐
Fractional CISO
fractionalciso.com › home › password advice – xkcd
Password Advice - xkcd | Virtual CISO
February 6, 2019 - It is true that if an attacker did not know the password scheme and was trying a brute force attack that this password style would be relatively effective but let’s break down why such a scheme might not work.
🌐
Reddit
reddit.com › r/sysadmin › [security] opinion on the xkcd password strength comic?
r/sysadmin on Reddit: [Security] Opinion on the XKCD Password Strength comic?
March 16, 2021 -

So I had seen the XKCD Password Strength comic a long while back, and it made sense to me, but then I was wondering about dictionary attacks and whatnot, so I wanted to see where everyone stands on this idea.

This site made a small random password generator with a relatively small pool of words, but it sparked an interesting discussion in the comments below about how secure the concept really is.

Ideally, I would still use my password manager and use very long generated gibberish strings, but I figured a random word based password would be good in situations where you couldn't interface with a browser/pw manager, or maybe needed a bit of convenience. Mainly thinking of a computer login screen, but I'm sure there are plenty of other similar situations.

So my computer login for work, uses a relatively short pile of gibberish that I had committed to memory. (It's gibberish that made sense to me, so it wasn't like I spent time trying to memorize it). If it were random words, that would be considerably longer, but discrete words are more... guessable?

Love to hear everyone's general thoughts, as well as anyone who has considerable background in security.

...

FIGHT!

🌐
Fractional CISO
fractionalciso.com › home › correct horse battery staple review – password advice
Correct Horse Battery Staple Review - Password Advice
May 18, 2023 - The “Correct Horse Battery Staple” piece at xkcd is still so popular! I guess if you want something to live on then make a comic about it… · In the comic, you have an example of the type of password that we’ve been taught to create by IT systems over the past couple of decades (Tr0ub4dor&3).