How effective is changing passwords?
The short answer is: not much. It is much more important to choose passwords from a large set of possible words and to detect and limit cracking attempts.
Now for the longer answer.
[for software used in this note see Software]
Effect of Changing Passwords
If a password is unchanged, then a cracker may continuously increase the probability of correctly guessing because he is working a drawing without replacement problem. The naive conclusion is that changing passwords helps.
So, how much does changing help? We can answer that by looking at the two boundary conditions: never changing and changing after each crack attempt.
If we don't change and the attacker knows that we don't change, then this becomes a 'drawing without replacement' problem. If we assume that there are N possible passwords, then Pr{crack succeeds after k tries} = k/N.
Proof:
It's easier to write the probability of cracking failure than success:
Let Pr{crack fails after k tries}
P1 = (N-1)/N – because there are N-1 chances to be wrong
assume Pk = (N-k)/N
then Pk+1 = (N-k-1)/(N-k) ∙ Pk – because there are only N-k possible choices and N-k-1 chances to be wrong.
So Pk+1 = (N-k-1)/(N-k) ∙ (N-k)/N – by assumption
= (N-(k+1))/N
= 1 – (k+1)/N
So, the result is proven by induction.
On the other hand, if we reset the password after every time a crack attempt is made, this becomes a 'drawings with replacement' problem, so that:
Pr{crack fails after k tries} = Pk = [(N-1)/N]k because there are always (N-1) chances to be wrong.
for k << N, k/N is small, so we can estimate Pk by expanding the polynomial
Pk = [1 – 1/N]k
= 1 – k/(2N) + O((k/N)2) for k << N
Finally, suppose we change the password every n crack attempts. That restarts the clock, so to speak, so we now have k/n sequences of 'drawing without replacement'. This gives
Pk = ((N-n)/N)k/n ∙ ((N-(k mod n))/N)
So, the probability of not being cracked after k attempts is:
(1 – k/N) – if you don't change the password
(1 – k/(2N)) – if you change it continuously
or someplace between these two values.
The upshot is that changing passwords really doesn't matter as much as having a large N and small k because the Pr{crack succeeding} is roughly between k/N and k/(2N) and changing passwords doesn't effect that.
How to Lower the Probability of being
Cracked
We need to keep (k/N) small. This means we want N large and k small.
N can be made large by having a large possible selection of passwords. This gets us into characterizing passwords which are likely to be chosen.
Unlike N, k is not fixed. It grows as crackers attempt to guess passwords. The rate of growth of k is under the control of the cracker, the system administrator, and software providers. We obviously need schemes to slow down cracking attempts while allowing legitimate users through.
How big is N
In order to estimate N , we need to characterization of passwords how passwords are constructed.
Passwords need to be both easy to remember so they don't need to be written down and hard to guess. As usual, these are opposing goals. An easy to remember password is the letter 'a'. It exceptionally easy to guess. A difficult to guess password would be a random sequence taken from the printable ASCII character set of some relatively long length – which would be impossible to remember.
An upper bound on the probability of guessing a password, given no knowledge of the construction, is 1/An where A is the number of symbols in the password alphabet and n is the number of symbols in the password. This upper bound is grossly high. For example, for passwords taken only from the lower [or upper] case alphabetic symbols, this gives one chance in 265 = 11,883,376.
This upper bound is a grossly inaccurate estimate because in practice, passwords are taken from words in the user's native language and then distorted in some way. Following are the results of the analysis of the distribution of word lengths of English language words in the King James Bible, The Federalist Papers, and A Short Biographical Dictionary of English Literature, by John W. Cousin – possibly more or less representative – is given in the table to the right. This text contained 1,074,634 words, of which only 27,454 were distinct. [Actually, if the number of root words is smaller, because the algorithm used for counting did not distinguish related words]
Table 1
|
Word Length |
Count |
Percent of Total |
|
1 |
0.11 |
24 |
|
2 |
.34 |
74 |
|
3 |
1.80 |
388 |
|
4 |
6.2 |
1332 |
|
5 |
10.62 |
2283 |
|
6 |
14.44 |
3104 |
|
7 |
16.03 |
3447 |
|
8 |
15.21 |
3271 |
|
9 |
13.05 |
2806 |
|
10 |
9.49 |
2041 |
|
11 |
6.02 |
1295 |
|
12 |
3.46 |
744 |
|
> 12 |
approx 3 ¼ percent |
693 |
Notice that the number of words of length 5 is 2,283, not 10,000,000.
Thus, a reasonable estimate for words in common use – taking into consideration jargon, personal names, etc, - should be less than a small multiple of the numbers in this table.
The point is that simple alphabetic passwords will be trivial to guess because N is far too small.
Simple permutations of passwords by random capitalization can significantly increase the number of possible passwords without significantly decreasing their mnemonic properties. For example, if we arbitrarily capitalize two letters in a 5 character password, we can generate 5 x 4 = 20 variants. This will increase our number of possible passwords derived from 5 character words from about around 3,000 to about 60,000. Allowing ourselves to capitalize any or none of the letters expands the multiplier to 25 = 32 and increases our number of passwords to about 90,000.
Another 'trick' is to insert one or two digits and/or substitute digits for letters. [I prefer insertion to replacement because we tend to replace letters with digits which seem similar to the letters – 3 for 'e', 1 for 'i' or 'l',
Inserting digits generates more randomness because the digit can be located at any location in the string. Thus, inserting the digit 0, generates 6 new passwords for each letter combination. Allowing any of the digits to be inserted increases the multiplier to 60, so the combination of inserting a single digit and random capitalization increases our password space from 3,000 to 3,000 x 32 x 60 = 5,760,000.
Allowing two digits to be randomly inserted into a 5 character password yields a multiplier of 6x7x100 = 4,200 [the first digit expands the password to a 6 character password; the second expands it to 7 characters]. This results in 3,000 x 32 x 4,200 = 403,200,000.
So, to generalize, suppose we have an M character string of lower case letters and we allow random capitalization of 0 through M characters and require the insertion of two digits, randomly placed. This gives us a multiplier of 100(M+1)(M+2) for the digit insertion and 2M for the capitalization.
Table 2
|
M |
100 ∙ (M+1) ∙ (M+2) ∙ 2M |
|
4 |
48,000 |
|
5 |
134,400 |
|
6 |
358,400 |
|
7 |
921,600 |
|
8 |
2,304,000 |
|
9 |
5,632,000 |
|
10 |
13,516,800 |
|
11 |
31,948,800 |
So the number of passwords we have will be:
(number of plain-text words) x 100 x (M+1) x (M+2) x 2M
The table only ranges up to M = 11 because it is unlikely that normal humans will put up with passwords containing alphabetic text longer than this.
The next thing to do is to expand the number of words available on which to base passwords on by adding phrases which can be created by concatenating shorter words. For example, referring to Table 1, if we concatenate 2 and 3 letter words, independently chosen at random, we can create 2 ∙176 ∙ 472 = 166,144 base phrases. This will allow us to generate 22,329,753,600 passwords which are (relatively) easy to remember.
So, this gives us a direction to go in order to characterize the maximum expected value for N – the number of possible passwords which are likely to be in use.
Let's say we want passwords no less than 8 characters long and our users will not tolerate passwords longer than 12. We will require at least one capital letter and between 2 and 4 digits inserted, not substituted for letters [letter substitutions seem to be somewhat predictable: 1 for I, 0 for o, 3 for e, etc]. Results for each 2, 3, and 4 digits are summarized in the tables below – both with and without case shifts.
|
Approximate Number of passwords generated by randomly shifting
alphabetic characters and inserting digits – Exactly 2 digits inserted |
||||||
|
|
Exactly 2 digits inserted |
Exactly 3 digits inserted |
Exactly 4 digits inserted |
|||
|
Overall Length |
With Case Shifts |
Without Case Shifts |
With Case Shifts |
Without Case Shifts |
With Case Shifts |
Without Case Shifts |
|
8 |
8×
1012 |
1× 1011 |
3
× 1013 |
9× 1011 |
2
× 1014 |
1× 1013 |
|
9 |
6 × 1013 |
5× 1011 |
8 × 1014 |
1 × 1013 |
4 × 1015 |
1× 1014 |
|
10 |
7 × 1014 |
3 × 1012 |
7 × 1015 |
5 × 1013 |
1 × 1017 |
1 × 1015 |
|
11 |
3 × 1015 |
6 × 1012 |
9 × 1016 |
3 × 1014 |
9 × 1017 |
7 × 1015 |
|
12 |
2 × 1016 |
2× 1013 |
8 × 1017 |
7 × 1015 |
1 × 1019 |
5× 1016 |
Table 3
The approximate number of passwords is computed by taking all combinations of words from our 'representative text' which can be used to generate passwords and then accounting for digit insertion.
As expected, each additional digit increases the number of passwords possible by an order of magnitude. Even so, the number of expected passwords using this technique is depressingly low for reasonably expected lengths – in the 8 character range.
All of these results were computing using software available from this link: Software.
How can we control growth of k?
To add to the depression, lets assume that a cracker can make one login attempt per second – reasonable if login processing is not artificially inflated. Inasmuch as this will be automated, we can presume the attempts will be continuous – if not on one target system, then on a collection of them. So any given exposed system can expect 86,400 crack attempts per day – or 30,758,400 ≈ 3 × 108 per year. So if I use an 8 character password, don't change ever change it, and the cracker continuously tries my account, then the probability I will get cracked in a year is about 3/8 × 10-4 – which is still to high to make me comfortable.
If, on the other hand, I do something as simple as noticing when I get a significant number of failed login attempts from a particular IP address and firewall that address off, I can cut my crack attempts down considerably. Software contains a crude python hack which monitors /var/log/secure and shuts down ssh access from IP addresses which generate 10 or more failed login attempts within any one hour period.
Summary
So, in summary, we can do things at a system level to make cracking more difficult than it currently is. As a first step, two reasonable things to do are:
· enforce quality passwords – overall length of 10 plus characters with 2 or more inserted digits
· monitor login attempts and firewall off hostile IP addresses.