'"PGP," warns Dorothy Denning, a Georgetown University professor
who has worked closely with the National Security Agency, "could
potentially become a widespread problem.' -- (E. Dexheimer)
Comments to: Grady Ward, grady@netcom.com
Contributors:
John Kelsey, c445585@mizzou1.missouri.edu (Appendix A.)
RSA Data Security (Appendix C. The MD5 Algorithm)
Jim Gillogly (Appendix D. The Secure Hash Algorithm)
FAQ: How do I choose a good password or phrase?
ANS: Shocking nonsense makes the most sense
With the intrinsic strength of some of the modern
encryption, authentication, and message digest algorithms such as
RSA, MD5, SHS and IDEA the user password or phrase is becoming
more and more the focus of vulnerability.
For example, Deputy Ponder with the Los Angeles County
Sheriff's Department admitted in early 1993 that both they and the
FBI despaired of breaking the PGP 1.0 system except through a
successful dictionary attack (trying many possible passwords or
phrases from lists of probable choices and their variations)
rather than "breaking" the underlying cryptographic algorithm
mathematically.
The fundamental reason why attacking or trying to guess the
user's password or phrase will increasingly be the focus of
cryptanalysis is that the user's choice of password may represent
a much simpler cryptographic key than optimal for the encryption
algorithm being used. This weakness of the user's password choice
provides the potential cryptanalytic wedge.
For example, suppose a user chooses the password 'david.' On
the surface the entropy of this key (or the number of different
equiprobable key states) appears to be five characters chosen from
a set of twenty-six with replacements: 26^5 or 1.188 x 10^7. But
since the user is apparently biased toward common given names,
which a majority appear in lists numbering only 6,000-7,000
entries, the true entropy is undoubtedly much closer to 6.5 x
10^3, or about four orders of magnitude smaller than the raw
length might suggest. (In fact this password probably possesses a
much smaller entropy than even this for the very common name
"david" would be one of the first names to be checked by an
optimized dictionary attack program.)
In other words the "entropy" of a keyspace is not a fixed
physical quantity: the cryptanalyst can exploit whole cultural
biases and contexts, not just byte frequencies, digraphs, or even
whole-word correlations to reduce the key space he or she is
trying to explore.
To thwart this avenue of attack we would like to discover a
method of selecting passwords or phrases that have at least as
many bits of entropy (or "hard-to-guessness") as the entropy of
the cryptographic key of the underlying algorithm being used.
To compare, DES (Data Encryption Standard) is believed to
have about 54-55 bits (~4 x 10 ^16) of entropy while the IDEA
algorithm is believed to have about 128 bits (~3.5 x 10^38) of
entropy. The closer the entropy of the user's password or phrase
is to the intrinsic entropy of the cryptographic key of the
underlying algorithm being used, the more likely an attacker would
need to search a substantially larger portion of the algorithm's
key space in order to rediscover the key.
Unfortunately many documents suggest choosing passwords or
phrases that are distinctly inferior to the latest method. For
example, one white paper widely archived on the internet suggests
selecting an original password by constructing an acronym from a
popular song lyric or from a line of script from, for example, the
SF movie "Star Wars". Both of these ideas turn out to be weak
because both the entire script to Stars Wars and entire sets of
song lyrics to thousands of popular songs are available on-line to
everyone and, in some cases, are already embedded into "crack"
dictionary attack programs (See ftp.uwp.edu).
However, the conflict between choosing an easy-to-remember
key and choosing a key with a high level of entropy is not a
hopeless task if we exploit mnemonic devices that have been used
for a long time outside the field of cryptography. With the goal
of making up a passphrase not included in any existing corpus yet
very easy to remember, an effective technique is one known as
"shocking nonsense."
"Shocking nonsense" means to make up a short phrase or
sentence that is both nonsensical and shocking in the culture of
the user, that is, it contains grossly obscene, racist, impossible
or other extreme juxtaposition of ideas. This technique is
permissable because the passphrase, by its nature, is never revealed to
anyone with sensibilities to be offended.
Shocking nonsense is unlikely to be duplicated anywhere
because it does not describe a matter-of-fact that could be
accidentally rediscovered by anyone else and the emotional
evocation makes it difficult for the creator to forget. A mild
example of such shocking nonsense might be: "mollusks peck my
galloping genitals ." The reader can undoubtedly make up many far
more shocking or entertaining examples for himself or herself.
Even relatively short phrases offer acceptable entropy
because the far larger "alphabet" pool of word symbols that may be
chosen than the 26 characters that form the Roman alphabet. Even
choosing from a vocabulary of a few thousand words a five word
phrase might have on the order of 58 to 60 bits of entropy -- more
than what is needed for the DES algorithm, for example.
When you are permitted to use passphrases of arbitrary
length (in PGP for example) it is not necessary to further perturb
your 'shocking nonsense' passphrase to include numbers or special
symbols because the pool of word choices is already very high. Not
needing those special symbols or numbers (that are not
intrinsically meaningful) makes the shocking nonsense passphrase
that much easier to remember.
If you are forced to use, say, a Unix password utility that
permits only passwords of restricted length, one good strategy is
to process a your secret passphrase using MD5 or SHA, then
UUENCODE the result and select your shorter key from the output.
See Appendix C and D for actual MD5 and SHA source implmentations.
Appendix A. For software developers
For software developers designing "front-ends" or user
interfaces to conventional short-password applications, very good
results will come from permitting the user arbitrary length
passphrases that are then "crunched" or processed using a strong
digest algorithm such as the 160-bit SHS (Secure Hash Standard) or
the 128-bit MD5 (Message Digest rev.5).[See following Appendices]
The interface program then chooses the appropriate number of bits
from the digest and supplies them to the engine enforcing a short
password. This 'key crunching' technique will assure the developer
that even the short password key space will have a far greater
opportunity of being fully exploited by the user.
John Kelsey writes:
"I think it's a really good idea to use a randomly-generated
salt to generate a key from a password, and that this salt should
be as large as possible. Basically, this is to keep an attacker
from spending lots of computer power *once* to generate a
dictionary of likely keys. If users use good techniques to choose
passwords, this won't matter much, but if they don't, this may
save them from having their encrypted files or transmissions
routinely read. The simplest scheme I can see for this is simply
to prepend a 128-bit salt (generated as strongly as possible) to
each encrypted file. Generate the key from the password by pre-
filling a buffer with the 128-bit salt, then XORing in the keyed-
in password, or by appending the key to the keyed-in password.
Then, run SHA or MD5 or whatever to get the key.
A secondary point: Adding a random salt ensures that people
who use the same password/passphrase for lots of
files/transmissions don't get the same key every time. Since most
successful attacks against modern encryption schemes use *lots* of
ciphertext from the same key, this might add some practical
security, at relatively low cost."
--John Kelsey, c445585@mizzou1.missouri.edu
Appendix B. A tool to experimentally investigate entropy
A practical Unix tool for investigating the entropy of
typical user keys can be found in Wu and Manber's 'agrep'
(approximate grep) similarity pattern matching tool available in C
source from cs.arizona.edu [192.12.69.5]. This tool can determine
the "edit distance," that is, the number of insertions,
substitutions, or deletions that would be required of an arbitrary
pattern in order for it to match any of a large corpus of words or
phrases, say the usr/dict word list, or over the set of Star Trek
trivia archives. The user can then adjust the pattern to give an
arbitrary high threshold difference between it and common words
and phrases in the corpus to make crack programs that
systematically vary known strings less likely to succeed. It is
often surprising to discover that a substring pattern like
"hxirtes" is only of edit distance two from as many as forty
separate words ranging from "bushfires" to "whitest." Certainly no
password or phrase ought to be chosen as a working password or
phrase that is within two or fewer edit distance from a known
string or substring in any on-line collection.
Appendix C.
An implementation in C of the Message Digest Rev.5 Algorithm
/*
md5.h -- header file for implementation of MD5
RSA Data Security, Inc. MD5 Message-Digest Algorithm
Created: 2/17/90 RLR
Revised: 12/27/90 SRD,AJ,BSK,JT Reference C version
Revised (for MD5): RLR 4/27/91
-- G modified to have y&~z instead of y&z --
FF, GG, HH modified to add in last register done --
Access pattern: round 2 works mod 5, round 3 works mod 3 --
distinct additive constant for each step --
round 4 added, working mod 7
*/
/*
Copyright (C) 1990, RSA Data Security, Inc. All rights reserved.
License to copy and use this software is granted provided that it
is identified as the "RSA Data Security, Inc. MD5 Message-Digest
Algorithm" in all material mentioning or referencing this software
or this function.
License is also granted to make and use derivative works
provided that such works are identified as "derived from the RSA
Data Security, Inc. MD5 Message-Digest Algorithm" in all
material mentioning or referencing the derived work.
RSA Data Security, Inc. makes no representations concerning
either the merchantability of this software or the suitability of
this software for any particular purpose. It is provided "as is"
without express or implied warranty of any kind.
These notices must be retained in any copies of any part of this
documentation and/or software.
*/
typedef unsigned int UINT4;
/* Data structure for MD5 (Message-Digest) computation */
typedef struct {
UINT4 i[2]; /* number of _bits_ handled mod 2^64 */
UINT4 buf[4]; /* scratch buffer */
unsigned char in[64]; /* input buffer */
unsigned char digest[16]; /* actual digest after MD5Final call
*/
} MD5_CTX;
RSA Data Security, Inc. MD5 Message-Digest Algorithm
Created: 2/17/90 RLR
Revised: 1/91 SRD,AJ,BSK,JT Reference C Version
*/
/*
Message-digest routines:
To form the message digest for a message M
(1) Initialize a context buffer mdContext using MD5Init
(2) Call MD5Update on mdContext and M
(3) Call MD5Final on mdContext
The message digest is now in mdContext->digest[0...15]
/* The routine MD5Init initializes the message-digest context
mdContext. All fields are set to zero.
*/
void MD5Init ( MD5_CTX *mdContext)
{
mdContext->i[0] = mdContext->i[1] = (UINT4)0;
/*
The routine MD5Update updates the message-digest context to
account for the presence of each of the characters
inBuf[0..inLen-1]
in the message whose digest is being computed.
*/
void MD5Update (register MD5_CTX *mdContext, unsigned char *inBuf,
unsigned int inLen)
{
register int i, ii;
int mdi;
UINT4 in[16];
/* compute number of bytes mod 64 */
mdi = (int)((mdContext->i[0] >> 3) & 0x3F);
/* update number of bits */
if ((mdContext->i[0] + ((UINT4)inLen << 3)) < mdContext->i[0])
mdContext->i[1]++;
mdContext->i[0] += ((UINT4)inLen << 3);
mdContext->i[1] += ((UINT4)inLen >> 29);
while (inLen--) {
/* add new character to buffer, increment mdi */
mdContext->in[mdi++] = *inBuf++;
/* transform if necessary */
if (mdi == 0x40) {
for (i = 0, ii = 0; i < 16; i++, ii += 4)
in[i] = (((UINT4)mdContext->in[ii+3]) << 24) |
(((UINT4)mdContext->in[ii+2]) << 16) |
(((UINT4)mdContext->in[ii+1]) << 8) |
((UINT4)mdContext->in[ii]);
Transform (mdContext->buf, in);
mdi = 0;
}
}
}
/*
The routine MD5Final terminates the message-digest computation and
ends with the desired message digest in mdContext->digest[0...15].
*/
void MD5Final (MD5_CTX *mdContext)
{
UINT4 in[16];
int mdi;
unsigned int i, ii;
unsigned int padLen;
/* save number of bits */
in[14] = mdContext->i[0];
in[15] = mdContext->i[1];
/* compute number of bytes mod 64 */
mdi = (int)((mdContext->i[0] >> 3) & 0x3F);
/* pad out to 56 mod 64 */
padLen = (mdi < 56) ? (56 - mdi) : (120 - mdi);
MD5Update (mdContext, PADDING, padLen);
/* append length in bits and transform */
for (i = 0, ii = 0; i < 14; i++, ii += 4)
in[i] = (((UINT4)mdContext->in[ii+3]) << 24) |
(((UINT4)mdContext->in[ii+2]) << 16) |
(((UINT4)mdContext->in[ii+1]) << 8) |
((UINT4)mdContext->in[ii]);
Transform (mdContext->buf, in);
/* store buffer in digest */
for (i = 0, ii = 0; i < 4; i++, ii += 4) {
mdContext->digest[ii] = (unsigned char)(mdContext->buf[i] &
0xFF);
mdContext->digest[ii+1] =
(unsigned char)((mdContext->buf[i] >> 8) & 0xFF);
mdContext->digest[ii+2] =
(unsigned char)((mdContext->buf[i] >> 16) & 0xFF);
mdContext->digest[ii+3] =
(unsigned char)((mdContext->buf[i] >> 24) & 0xFF);
}
}
/*
Basic MD5 step. Transforms buf based on in. Note that if the
Mysterious Constants are arranged backwards in little-endian order
and decrypted with the DES they produce OCCULT MESSAGES!
*/
void Transform(register UINT4 *buf,register UINT4 *in)
{
register UINT4 a = buf[0], b = buf[1], c = buf[2], d = buf[3];
/* Implementation of NIST's Secure Hash Algorithm (FIPS 180)
* Lightly bummed for execution efficiency.
*
* Jim Gillogly 3 May 1993
*
* Compile: cc -O -o sha sha.c
*
* To remove the test wrapper and use just the nist_hash()
routine,
* compile with -DONT_WRAP
*
* Usage: sha [-vt] [filename ...]
*
* -v switch: output the filename as well
* -t switch: suppress spaces between 32-bit blocks
*
* If no input files are specified, process standard input.
*
* Output: 40-hex-digit digest of each file specified (160 bits)
*
* Synopsis of the function calls:
*
* sha_file(char *filename, unsigned long *buffer)
* Filename is a file to be opened and processed.
* buffer is a user-supplied array of 5 or more longs.
* The 5-word buffer is filled with 160 bits of non-
terminated hash.
* Returns 0 if successful, non-zero if bad file.
*
* void sha_stream(FILE *stream, unsigned long *buffer)
* Input is from already-opened stream, not file.
*
* void sha_memory(char *mem, long length, unsigned long *buffer)
* Input is a memory block "length" bytes long.
*
* Caveat:
* Not tested for case that requires the high word of the length,
* which would be files larger than 1/2 gig or so.
*
* Limitation:
* sha_memory (the memory block function) will deal with blocks no
longer
* than 4 gigabytes; for longer samples, the stream version will
* probably be most convenient (e.g. perl moby_data.pl | sha).
*
* Bugs:
* The standard is defined for bit strings; I assume bytes.
*
* test vector:
"abcdbcdecdefdefgefghfghighijhijkijkljklmklmnlmnomnopnopq"
* should compute:
* 0xd2516ee1L
* 0xacfa5bafL
* 0x33dfc1c4L
* 0x71e43844L
* 0x9ef134c8L
*
* test vector: "abc"
* should compute:
* 0x0164b8a9L
* 0x14cd2a5eL
* 0x74c4f7ffL
* 0x082c4d97L
* 0xf1edf880L
*
* Copyright 1993, Dr. James J. Gillogly
* This code may be freely used in any application.
*/
#include
#include
#define TRUE 1
#define FALSE 0
#define SUCCESS 0
#define FAILURE -1
int sha_file(); /* External entries */
void sha_stream(), sha_memory();
static void nist_guts();
#ifndef ONT_WRAP /* Using just the hash routine itself */
#define HASH_SIZE 5 /* Produces 160-bit digest of the message
*/
main(argc, argv)
int argc;
char **argv;
{
unsigned long hbuf[HASH_SIZE];
char *s;
int file_args = FALSE; /* If no files, take it from stdin */
int verbose = FALSE;
int terse = FALSE;
#ifdef MEMTEST
sha_memory("abc", 3l, hbuf); /* NIST test value from
appendix A */
if (verbose) printf("Memory:");
if (terse) printf("%08x%08x%08x%08x%08x\n",
hbuf[0], hbuf[1], hbuf[2], hbuf[3], hbuf[4]);
else printf("%08x %08x %08x %08x %08x\n",
hbuf[0], hbuf[1], hbuf[2], hbuf[3], hbuf[4]);
#endif
for (++argv; --argc; ++argv) /* March down the arg list */
{
verbose = TRUE;
if (**argv == '-') /* Process one or more flags */
for (s = &(*argv)[1]; *s; s++) /* Obfuscated C contest
entry */
switch(*s)
{
case 'v': case 'V':
verbose = TRUE;
break;
case 't': case 'T':
terse = TRUE;
break;
default:
fprintf(stderr, "Unrecognized flag: %c\n", *s);
return FALSE;
}
else /* Process a file */
{
if (verbose) printf("%s:\t\t", *argv);
file_args = TRUE; /* Whether or not we could
read it */
if (sha_file(*argv, hbuf) == FAILURE)
printf("Can't open file %s.\n", *argv);
else
if (terse) printf("%08x%08x%08x%08x%08x\n",
hbuf[0], hbuf[1], hbuf[2], hbuf[3], hbuf[4]);
else printf("%08x %08x %08x %08x %08x\n",
hbuf[0], hbuf[1], hbuf[2], hbuf[3], hbuf[4]);
}
}
if (! file_args) /* No file specified */
{
if (verbose) printf("%s:\t\t", *argv);
sha_stream(stdin, hbuf);
#define r0(f, K) \
temp = S(5, A) + f(B, C, D) + E + *p0++ + K; \
E = D; \
D = C; \
C = S(30, B); \
B = A; \
A = temp
#define r1(f, K) \
temp = S(5, A) + f(B, C, D) + E + \
(*p0++ = *p1++ ^ *p2++ ^ *p3++ ^ *p4++) + K; \
E = D; \
D = C; \
C = S(30, B); \
B = A; \
A = temp
static void nist_guts(file_flag, stream, mem, length, buf)
int file_flag; /* Input from memory, or from
stream? */
FILE *stream;
char *mem;
unsigned long length;
unsigned long *buf;
{
int i, nread, nbits;
union longbyte d;
unsigned long hi_length, lo_length;
int padded;
char *s;
register unsigned long *p0, *p1, *p2, *p3, *p4;
unsigned long A, B, C, D, E, temp;
padded = FALSE;
s = mem;
for (hi_length = lo_length = 0; ;) /* Process 16 longs at a
time */
{
if (file_flag)
{
nread = fread(d.B, 1, 64, stream); /* Read as 64
bytes */
}
else
{
if (length < 64) nread = length;
else nread = 64;
length -= nread;
memcpy(d.B, s, nread);
s += nread;
}
if (nread < 64) /* Partial block? */
{
nbits = nread << 3; /* Length: bits */
if ((lo_length += nbits) < nbits)
hi_length++; /* 64-bit integer */
if (nread < 64 && ! padded) /* Append a single bit */
{
d.B[nread++] = 0x80; /* Using up next byte */
padded = TRUE; /* Single bit once */
}
for (i = nread; i < 64; i++) /* Pad with nulls */
d.B[i] = 0;
if (nread <= 56) /* Room for length in this block */
{
d.W[14] = hi_length;
d.W[15] = lo_length;
}
}
else /* Full block -- get efficient */
{
if ((lo_length += 512) < 512)
hi_length++; /* 64-bit integer */
}
if (nread <= 56) break; /* If it's greater, length in next
block */
}
buf[0] = h0; buf[1] = h1; buf[2] = h2; buf[3] = h3; buf[4] =
h4;
}
Appendix E. Select References
[selection and of passwords in differing threat environments]
Department of Defense Password Management Guideline
CSC-STD-002-85
published by the Computer Security Center of the Department of
Defense Fort George G. Meade, MD 20755
[discovering weak passwords]
The COPS Security Checker System by D. Farmer, E. Spafford
Purdue University Technical Report CSD-TR-993
West Lafayette, IN 47907
[an example of automated key cracking]
With Microscope and Tweezers:
An Analysis of the Internet Virus of 1988
by M. Eichin, J. Rochlis, Massachusetts Institute of Technology
Cambridge, MA 02139
[password vulnerabilities in distributed systems]
Computer Emergency Response - An International Problem
by R. Pethia, K. van Wyk CERT/Software Engineering Institute
Carnegie Mellon University, Pittsburgh, PA 15213
[key metrics and the MD5 message digest algorithm]
Answers to Frequently Asked Questions About Today's Cryptography
Second edition, by Paul Fahn
RSA Laboratories, Redwood City, CA 94065
(available through anonymous FTP from rsa.com)
[implementation details of the MD5 message digest algorithm]
RFC-1321 ('request for comments') The MD5 algorithm
by R. Rivest MIT Center for Computer Science
(available on the internet from gatekeeper.dec.com)
[implementation details of the NIST Secure Hash Standard]
The Secure Hash Standard (SHS) Specification, Jan 1992 DRAFT
Federal Information Processing Standards Publication YY
Director, Computer Systems Laboratory
National Institute of Standards and Technology
Gaithersburg, MD 20899
(The SHS was approved as a Federal Standard in May, 1993)
[other possible approaches to password generation]
Automated Password Generator, NIST publication ????
Director, Computer Systems Laboratory
National Institute of Standards and Technology
Gaithersburg, MD 20899
(a pronounceable password algorithm using DES)
--
Grady Ward grady@netcom.com
3449 Martha Ct. compiler of Moby lexicons
Arcata, CA 95521-4884 e-mail or finger grady@netcom.com
(707) 826-7715 (voice/24hr FAX) for more information