How the HardEncrypt Package works


The HardEncrypt package uses a simple, classic encryption scheme--the one time pad. This encryption method works by taking a message and a key that is the same size as the message and "adding" them together. The message can be decrypted later by "subtracting" the key file from the encrypted message.

This sort of general scheme works on messages in many different formats. For instance, if your message was a number between 0 and 1,000,000, and your key was in this space also, the message could be encrypted simply by adding the two numbers (the key and the message) together. Keeping the encrypted message (possibly a number between 0 and 2,000,000) inside the message space is important to ensure that encrypted messages can be transmitted by the same means as plaintext messages. Thus, in this example, we would let our encrypted message "wrap-around" if it was greater than 999,999 (eg., 1,000,123 wraps to 123). Notice that in this example, given a message and a key, there is exactly 1 encrypted message that can be produced. Also, given and encrypted message and a key, there is only one decrypted message that can be produced. This is an important feature.

So, with our simple example, we have encrypted messages that are in the message space, and given a key, there is a 1 to 1 relationship between encrypted and plaintext messages. Now we come to the most interesting feature of all. Given an encrypted message, a number between 0 and 1,000,000, if the key is not known, how can an attacker find the plaintext message? Note that if the attacker doesn't know the key, the message could be any of the 1,000,000 possible messages. All the attacker can do is guess. Even if the attacker guesses correctly, he/she won't know it. Thus, we have a "perfect" encryption scheme capable of mapping any message in our message space onto any other message in the same space. The upshot is that given an encrypted message, without the key, there is no way to decrypt it.

The scheme presented in this simple example is completely scalable. Imagine now that instead of the message being a number between 0 and 1,000,000, it's a sequence of 10 numbers, each between 0 and 1,000,000. A key would be a similar sequence of 10 numbers, and wrap-around addition (term-by-term) would be used to encrypt, and subtraction would be used to decrypt.

Slight modifications are made to this scheme to produce the scheme used by HardEncrypt. Instead of numbers between 0 and 1,000,000 as terms in members of the message space, use numbers between 0 and 1 (inclusive, aka. binary). Instead of using a sequence of 10 numbers as the message space, use sequences of variable length. For each different message length, we effectively have a different message space. Use term-by-term addition for encryption, and subtraction for decryption. Notice that for binary terms, addition and subtraction (without inter-term carries) are the same operation. (1+0 = 1, 1-0=1, 1+1=0, 1-1=0, etc.) Given a message space consisting of messages of length N, keys consist of a binary sequence of length N, and we can use addition for both encryption and decryption.

All the same properties that were present in our simplest example are still present in this binary scheme. Given an encrypted message and no key, all messages in the space are possibilities. Thus, for a message of as few as 100 characters (8 binary bits per character), there are 2^800 different messages in the message space. Given an encrypted message, an attacker can at best guess from the selection of these 2^800 messages. Even if the attacker guesses correctly, he/she won't know it.

At this point, it should be apparent that the secrecy of the key is of utmost importance. If an attacker can get your key file, he/she can read your encrypted messages. What's more, if an attacker can figure out the algorithm that you used to generate your key file, he/she can recreate your key file and decrypt your files. Thus, it's important to make key files *without* using an algorithm. A very complicated algorithm could be sufficient if it was kept a secret, but such an algorithm is not practical for publicly distributed encryption software. What kind of data can a key contain, or what kind of data are produced by no algorithm? The answer is random data.

Filling a key full of random data sounds easy enough, but as a matter of fact, producing random numbers on a computer using only software is IMPOSSIBLE. Computer systems are, and for the most part should be, completely deterministic systems. They do exactly what they're instructed to do, and they do it in the same way every time. An amateur programmer might say, "well, what about the rand() function? That returns random numbers." Actually, rand() and other such "random" number generators don't generate random numbers, but rather spit out pseudo-random numbers. If you call rand() repeatedly, you'll notice that the values it returns eventually repeat. Such software functions are most often implemented using simple mathematical formulae involving modular arithmetic. rand() and other such functions also produce the same string of values every time they're used. They can be reseeded with a value, and they will then start outputting a different, repeating cycle of pseudo-random values, but there are only a finite number of seeds.

So, if key files were generated using software random number generators, an attacker would only need to try all possible random seeds before finding the right key file. For the ANSI standard library's implementation of rand(), there are only 65,536 possible seeds: an attacker would be able to find your key file in less than one minute on an average computer system.

Truly random numbers can only be generated by special-purpose hardware. Such hardware sources can rely on physical, analog, chaotic values to produce good random numbers (e.g., line noise). Many hardware random number sources have been proposed, and implemented, over the years, but none of them are included in wide ranges of computer systems.

So, relying on special-purpose hardware is out of the question for a cross-platform, portable encryption package. It would seem as though we're out of luck as far as random number generation goes. But, there is one piece of hardware that produces millions of random bits per second and happens to be included in most, if not all, modern computer systems--a sound card. Since sound cards deal with analog data that's full of noise, they are perfect sources for truely random numbers. The random numbers produced by a sound card never repeat, are not produced by an algorithm, and are irreproducible. Many sound card have enough inherent line noise, even with no input, to be useful sources of random numbers. But, with user audio input, sound cards are guaranteed to be sources of good random numbers.

However, the interface to access soundcards on various platforms is not consistent, making direct sound card usage impractical for cross-platform encryption software. Thus, HardEncrypt makes use of audio data files as good sources of truly random numbers. Access to files is a platform-independent operation as far as the program source code is concerned. A user with a sound card can easily produce a good audio file that HardEncrypt can use.

HardEncrypt's GenKeyFile program takes such an audio file as input and produces a key file from it. GenKeyFile pseudo-randomly mixes the audio seed file to get rid of holes and to hide the file header values (information that similar for all files of the same type) in with the file data. This feature is primarily intended to help the novice user while not hurting the advanced user. If no mixing were done, an attacker, knowing that the key was an audio file, would simply use an audio file header to decrypt at least the beginning of an encrypted message. With the mixing scheme, at least no continuous block of the message is completely exposed to attack. A determined attacker could try every possible pseudo-random mixing pattern until he/she found the one that had been used to mix the header into the rest of the key file, but then the attacker would only be able to decrypt the message in a few, far-scattered places. A more advanced user would remove the header from the audio seed file before running GenKeyFile, avoiding the issue completely.

We thus have a completely robust, uncrackable encryption scheme. The message space is all binary files, which is a super-space of all text files. Key files are in the message space, and encryption maps (in a 1-1 and onto fashion) from the message space to the message space. Key files are generated in an irreproducible way and contain truly random data. Given an encrypted message of any length, an attacker who doesn't have the key file can guess at the message contents, but will never know if he/she guessed correctly. In other words, the attacker will never be able to read an encrypted message.