## Saturday, September 09, 2006

### Power cipher II

This next cipher (which is really a cipher/code hybrid) couples the pseudorandom generator with an old system, the dictionary method.

During the U.S. Civil War, one method used was the dictionary system, whereby words were swapped according to some specific scheme. For example, the encoder would find his word at the mth position on page x of a copy of a dictionary used by both parties and then go backward or forward n pages and count down to the mth word on that page. The decoder would find the code word on page (x - n) and then find the decryption at position m on the xth page.

Such systems are easy to crack. But, not when hitched to a computerized process!

There are two big advantages to the system I propose.

* On average, there is no data inflation. In many systems, nulls are an important feature for bringing noise into the message. Also, most systems inflate because they are perforce not as efficient as the most efficient Huffman-type codes used for the plaintext. However, this problem does not occur in our system.

* Since the message contains all natural language words, the frequencies will reflect those of the language in the dictionary used. Hence, the fact of encryption won't be detected by simple automated programs that use standard frequency analysis.

Here's the system:

We use a public key method for arranging that the recipient computer receive an identical copy of some lexicon of n words. Then we send a key for a deterministic recursive pseudorandom number generator, plus a set of seeds (initial values).

The pseudorandom function and seed set are automatically changed via public key after specified numbers of transmissions.

Each word in the dictionary is assigned a number, which reflects its canonical (alphabetical) position. At every mth word of the message a new seed is used to generate a new permutation.

The generator then constructs a permutation of the numbers [1, n]. The original lexicon is then put in a bijective relationship with the permuted lexicon. This substitution cipher/code permits the "random" possibility that a word will represent itself. This is helpful in that involutary functions can tell the adversary where a probable word isn't. The fact that the lexicon can have an n as high as 500,000 (approximate number of words in English) means that unscrambling can be put out of computational reach, especially in light of the fact that the permutation is changed so often as to make comparisons of any sort quite unlikely.

This system however does not contain the super-encryption of the system below, which adds to security by masking the pseudo-random number generator. So, in exchange for good camouflage and efficiency, the system sacrifices the extra security of Hill super-encipherment.

A variant of this system: use two dictionaries (even if separate languages). Both must be transmitted to the recipient. The, say, English lexicon of n words is then coupled with, perhaps, a Russian dictionary of n words, which is permuted as above. So the English plaintext comes out as a mostly non-grammatical string of Russian words.

Of course, this natural-language counterfeiter would face a countermeasure in the form of a detector that counts frequencies of common one-, two- and three-letter words, such as "a", "I", "of", "be", "and", and "the." A method could then be devised to outsmart the detector, but at the expense of efficiency (we'd have to introduce nulls).