Draft 3
Before his trail-blazing paper on information theory (or "communication theory"), Claude Shannon wrote a confidential precursor paper during World War II on the informational and transmission issues inherent in cryptography, an indication of how closely intertwined information theory and cryptology are.
In this post, we digress from cryptology a bit to approach the issue of "meaning" in information theory, an issue Shannon quite properly avoided by ignoring it. We too are going to avoid the philosophical depths of "meaning" while addressing a continuing concern: the fact that some information is more useful, compelling or relevant than other information. We might think of Shannon's work as a complete generalization of communicative information, whereas our idea is to draw some distinctions. (I have only a modest familiarity with information theory, and so I have no idea whether any of what follows is original.)
For convenience we limit ourselves to the lower-case alphabet and assign equal probability to the occurrence of each letter in a letter string. We also use the artificially short string length n = 4. In that case, the Shannon information content of the gibberish string "abbx" equals 18.8 bits. The Shannon information value of the word "goal" is likewise 18.8 bits.
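For concreteness, here is a minimal Python sketch of that arithmetic; it contains nothing beyond n * log2(26):

    from math import log2

    ALPHABET_SIZE = 26     # lower-case letters, assumed equiprobable
    n = 4                  # the artificially short string length

    # Every 4-letter string, word or gibberish, carries the same
    # Shannon information: n * log2(26) bits.
    print(round(n * log2(ALPHABET_SIZE), 1))   # -> 18.8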
Now we ask the probability that a four-letter string is an English word. Let us suppose there are 3,000 four-letter English words (I haven't checked). In that case, the probability that a string belongs to the set of English words would be 3000/26^4, or about 0.0066, which we now characterize as equivalent to a structured information content of about 7.25 bits. Of course, the alphabet provides the primary (axiomatic?) structure. In this case, an English dictionary provides the secondary structure.
The probability that a string is gibberish is then 1 - 0.0066, or 0.9934, which we say is equivalent to a structured information content of 0.0095 bit. We see that these values are closer to our intuitive notion of information and also fit well with the Shannonist notion that a piece of information carries a surprisal value.
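A short Python sketch of the set-membership arithmetic, using the assumed count of 3,000 four-letter words:

    from math import log2

    strings_4 = 26 ** 4            # all four-letter strings: 456,976
    words_4 = 3000                 # assumed count of four-letter words

    p_word = words_4 / strings_4   # ~0.0066
    p_gibberish = 1 - p_word       # ~0.9934

    print(round(-log2(p_word), 2))        # -> 7.25 bits
    print(round(-log2(p_gibberish), 4))   # -> 0.0095 bits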
Here we say that we are not particularly surprised at the string "abbx" because it is a member of a lawless set and because background noise is, in many circumstances, ubiquitous. We say that for our purposes the information value of any member of the lawless set is identical with the information value of the set itself, and likewise for any member of the structured set. On the other hand, the surprisal value of the string "goal" is fairly high because of the likelihood that it was not generated randomly and hence stems from a structured or designed set. That is, the chances are fairly good that a mind originated the string "goal," whereas the chances that a mind originated a string such as "abbx" are harder to determine. Clearly, our confidence tends to increase with the length of the string and with the number of set rules, as the sketch below illustrates.
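In the following sketch, the 3,000 and 5,000 word counts echo the guesses used in this post; the other counts are equally rough inventions of mine, there only to show the trend:

    from math import log2

    # Assumed (invented) word counts by string length; illustrative only.
    word_counts = {3: 5000, 4: 3000, 5: 8000, 6: 12000}

    for n, words in sorted(word_counts.items()):
        p_word = words / 26 ** n
        print(n, round(-log2(p_word), 1))   # surprisal grows with n

The surprisal of hitting the word set climbs from about 1.8 bits at n = 3 to about 14.7 bits at n = 6, which is the sense in which longer structured strings inspire more confidence.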
We see how our concept of structured information fits well with cryptography, though we will not dwell on that here.
Another way to deal with the structure issue here is to ignore the gibberish strings and simply say that "goal" has a probability of (say) 1/3000, with an equivalent information content of 11.55 bits.
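A one-line check of that figure:

    from math import log2
    print(round(log2(3000), 2))   # -> 11.55 bits for a word drawn uniformly from 3,000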
What we are doing here is getting at a principle. We are not bothering to assign exact probabilities to individual letters, letter pairs, letter triplets or letter quadruplets, and we are not assigning an empirical frequency to the word "goal." Rather, we are closing in on the problem of assigning an alternative information value to patterns that show a specified structure.
Above, we have used a streamlined alphabet. But a set of some logic language's symbols can be treated like an alphabet. Importantly, we assign only grammatical symbol strings to the structured set, using rules such as the rule that ")" cannot begin a sentence. We can then use the process sketched above to assign a structured information value to any string.
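Here is a toy Python sketch of that idea; the five-symbol "logic alphabet" and the parenthesis rules are my own stand-ins, chosen only to make the computation concrete:

    from itertools import product
    from math import log2

    SYMBOLS = ['p', 'q', '&', '(', ')']   # a toy logic alphabet (my assumption)

    def grammatical(s):
        # Toy rules in the spirit of the post: the string may not begin
        # with ')' and its parentheses must balance.
        depth = 0
        for ch in s:
            depth += (ch == '(') - (ch == ')')
            if depth < 0:                 # a ')' with no matching '('
                return False
        return depth == 0

    n = 4
    total = len(SYMBOLS) ** n
    lawful = sum(grammatical(t) for t in product(SYMBOLS, repeat=n))
    print(round(-log2(lawful / total), 2))   # structured information, in bits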
Clearly this method can be used for all sorts of sets divided into lawless and lawful subsets, where "law" is a pairing rule or relation. (For example, on this approach we could arrange for a non-computable irrational number to have a much lower information value than a computable irrational.)
We see that the average information of a gibberish string (as defined via the structured set) is far less than that of a string matching elements of the structured set. For example, the string "abbx rsr" is a member of the gibberish set; gibberish, in this case (assuming as a wild guess 5,000 three-letter English words), has a probability of (1 - 3000/26^4)(1 - 5000/26^3), or about 0.711, for an information content of about 0.49 bit. Compare the string "goal new" (disregarding word order): the probability that both strings are words is (3000/26^4)(5000/26^3), or about 0.0019, for an information content of about 9.1 bits.
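The two-word arithmetic, again in Python (independence of the two strings is assumed throughout):

    from math import log2

    p4 = 3000 / 26 ** 4     # probability a 4-letter string is a word
    p3 = 5000 / 26 ** 3     # probability a 3-letter string is a word (wild guess)

    # Both strings gibberish:
    print(round(-log2((1 - p4) * (1 - p3)), 2))   # -> 0.49 bits

    # Both strings words:
    print(round(-log2(p4 * p3), 2))               # -> 9.06 bits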
Hence, if one saw the message "goal new," one could have a strong degree of confidence that the string was not random but stemmed from a designed set.
A design inference?
William Dembski, the scholar who advocates a rational basis for inferring design by intelligence, uses the SETI example to buttress his case. The hunters of extraterrestrial intelligence, in a fictional account, are astounded by a sequence of radioed 'zeroes and ones' that matches the prime number sequence for the first 100 primes. Such a low-entropy (and high-information) sequence must be the product of design, he says, and he uses that as a basis for justifying the inference of an intelligent designer behind the creation of life.
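As a rough illustration of why such a signal is so surprising: if we assume a Contact-style unary encoding (my assumption about the signal's form; each prime p sent as p ones, with a zero as separator), then against coin-flip background noise the chance of the exact string arising is 2 raised to the minus of its length:

    def first_primes(k):
        primes, n = [], 2
        while len(primes) < k:
            if all(n % p for p in primes):   # trial division by smaller primes
                primes.append(n)
            n += 1
        return primes

    # Unary encoding: each prime p becomes p ones; a zero separates primes.
    signal = '0'.join('1' * p for p in first_primes(100))

    # Probability of an exact match in coin-flip noise is 2**-len(signal),
    # i.e. a surprisal of len(signal) bits.
    print(len(signal), 'bits of surprisal')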
However, it should be noted that, in order to have a set of low-entropy elements, there seemingly must be a human mind to organize that set (not its physical aspects, but the cognitively appreciated set). So such a bizarre signal from the stars would be recognized as other than background noise only because of a centuries-long human effort to distill certain mathematical relations into concise form. Hence, human receivers would recognize a similar intelligence behind the message.
But does that mean one can detect a signal amid noise without falling into the trap whereby one sees all sorts of "things" in an atmospheric cloud?
That is, when one says that the formation of the first life forms is highly improbable, what does one mean? Can we be sure that the designer set (using human assumptions) has been sufficiently defined? (I am not taking sides here, by the way.)
However, as noted above, the question of computability enters the picture here. Following prevailing scientific opinion (Penrose being an exception), every organism can be viewed as a machine, and every machine responds to a set of algorithms. Hence every machine can be assigned a unique and computable number. One simply assigns numbers to each element of the logic language in use and puts together an algorithm for the machine. The machine's integer then corresponds to some right-left or left-right sequence of symbols (to avoid ambiguity, the encoding may require a high-base number system).
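A minimal sketch of such a numbering, with a toy symbol set standing in for the logic language; both the symbol set and the sample "algorithm" string are my own placeholders:

    SYMBOLS = "ab+*()01"          # a toy logic language (my assumption)
    BASE = len(SYMBOLS) + 1       # high-base system; digit 0 is unused,
                                  # so every string gets a distinct number

    def machine_number(algorithm: str) -> int:
        """Map a symbol string to a unique integer."""
        n = 0
        for ch in algorithm:
            n = n * BASE + SYMBOLS.index(ch) + 1
        return n

    def machine_string(n: int) -> str:
        """Recover the symbol string from its integer."""
        chars = []
        while n:
            n, d = divmod(n, BASE)
            chars.append(SYMBOLS[d - 1])
        return ''.join(reversed(chars))

    code = machine_number("(a+b)*1")      # a hypothetical algorithm text
    assert machine_string(code) == "(a+b)*1"
    print(code)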
So then, the first organic machines -- proto-cell organisms perhaps -- must be regarded as part of a larger machine, the largest machine of all being the cosmos. But the cosmos cannot be modeled as a classical machine or computer. See link in sidebar.
A sea of unknowable 'designs'
Also, the set of algorithmic machines (the set of algorithms) is bijective with a subset of the computable reals (a subset, because some algorithmic strings are disallowed on grammatical grounds).
Now one way to obtain, possibly, a noncomputable real is to use a random lottery for the choice of the nth digit and, notionally, to continue this process over a denumerable infinity of steps. Because the string is completely random, we do not know whether it is a member of the computable reals or of the noncomputable reals (the latter set, following Cantor's diagonal proof and other proofs, having a higher infinite cardinality than the set of computable reals).
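A notional sketch of the digit lottery; of course, any finite run produces only a prefix, which is trivially computable, so the noncomputability question concerns only the completed infinite string:

    import random

    # A notional digit lottery: each decimal digit of the real is drawn
    # at random. A finite run yields only a (computable) prefix.
    def random_real_prefix(n_digits, seed=None):
        rng = random.Random(seed)
        return '0.' + ''.join(str(rng.randrange(10)) for _ in range(n_digits))

    print(random_real_prefix(40))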
So there is no reason to conclude that a grammatical string might not be a member of the noncomputables. In fact, there must be a nondenumerable infinity of such strings.
Nevertheless, a machine algorithm is normally defined as always finite. On the other hand, one could imagine that a machine with an eternal time frame might have an infinite-step algorithm.
That is, we have arrived at the potential for machine algorithms that cannot possibly be reached by human ken. Specifically, we have shown that there exists a nondenumerable infinity of grammatical statements of infinite length. One might then argue that there is a vast sea of infinite designs that the human mind cannot apprehend.