## Monday, June 25, 2007

### The world of null-H

Some thoughts on data compression, implicate order and entropy (H):

Consider a set of automated machines. We represent each machine as a sequence of logic gates, with each gate assigned an integer. In that case each machine can be represented as a unique number, to wit, its sequence of logic gate numbers. Even if subroutines are done in parallel, we can still find some way to express the circuit as a single unique number.

In fact, while we're at it, let us define an algorithm as any procedure that can be modeled by a truth table. Each table is a constructible m×n matrix, which is clearly unique and hence can be written as a sequence of 0's and 1's read off row by row, with a precursor number indicating the dimensions. In that case the table has a bit value of somewhat more than m×n (the table bits plus the dimension prefix). On the other hand, each machine's table is unique and can be assigned an index number, which may have a considerably lower bit value than m×n.
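The row-by-row encoding with a dimension precursor can be sketched in a few lines. This is an illustrative serialization, not a standard one; the 8-bit width chosen for the dimension fields is an assumption of the sketch.

```python
def encode_table(table):
    """Encode an m x n 0/1 truth table as a bit string: a dimension
    precursor (m and n in assumed 8-bit fields) followed by the table
    entries read off row by row."""
    m, n = len(table), len(table[0])
    prefix = format(m, "08b") + format(n, "08b")  # dimension precursor
    body = "".join(str(bit) for row in table for bit in row)
    return prefix + body

# A 2x2 table costs 16 prefix bits + 4 table bits = 20 bits in long form;
# an index into a shared list of known tables could be far shorter.
bits = encode_table([[0, 1], [1, 1]])
print(bits)       # '00000010000000100111'
print(len(bits))  # 20
```

Since the encoding is injective, each table maps to a unique number, which is all the argument above needs.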

Now suppose, for convenience, we choose 10 machines with unique truth table numbers that happen to be n bits in length. We then number these machines from 1 to 10.

Now, when we send a message about one of the machines to someone who has the list of machines and reference numbers stored, the message can be compressed to a number between 1 and 10 (speaking in base 10 for convenience). So the message for, say, 7 is far shorter than the n = 25 base-2 digits of the long form. Suppose that it is equally probable that any machine description will be sent. In that case the probability of a particular long-form string is 2^-25, and its information value is 25 bits.

However, if one transmits the long-form description, it is no more useful than transmitting the three-bit representation of 7 (111). Clearly the information value of 3 bits for 7 is far lower than the 25 bits for the long description.
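The arithmetic above can be checked directly. A hedged sketch, using the example's own numbers (n = 25 bits, 10 equally probable machines):

```python
import math

n = 25                       # length of the long-form description, in bits
p_long = 2 ** -n             # probability of one specific 25-bit string
info_long = -math.log2(p_long)

num_machines = 10
p_index = 1 / num_machines   # each machine index equally probable
info_index = -math.log2(p_index)

print(info_long)             # 25.0 bits for the long form
print(round(info_index, 2))  # ~3.32 bits for the index message
```

The index message carries log2(10) ≈ 3.32 bits, which rounds up to the three or four binary digits actually transmitted; either way it is far below the 25 bits of the long form.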

Of course Shannon's memoryless channel is a convenient fiction, allowing a partial description which is often useful (he did some pioneering work on channels with memory also, but I haven't seen it). These days, few channels are memoryless, since almost every computer system comes with look-up functions.

So what we have is data compression. The 25 bits have been compressed into some low number of bits. But we don't have information compression -- or do we?

If one transmits the long-form message to the receiver, that information is no more useful to the receiver than the information in the abbreviated form. I_implicit will do; I_explicit has no additional surprisal value to the receiver.

So I_explicit - I_implicit might be considered the difference between the stored information value and the transmitted information value. But, once the system is up and running, this stored information is useless. It is dead information -- unless someone needs to examine the inner workings of the machine and needs to look it up. Otherwise the persons on each end can talk in compressed form about Machine X and Machine Y all the time without ever referring to the machine details. So I_e - I_i = -I_i under those conditions.
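A small worked instance of that difference, using the hypothetical numbers from the running example (25-bit descriptions, 10 machines):

```python
import math

I_explicit = 25.0            # bits in the long-form description
I_implicit = math.log2(10)   # bits in the index message, 10 machines
stored = I_explicit - I_implicit

# ~21.68 bits sit "parked" in the shared look-up memory.
print(round(stored, 2))
```

Once both parties rely only on the index, the long form contributes no surprisal at all, and the useful difference collapses to -I_implicit, as noted above.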

So stored information is dead information for much of the time. It has zero value unless resurrected by someone with an incomplete memory of the long integer string.

Now we have not bothered with the issue of the average information (entropy) of the characters, which here is a minor point. But clearly the entropy of the messaging system increases with compression. Information is "lost" to the storage unit (memory).

However, if someone consults the memory unit, stored information is recovered and the entropy declines.

The point is that entropy in this sense seems to require an observer. Information doesn't really have a single objective value.

Yes, but perhaps this doesn't apply to thermodynamics, you say. The entropy of the universe always increases. Turned around, that statement really means that the most probable events will usually occur and the least probable usually won't. So many scientists seek to redefine sets of "events" in order to discover more intersections. They seek to reduce the number of necessary sets to a minimum.

Note that each set might be treated as a machine with its set language elements denoted by numbers. In that case sets with long integer numbers can be represented by short numbers. In that case, again, entropy seems to be observer-dependent.

Of course one can still argue that the probability of the memory unit remaining intact decreases with time. Now we enter into the self-referencing arena -- as in Russell's set of all sets -- in that we can point out that the design of the memory unit may well require another look-up system, again implying that information and entropy are observer-dependent, not sometimes but always.

Consider a quantum experiment such as the double-slit experiment run one photon at a time.
The emitted photon will land in a region of the interference pattern with a specific probability derived from the square of the photon's wave amplitude.

If we regard the signal as corresponding to the coordinates of the detected photon, then the received signal carries an information value equal to -log(p), where p is the probability for those coordinates. (I am ignoring the unneeded term "qubit" here.)

In that case, the entropy of the system is found by H = -p_1 log(p_1) - ... - p_n log(p_n), where p_i is the detection probability for the i-th region.
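The two quantities can be computed side by side. A sketch, assuming a toy four-bin probability distribution over detector regions (not a real interference pattern): the surprisal -log2(p_i) of one detection, and the entropy H of the whole pattern.

```python
import math

p = [0.4, 0.3, 0.2, 0.1]   # toy detection probabilities for 4 bins (sum to 1)

# Surprisal of a detection in each bin: rarer bins carry more bits.
surprisal = [-math.log2(pi) for pi in p]

# Entropy of the pattern: H = -sum p_i log2(p_i), the average surprisal.
H = sum(pi * s for pi, s in zip(p, surprisal))

print(round(surprisal[0], 3))  # 1.322 bits for the most likely bin
print(round(H, 3))             # 1.846 bits for the pattern as a whole
```

H is a property of the distribution (the wave function's predictions), while each surprisal value is realized only when a particular detection is observed, which is the contrast the next paragraph draws.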

So the entropy corresponds to the wave function and the information corresponds to the collapse of the wave function -- and we see that information is observer-dependent. The observer has increased the information and decreased the general entropy.

On the other hand, information is a fleeting thing in both the classical and quantum arenas. "Shortly" after the information is received, it loses its surprisal value and dies -- until a new observer obtains it ("new observer" could mean the same observer who forgets the information).