Tuesday, June 26, 2007

On Hilbert's sixth problem

The post "The world of null-H" is the next post down.

There is no consensus on whether Hilbert's sixth problem -- can physics be axiomatized? -- has been answered.

From Wikipedia, we have this statement attributed to Hilbert:

6. Mathematical treatment of the axioms of physics. The investigations of the foundations of geometry suggest the problem: To treat in the same manner, by means of axioms, those physical sciences in which today mathematics plays an important part; in the first rank are the theory of probabilities and mechanics.

Hilbert proposed his problems near the dawn of the Planck revolution, while the debate was raging about statistical methods and entropy, and the atomic hypothesis. It would be another five years before Einstein conclusively proved the existence of atoms.

It would be another year before Russell discovered the set of all sets paradox, which is similar to Cantor's power set paradox. Though Cantor uncovered this paradox, or perhaps theorem, in the late 1890s, I am uncertain how cognizant of it Hilbert was.

Interestingly, by the 1920s, Zermelo, Fraenkel and Skolem had axiomatized set theory, specifically forbidding that a set could be an element of itself and hence getting rid of the annoying self-referencing issues that so challenged Russell and Whitehead. But, in the early 1930s, along came Goedel, who proved that ZF set theory is either inconsistent or incomplete. His proof actually used Russell's Principia Mathematica as a basis, but it generalizes to all but very limited mathematical systems of deduction. Since mathematical systems can be defined in terms of ZF, it follows that ZF must contain true statements that cannot be traced back to the axioms. So the attempt to axiomatize set theory didn't completely succeed.

In turn, it would seem that Goedel, who began his career as a physicist, had knocked the wind out of Problem 6. Of course, many physicists have not accepted this point, arguing that Goedel's incompleteness theorem applies to only one essentially trivial matter.

In a previous essay, I have discussed the impossibility of modeling the universe as a Turing machine. If that argument is correct, then it would seem that Hilbert's sixth problem has been answered. But I propose here to skip the Turing machine concept and use another idea.

Conceptually, if a number is computable, a Turing machine can compute it. Then again, Church's lambda calculus, a recursive method, can likewise compute any computable number, and Turing showed that the general Turing machine and the lambda calculus are equivalent in power. Church's thesis is the further conjecture -- not provable -- that these formalisms capture everything effectively computable, so it remains conceivable that both miss some computables (rationals or rational approximations to irrationals).

But Boolean algebra is the real-world venue used by computer scientists. If an output can't be computed with a Boolean system, no one will bother with it. So it seems appropriate to define an algorithm as anything that can be modeled by an mxn truth table and its corresponding Boolean statement.

The truth table has a Boolean statement where each element is above the relevant column. So a sequence of truth tables can be redrawn as a single truth table under a statement combined from the sub-statements. If a sequence of truth tables branches into parallel sequences, the parallel sequences can be placed consecutively and recombined with an appropriate connective.
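The recombination claim above can be sketched in Python, treating a Boolean statement as a function of its inputs; the function names and the two sample sub-statements are my own illustrations, not anything from the original argument:

```python
from itertools import product

def truth_table(statement, n_inputs):
    """Enumerate all 2^n input rows of a Boolean statement.

    Returns a list of (inputs, output) pairs -- i.e., the table
    under the statement, one row per input combination.
    """
    return [(row, statement(*row))
            for row in product((False, True), repeat=n_inputs)]

# Two sub-statements (illustrative):
s1 = lambda a, b: a and not b
s2 = lambda a, b: a or b

# A sequence of tables is redrawn as a single table under one
# statement combined from the sub-statements via a connective (AND):
combined = lambda a, b: s1(a, b) and s2(a, b)

table = truth_table(combined, 2)
```

The combined table has the same four rows as either sub-table; only the output column reflects the connective.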

One may ask about more than one simultaneous output value. We regard this as a single output set with n output elements.

So then, if something is computable, we expect that there is some finite mxn truth table and corresponding Boolean statement. Now we already know that Goedel has proved that, for any sufficiently rich system, there is a Boolean statement that is true, but NOT provably so. That is, the statement is constructible using lawful combinations of Boolean symbols, but the statement cannot be derived from axioms without extension of the axioms, which in turn implies another statement that cannot be derived from the extended axioms, ad infinitum.

Hence, not every truth table, and not every algorithm, can be reduced to axioms. That is, there must always be some algorithm or truth table showing that a "scientific" system of deduction is either inconsistent or incomplete.

Now suppose we ignore that point and assume that human minds are able to model the universe as an algorithm, perhaps as some mathematico-logical theory; i.e., a group of "cause-effect" logic gates, or specifically, as some mxn truth table. Obviously, we have to account for quantum uncertainty. Yet, suppose we can do that and also suppose that the truth table need only work with rational numbers, perhaps on grounds that continuous phenomena are a convenient fiction and that the universe operates in quantum spurts.

Yet there is another proof of incompleteness. The algorithm, or its associated truth table, is an output value of the universe -- though some might argue that the algorithm is a Platonic idea that one discovers rather than constructs. Still, once scientists arrive at this table, we must agree that the laws of mechanics supposedly were at work so that the thoughts and actions of these scientists were part of a massively complex system of logic gate equivalents.

So then the n-character, grammatically correct Boolean statement for the universe must have itself as an output value. Now, we can regard this statement as a unique number by simply assigning integer values to each element of the set of Boolean symbols. The symbol values then follow a specific order, yielding a unique corresponding integer.
(The number of symbols n may be regarded as corresponding to some finite time interval.)
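A minimal sketch of this numbering step in Python; the symbol set and the base choice are illustrative assumptions of mine. Reading the statement as digits in a base larger than the symbol count keeps the mapping unique and reversible:

```python
# Assign an integer to each Boolean symbol (illustrative mapping).
SYMBOLS = {'p': 1, 'q': 2, '&': 3, 'v': 4, '~': 5, '(': 6, ')': 7}
BASE = len(SYMBOLS) + 1  # base 8: every symbol is a nonzero "digit"

def statement_number(statement):
    """Read the statement's symbols as digits of one unique integer."""
    n = 0
    for ch in statement:
        n = n * BASE + SYMBOLS[ch]
    return n

def number_to_statement(n):
    """Recover the statement from its number (the map is reversible)."""
    inverse = {v: k for k, v in SYMBOLS.items()}
    chars = []
    while n:
        n, digit = divmod(n, BASE)
        chars.append(inverse[digit])
    return ''.join(reversed(chars))
```

Because no symbol maps to zero, distinct statements always yield distinct integers.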

Now then, supposing the cosmos is a machine governed by the cosmic program, the cosmic number should be computable by this machine (again the scientists involved acting as relays, logic gates and so forth). However, the machine needs to be instructed to compute this number. So the machine must compute the basis of the "choice." So it must have a program to compute the program that selects which Boolean statement to use, which in turn implies another such program, ad infinitum.

In fact, there are two related issues here: the Boolean algebra used to represent the cosmic physical system requires a set of axioms, such as Huntington's postulates, in order to be of service. But how does the program decide which axioms it needs for itself? Similarly, the specific Boolean statement requires its own set of axioms. Again, how does the program decide on the proper axioms?

So then, the cosmos cannot be fully modeled according to normal scientific logic -- though one can use such logic to find intersections of sets of "events." Then one is left to wonder whether a different system of representation might also be valid, though the systems might not be fully equivalent.

At any rate, the verdict is clear: what is normally regarded as the discipline of physics cannot be axiomatized without resort to infinite regress.

So, we now face the possibility that two scientific systems of representation may each be correct and yet not equivalent.

To illustrate this idea, consider the base 10 and base 2 number systems. There are some non-integer rationals with terminating expansions in base 10 -- 0.1, for example -- that have no terminating expansion in base 2, although an approximation can be made as close as we like. As systems of terminating representations of rationals, the two are not equivalent.
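The point can be checked mechanically: a rational in lowest terms has a terminating expansion in a given base exactly when every prime factor of its denominator divides the base. A sketch in Python (the function name is mine):

```python
from fractions import Fraction
from math import gcd

def terminates_in_base(x: Fraction, base: int) -> bool:
    """True iff x has a terminating expansion in the given base.

    Fraction keeps x in lowest terms; we strip from the denominator
    every prime factor shared with the base and see if 1 remains.
    """
    d = x.denominator
    g = gcd(d, base)
    while g > 1:
        while d % g == 0:
            d //= g
        g = gcd(d, base)
    return d == 1
```

So 1/10 terminates in base 10 but not in base 2, while 3/8 terminates in both.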

(Cantor's algorithm to enumerate all the rationals uses the base 10 system. However, he did not show that all base n rationals appear in the base 10 system.)

Monday, June 25, 2007

The world of null-H

Some thoughts on data compression, implicate order and entropy (H):

Consider a set of automated machines. We represent each machine as a sequence of logic gates. Each gate is assigned an integer. In that case each machine can be represented as a unique number, to wit its sequence of logic gate numbers. If subroutines are done in parallel, we can still find some way to express the circuit as a single unique number, of course.

In fact, while we're at it, let us define an algorithm as any procedure that can be modeled by a truth table. Each table is a constructible mxn matrix which clearly is unique and hence can be written as a sequence of 0's and 1's read off row by row, with a precursor number indicating the dimensions. In that case, the table has a bit value of somewhat more than mxn. On the other hand, each machine's table is unique and can be assigned an index number, which may have a considerably lower bit value than mxn.
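A rough sketch of that encoding in Python; the fixed 8-bit dimension precursor is an arbitrary choice of mine, not part of the original description:

```python
def table_to_bits(table):
    """Encode an m x n 0/1 truth table as one bit string:
    an 8-bit m, an 8-bit n, then the rows read off in order."""
    m, n = len(table), len(table[0])
    bits = f"{m:08b}{n:08b}"
    bits += ''.join(str(bit) for row in table for bit in row)
    return bits

def bits_to_table(bits):
    """Recover the unique table from its bit string."""
    m, n = int(bits[:8], 2), int(bits[8:16], 2)
    body = bits[16:]
    return [[int(body[i * n + j]) for j in range(n)] for i in range(m)]
```

The encoded string is 16 bits longer than m times n, while a mere index into a stored list of such tables can be far shorter.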

Now suppose, for convenience, we choose 10 machines with unique truth table numbers that happen to be n bits in length. We then number these machines from 1 to 10.

Now, when we send a message about one of the machines to someone who has the list of machines and reference numbers stored, the message can be compressed to a number between 1 and 10 (speaking in base 10 for convenience). So the information value of, say, 7 is far lower than that of its n=25 base-2 digit description. Suppose that it is equally probable that any machine description will be sent. In that case the probability of a particular description is 2^(-25), and its information value is 25 bits.

However, if one transmits the long-form description, it is no more informative to the receiver than transmitting the three-bit representation of 7 (111). Clearly the information value of about 3 bits for 7 is far lower than the 25 bits for the long description.
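The comparison can be made concrete. One caveat worth noting: for 10 equally likely indices the exact information value is log2(10), about 3.3 bits, slightly more than the 3 bits of "111":

```python
from math import log2

# Long-form description: one particular 25-bit table,
# all such tables equally likely.
info_long = -log2(2 ** -25)    # 25 bits

# Compressed form: one of 10 equally likely machine indices.
info_index = -log2(1 / 10)     # log2(10), about 3.32 bits
```

Either way, the stored look-up table lets roughly 25 bits of description ride on roughly 3 bits of transmission.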

Of course Shannon's memoryless channel is a convenient fiction, allowing a partial description which is often useful (he did some pioneering work on channels with memory also, but I haven't seen it). These days, few channels are memoryless, since almost every computer system comes with look-up functions.

So what we have is data compression. The 25 bits have been compressed into some low number of bits. But we don't have information compression -- or do we?

If one transmits the long-form message to the receiver, that information is no more useful to the receiver than the information in the abbreviated form. I_implicit will do. I_explicit has no additional surprisal value to the receiver.

So I_explicit - I_implicit might be considered the difference between the stored information value and the transmitted information value. But, once the system is up and running, this stored information is useless. It is dead information -- unless someone needs to examine the inner workings of the machine and needs to look it up. Otherwise the persons on each end can talk in compressed form about Machine X and Machine Y all the time without ever referring to the machine details. So I_e - I_i = -I_i under those conditions.

So stored information is dead information for much of the time. It has zero value unless resurrected by someone with an incomplete memory of the long integer string.

Now we have not bothered with the issue of the average information (entropy) of the characters, which here is a minor point. But clearly the entropy of the messaging system increases with compression. Information is "lost" to the storage unit (memory).

However, if someone consults the memory unit, stored information is recovered and the entropy declines.

The point is that entropy in this sense seems to require an observer. Information doesn't really have a single objective value.

Yes, but perhaps this doesn't apply to thermodynamics, you say. The entropy of the universe always increases. Turned around, that statement really means that the most probable events will usually occur and the least probable usually won't. So many scientists seek to redefine sets of "events" in order to discover more intersections. They seek to reduce the number of necessary sets to a minimum.

Note that each set might be treated as a machine with its set language elements denoted by numbers. In that case sets with long integer numbers can be represented by short numbers. In that case, again, entropy seems to be observer-dependent.

Of course one can still argue that the probability of the memory unit remaining intact decreases with time. Now we enter into the self-referencing arena -- as in Russell's set of all sets -- in that we can point out that the design of the memory unit may well require another look-up system, again implying that information and entropy are observer-dependent, not sometimes but always.

Consider a quantum experiment such as a double-slit experiment one photon at a time.
The emitted photon will land in a region of the interference pattern with a specific probability derived from the square of the photon's wave amplitude.

If we regard the signal as corresponding to the coordinates of the detected photon, then the received signal carries an information value equal to -log(p), where p is the probability for those coordinates. (I am ignoring the unneeded term "qubit" here.)

In that case, the entropy of the system is H = -p_1 log(p_1) - ... - p_n log(p_n).
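A minimal sketch of the surprisal and entropy computations just described (the function names are mine):

```python
from math import log2

def surprisal(p):
    """Information carried by one detection at coordinates
    whose probability (squared wave amplitude) is p."""
    return -log2(p)

def entropy(probs):
    """H = -p_1*log2(p_1) - ... - p_n*log2(p_n),
    with the convention that 0*log(0) contributes nothing."""
    return -sum(p * log2(p) for p in probs if p > 0)
```

A detection at a spot of probability 1/8 carries 3 bits; a uniform distribution over four spots has entropy 2 bits.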

So the entropy corresponds to the wave function and the information corresponds to the collapse of the wave function -- and we see that information is observer-dependent. The observer has increased the information and decreased the general entropy.

On the other hand, information is a fleeting thing in both the classical and quantum arenas. "Shortly" after the information is received, it loses its surprisal value and dies -- until a new observer obtains it ("new observer" could mean the same observer who forgets the information).

Sunday, June 24, 2007

The Kalin cipher

Note: The Kalin cipher described below can of course be used in tandem with a public key system, or it can be done by hand with calculators. An appropriate software program for doing the specified operations would be helpful.

The idea of the cipher is to counteract frequency analysis via matrix methods.

Choose a message of n characters and divide n by some integer square m^2. If there is a remainder, the message can be padded out to a full block with dummy numbers.

Arrange the message into mxm matrices, as shown,

H I H x
O W A y
R E U z

We then augment the matrix with a fourth column. This column is the first key, which is a random number with no zeros. A second 4x3 matrix is filled with random integers. This is the second key. The keys needn't be random. Easily remembered combinations might do for some types of work because one would have to know the cipher method in order to use guessed keys.

We then matrix multiply the two as MK. The product MK is a 3x3 matrix; we augment it with the first key column once more (the same step appears in the "abba" example below) and put the result into row canonical form. This reduces the nine numbers in M to three in [I|b], where b is the final column.

 8  9  8  x     7 9 2
15 23  1  y  x  5 5 4
18  5 21  z     3 2 3
                5 2 3

Augmenting the product with (x, y, z) and reducing gives

1 0 0 a
0 1 0 b
0 0 1 c

where (a, b, c) is a set of three rationals.

The keys can be supplied by a pseudorandom number generator in step with a decoder program. A key can vary with each message matrix or remain constant for a period, depending on how complex one wishes to get. But, as said, if one is conducting an operation where it is advantageous for people to remember keys, this method might prove useful.

By the way, if a row of the original message matrix repeats or if one row is a multiple of another, a dummy character is inserted to make sure no row is a multiple of another so that we can obtain the canonical form. Likewise, the key matrix has no repeating rows.

In fact, on occasion other square matrices will not reduce to the I form. A case in point:

a+1 b+1 c+1
 a   b   c
 1   1   1

The determinant of this matrix is 0 -- the first row is the sum of the other two -- meaning the I form can't be obtained.
In general, our software program should check the determinant of the draft message matrix to make sure it is non-zero. If the determinant is zero, a dummy letter should be inserted, or two inconsequential characters swapped, as long as the meaning isn't lost.

But if the program doesn't check the determinant, it will give a null result for the compression attempt and hence would be instructed to vary the message slightly.

Notice that it is not necessary to transmit I. So the message is condensed into three numbers, tending to defeat frequency analysis.

Here is a simple example, where for my convenience, I have used the smallest square matrix:

We encrypt the word "abba" (Hebrew for "father") thus:

1 2 3
2 1 1

where the first two columns represent "abba" and the third column is the first key.
We now matrix multiply this 2x3 matrix by a 3x2 matrix which contains the second key.

1 2 3     2 1     10 16
2 1 1  x  1 3  =   7  8
          2 3

We then apply key 1 (or another key if we like) and reduce to canonical form:

10 16 3
7 8 1

which is uniquely represented by

1 0 -1/4
0 1 11/32

By reversing the operations on (-1/4, 11/32), we can recover the word "abba." If we encode some other four characters, we can similarly reduce them to two numbers. Make a new matrix

-1/4 x 3
11/32 y 1

which we can fold into another two number set (u, v).

We see here an added countermeasure against frequency analysis, provided the message is long enough: gather the condensed column vectors into new matrices.

We can do this repeatedly. All we need do is divide by, in this case, 4, and recombine. If we have 4^(2n) matrices to begin with, it is possible to transmit the entire message as a set of two numbers, though usually more than one condensed set would be necessary. Of course we can always pad the message so that we have a block of x^(2n) characters. Still, the numbers tend to grow as the operations are done and may become unwieldy after too many condensations. On the other hand, frequency analysis faces a tough obstacle, I would guess.

So a third key is the number of enfoldments. Three enfoldments would mean three unfoldments. Suppose we use a 4x4 system on 256 characters with three enfoldments.
We write this on 16 matrices. After the transform, we throw away the I matrices and gather the remaining columns sequentially into a set of 4 matrices. We transform again, gather once more, and are left with a single column of four numbers. So if the adversary doesn't know the number of enfoldments, he must try them all, assuming he knows the method. Of course that number may be varied by some automated procedure linked to a public key system.

Just as it is always possible to put an nxn matrix with a non-zero determinant [and containing no zeros] into canonical form, it is possible, given the keys, to reverse the steps and recover the original matrix. The methods of linear algebra are used.

Decryption is also hampered by the fact that in matrix multiplication AB does not usually equal BA.

The use of canonical form and the "refolding" of the matrices is what makes the Kalin cipher unique, I suppose. When I say unique, I mean that the Kalin cipher has not appeared in the various popular accounts of ciphers.

An additional possibility: when an nxn matrix is folded into an n column vector, we might use some "dummy" numbers to form a different dimension matrix. For example, suppose we end up with a 3-entry column vector. We add a dummy character to that string and form a 2x2 matrix, which can then be compressed into a 2-entry column vector. Of course, the receiver program would have to know that a 2-vector is compressed from a 3-vector. Also the key for the 2x2 matrix is so small that a decryption program could easily try all combinations.

However, if a 10x10 matrix folds into a 10-entry column vector, six dummy characters can be added and a 4x4 matrix can be constructed, leaving a 4-vector, which can again be folded into a 2-vector. All sorts of such systems can be devised.

Additionally, a "comma code" can be used to string the vectors together into one number. The decipherment program would read this bit string to mean a space between vector entries.

Clearly the method of using bases or representations as keys and then transforming offers all sorts of complexities -- perhaps not all useful -- but the emphasis on matrix condensation seems to offer a practical antidote to frequency analysis.

BTW, I have not bothered to decimalize the rational fractions. But presumably one would convert to the decimal equivalent in order to avoid drawing attention to the likelihood that the numbers represent canonical row vectors. And, of course, if one is using base 2, one would convert each rational to a binary digit string. Not all decimal fractions can be converted exactly into binary. However, supposing enough bits are used, the unfolded (deciphered) number will, with high probability, be very close to the correct integer representing the character.

Saturday, June 02, 2007

Thumbnail of NIST's 9/11 scenario

Here's a thumbnail of the scenario used by the National Institute of Standards and Technology to justify its computer modeling:

When the plane collided with a tower, a spray of debris "sandblasted" the core columns, stripping them of fireproofing. Some fireproofing may have come off because of the vibrations on impact.

Some jet fuel immediately burst into flame, in a fireball seen from the street, with some of the fiery fuel dropping down the core elevator shaft and spilling out onto random floors, setting a few afire.

Jet fuel, whether fueling an explosion or an ordinary fire, cannot produce enough energy to critically weaken naked core columns in the time allotted, the NIST found.
Hence, the NIST presumed that the fires were accelerated by the jet fuel but reached their critical heat by burning office materials and furnishings, which the NIST variously put at five pounds per square foot or four pounds per square foot on average.

Though the heat energy actually detected via infrared analysis and visual inspection of the photographic record was far too low to have caused "global collapse," the NIST conjectured that the fires were hotter and more intense closer to the core.

The NIST's mechanism for collapse required sufficiently energetic fires to send heat to the ceiling -- which the NIST modeled as retaining its fireproofing -- and that heat was then convected along the ceiling and sideways into the load-bearing columns and connectors.

Had the fireproofing also come off the ceiling, the heat would have dissipated upward through the floor slab and, the NIST's computer models showed, the building would have stood.

Though there were fires on lower floors, the columns would have retained most of their fireproofing, and so critical weakening wasn't possible. No witnesses reported fireproofing off bare columns on lower floors.

The NIST said collapse was initiated by the shortening of core columns -- whether from buckling or from severing. Presumably this shortening would have occurred on the impact/fire floors, since that's where the fireproofing would have come off.

The NIST realized that blown-out windows supplied oxygen to keep the fire going, but also permitted much heat to dissipate out (with the smoke), making less available for the "blowtorch" scenario.

The NIST said that the impact zone fires seemed to progress around a floor in fits and starts until they had come all the way around the core.

Here are some concerns:

In Tower 2, which fell first, the observed fire energy was very low, the NIST agreed. The NIST attributed Tower 2's fall to structural damage from plane impact with some input from the fires. The plane that struck Tower 2 clipped the corner, meaning the collision force with the core would have been lower than in Tower 1, not more. Yet it is the core that supported the weight of the structure, not the exterior.

Tower 2's top block collapsed at a 23 degree angle. What this means is that Tower 2 had much less "hammer-down" force available for the lower part of the building than some have suggested.

On the other hand, Tower 1's top block came essentially straight down. What this implies is that the core columns on the opposite side of the elevator shaft from the plane were damaged roughly equally with those on the impact side. Otherwise, the top would have tilted over, as if on a hinge. Hence, one is forced by the NIST to accept the idea that a lot of fireproofing was shaken off the opposed columns, but not off the ceiling.

Also, in order for the fires -- which dimmed and flared as oxygen lapsed or became available when the heated air blew out windows -- to damage the core columns in a manner that initiates global collapse, not only must a specific percentage of the 47 columns be critically damaged, but the critical damage must be roughly evenly spaced.
That is, if a set of closely spaced columns lost strength, the building is less likely to give way, since the load is, by design, transferred to the remaining columns.

In the case of Tower 1, the damage to the core columns must be equivalent on the opposing side of the shaft, implying that the "blowtorch" acted symmetrically with respect to energy.

The NIST, in order to get the simulation to work, must have required that on each floor the fire maintain a specific rate of progress and a specific average ceiling energy. That is, if the fire moved through the debris too slowly, it would burn out or simply fail to do enough damage to other columns. But if it moved too rapidly, the circling fire would fail to bring enough heat to bear on any particular column to bring about critical weakening.

Though massive blasts were witnessed from outside just prior to collapse, the NIST felt obliged to discount these explosions as immediate collapse causes -- because the jet fuel simply wouldn't provide enough energy to do critical structural damage.