Tales from dekrypt<br /><br />An aperiodic discussion concerning privacy, encryption, intelligent design and various and sundry mathematical and scientific issues. The web log writer, Paul Conant, holds no science degrees. He has a background in journalism and modest mathematical training.<br /><br /><b>The prosecutor's fallacy</b> (July 12, 2007)<br /><br />There are various forms of the "prosecutor's paradox," or "prosecutor's fallacy," in which probabilities are used to assign guilt to a defendant. But probability is a slippery subject.<br /><br />For example, a set of circumstances may seem, or be, highly improbable. But defense attorneys might wish to avail themselves of something else: the more key facts there are in a string of facts, the higher the probability that at least one of them is false. Of course, that probability is difficult to establish unless one knows either the witnesses' rates of observational error or some standard rates of observational error, such as the rate typical of an untrained observer versus the rate typical of a police officer.<br /><br />(For a non-rigorous but useful example of the likelihood of critical misstatement, please see the post <i>Enrico Fermi and a 9/11 plausibility test</i>. In that post we are testing <i>plausibility</i>, which is far different from ironclad guilt or innocence. Also, for a discussion of probabilities of wrongful execution, please search <i>Fatal flaws</i> at Znewz1.blogspot.com.)<br /><br />Suppose an eyewitness is tested for quick recall and shows a success rate of 96 percent and an error rate of 4 percent.
If the witness is testifying to 7 things he saw or heard with no prior knowledge concerning these things, the likelihood that the testimony is completely accurate is about 75 percent. So does the 25 percent probability of error constitute reasonable doubt -- especially if no fact can be expunged without forcing a verdict of not guilty? (Of course, this is why the common thread in the testimony of several witnesses tends to have more accuracy; errors tend to cancel out.)<br /><br />The prosecutor's paradox is well illustrated by <i>People v. Collins</i>, a California case stemming from a 1964 robbery, in which independent probabilities were incorrectly multiplied, the consequence being reversal of the conviction on appeal.<br /><br />To summarize, a woman was shoved to the ground and her purse snatched. She and a nearby witness gave a description to Los Angeles police which resulted in the arrest of a white woman and a black man. I do not intend to treat the specifics of this case, but rather just to look at the probability argument.<br /><br />The prosecutor told the jury that the arrested persons matched the description given to police so closely that the probability of their innocence was about 1 in 12 million.<br /><br />The prosecutor gave these probabilities:<br /><br />Yellow auto, 1/10; mustached man, 1/4; woman with ponytail, 1/10; woman with blonde hair, 1/3; black man with beard, 1/10; interracial couple in car, 1/1000. With a math professor serving as an expert witness, these probabilities were multiplied together and the result was the astoundingly high probability of "guilt."<br /><br />However, the prosecutor did not conduct a comparable test of witness error rate. Suppose the witnesses had an average observational error rate of 5 percent. The probability that at least one of the six cited facts is wrong is about 26 percent. Even so, if one fact is wrong, the computed probability of a correct match remains very high.
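The arithmetic in the preceding paragraphs is easy to check with a short script (a sketch; the 96 percent recall rate, 5 percent error rate and the prosecutor's six fractions are the figures assumed above):

```python
# Check the witness-accuracy arithmetic and the People v. Collins product.
# All rates are the hypothetical figures assumed in the text.

# A witness with a 96 percent recall rate testifying to 7 independent facts:
p_all_correct = 0.96 ** 7
print(round(p_all_correct, 3))  # about 0.75

# Witnesses with a 5 percent error rate reporting the 6 cited facts:
p_at_least_one_wrong = 1 - 0.95 ** 6
print(round(p_at_least_one_wrong, 3))  # about 0.26

# The prosecutor's product of the six "independent" probabilities:
odds = (1/10) * (1/4) * (1/10) * (1/3) * (1/10) * (1/1000)
print(round(1 / odds))  # the "1 in 12 million" figure
```

The point of the exercise is that the same multiplication rule the prosecutor used also implies a substantial chance that at least one reported fact is simply wrong.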
Yet, if that fact were essential to the case, then a not guilty verdict is still forced, probability or no.<br /><br />But this is not the only problem with the prosecutor's argument. As the appellate court wrote, there seems to be little or no justification for the cited statistics, several of which appear imprecise. On the other hand, the notion that the reasoning is never useful in a legal matter doesn't tell the whole story.<br /><br />Among criticisms leveled at the Los Angeles prosecutor's reasoning was that conditional probabilities weren't taken into account. However, I would say that conditional probabilities need not be taken into account if a method is found to randomize the collection of traits or facts and to limit the intrusion of confounding bias.<br /><br />The circumstances of arrest are also critical in such a probability assessment. If the couple was stopped in a yellow car within minutes and blocks of the robbery, a probability assessment might make sense (though of course jurors would then use their internalized probability calculators, or "common sense"). However, if the couple is picked up on suspicion miles away and hours later, the probability of a match may still be high. But the probability of error increases with time and distance.<br /><br />Here we run into the issue of false positives. A test can be 99 percent accurate, and yet the probability that a particular positive result is a true match can be very low. Take an example given by mathematician John Allen Paulos. Suppose a terrorist profile program is 99 percent accurate and let's say that 1 in a million Americans is a terrorist. Out of a population of some 300 million, that makes 300 terrorists. The program would be expected to catch 297 of those terrorists. <i>However</i>, the program has an error rate of 1 percent. One percent of 300 million Americans is 3 million people.
So a data-mining operation would turn up some 3 million "suspects" who fit the terrorist profile but are innocent nonetheless. So the probability that a positive result identifies a real terrorist is 297 divided by 3 million, or about one in 10,000 -- a very low likelihood.<br /><br />But data mining isn't the only issue. Consider biometric markers, such as a set of facial features, fingerprints or DNA patterns. The same rule applies. It may be that if a person was involved in a specific crime or other event, the biometric "print" will finger him or her with 99 percent accuracy. Yet context is all-important. If that's all the cops have got, it isn't much. Without other information, the odds are still thousands to one that the cops or Border Patrol have the wrong person.<br /><br />The probabilities change drastically, however, if the suspect is connected to the crime scene by other evidence. But weighing those probabilities, if they can be weighed, requires a case-by-case approach. Best to beware of some general method.<br /><br />Turning back to <i>People v. Collins</i>: if the police stopped an interracial couple in a yellow car near the crime scene within a few minutes of the crime, we might be able to come up with a fair probability assessment. It seems certain that statistics were available, or could have been gathered, about hair color, facial hair, car color, hair style, and race. (Presumably the bandits would have had the presence of mind to toss the rifled purse immediately after the robbery.)<br /><br />So let us grant the probabilities for yellow car at 0.1; woman with ponytail, 0.1; and woman with blonde hair, 0.333. Further, let us replace the "interracial couple in car" event with an event that might be easier to quantify. Instead we estimate the probability of two people of different races being paired. We'd need to know the racial composition of the neighborhood in which they were arrested.
Let's suppose it's 60 percent white, 30 percent black, 10 percent other. If we were to check pairs of people in such a neighborhood randomly, the probability of such a pair (taking the woman to be white and the man black) is 0.6 x 0.3 = 0.18, or 18 percent. Not a big chance, but certainly not negligible either.<br /><br />Also, we'll replace the two facial hair events with a single event: Man with facial hair, using a 20 percent estimate (obviously, the actual statistic should be easy to obtain from published data or experimentally).<br /><br />So, the probability that the police stopped the wrong couple near the crime scene shortly after the crime would be 0.1 x 0.1 x 0.333 x 0.18 x 0.2 = about 1.2 x 10<sup>-4</sup>, or about 1 chance in 8300 of a misidentification. Again, this probability requires that all the facts given to police were correct.<br /><br />But even here, we must beware the possibility of a fluke. Suppose one of the arrestees had an enemy who used lookalikes to carry out the crime near a point where he knew his adversary would be. Things like that happen. So even in a strong case, the use of probabilities is a dicey proposition.<br /><br />However, suppose the police picked up the pair an hour later. In that situation, probability of guilt may still be high -- but perhaps that probability is based in part on inadmissible evidence. Possibly the cops know the suspects' modus operandi and other traits and so their profiling made sense to them. But if for some reason the suspects' past behavior is inadmissible, then the profile is open to a strong challenge.<br /><br />Suppose that a test is done of the witnesses and their averaged error rate is used. Suppose they are remarkably keen observers and their rate of observational error is an amazingly low 1 percent. Let us, for the sake of argument, say that 2 million people live within an hour's drive of the crime scene. How many people are there who could be mistakenly identified as fitting the profile of one of the assailants?
One percent of 2 million is 20,000. So, absent other evidence, the odds of wrongful identification are in the ballpark of 20,000 to 1.<br /><br />It's possible, of course, that only one member of the pair is guilty, with the other merely an innocent companion of the guilty party.<br /><br />It's possible the crime was by two people who did not normally associate, which again throws off probability analysis. But let's assume that the witnesses had reason to believe that the two assailants were well known to each other. We would then look at the number of heterosexual couples among the 2 million. Let's put it at 500,000. The odds are in the vicinity of 5,000 to 1 in favor of wrong identification of the pair. Even supposing one interracial couple per 1,000 people among the 2 million, that's 2,000 interracial couples. A one percent error rate turns up roughly 20 couples wrongly identified as suspects.<br /><br />Things can get complicated here. What about "fluke" couples passing through the area? Any statistics about them would be shaky indeed, tossing all probabilities out the window, even if we were to find two people among the 20 who fit the profile perfectly and went on to multiply the individual probabilities. The astoundingly low probability number may be highly misleading -- because there is no way to know whether the real culprits escaped to San Diego.<br /><br />If you think that sounds unreasonable, you may be figuring in the notion that police don't arrest suspects at random. But we are only using what is admissible here.<br /><br />On the other hand, if the profile is exacting enough -- police have enough specific details of which they are confident -- then a probability assessment might work. However, these specific details have to be somehow related to random sampling.<br />After all, fluke events really happen and are the bane of statistical experiments everywhere.
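The false-positive arithmetic running through the last several paragraphs can be collected in one place (a sketch; the population figures, base rates and accuracy figures are the hypothetical ones used above, and the helper function is mine):

```python
# Base-rate (false positive) arithmetic, using the hypothetical figures above.

def screening(population, n_targets, accuracy):
    """Return (true positives, false positives) for a screening test."""
    true_pos = n_targets * accuracy                       # targets correctly flagged
    false_pos = (population - n_targets) * (1 - accuracy) # innocents flagged
    return true_pos, false_pos

# Paulos's example: 300 million people, 300 of them terrorists,
# and a profile that is 99 percent accurate.
tp, fp = screening(300_000_000, 300, 0.99)
print(round(tp), round(fp))  # about 297 true hits vs. about 3 million false ones
print(round(fp / tp))        # roughly 10,000 false hits per true hit

# The couples example: 500,000 couples within an hour's drive, witnesses
# with a 1 percent error rate -- about 5,000 couples wrongly flagged.
print(round(500_000 * 0.01))

# And the profile product for the couple stopped near the scene:
p = 0.1 * 0.1 * 0.333 * 0.18 * 0.2
print(round(1 / p))  # about 1 chance in 8,300 of misidentification
```

The same few lines cover both cases because the structure is identical: a rare condition plus a small error rate applied to a large population swamps the true matches with false ones.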
Not all probability distributions conform to the normal curve (bell curve) approximation. Some data sets contain extraordinarily improbable "outliers." These flukes may be improbable, but they are known to occur for this specified form of information.<br /><br />Also, not all events belong to a set containing ample statistical information. In such cases, an event may intuitively seem wonderfully unlikely, but the data are insufficient to do a statistical analysis. For example, the probability that three World Trade Center buildings -- designed to withstand very high stresses -- would collapse on the same day intuitively seems unlikely. In fact, if we only consider fire as the cause of collapse, we can gather all recorded cases of U.S. skyscraper collapses and all recorded cases of U.S. skyscraper fires. Suppose that in the 20th Century, there were 2,500 skyscraper fires in the United States. Prior to 9/11 essentially none collapsed from top to bottom as a result of fire. So, taking 1/2,500 as a generous estimate for each building, the probability that three trade center buildings would collapse as a result of fire is 2,500<sup>-3</sup>, or about one chance in 15.6 billion.<br /><br />Government scientists escape this harsh number by saying that the buildings collapsed as a result of a combination of structural damage and fire. Since few steel-frame buildings have caught fire after being struck by aircraft, the collapses can be considered as flukes and proposed probabilities discounted.<br /><br />Nevertheless, the NIST found specifically that fire caused the principal structural damage, and not the jet impacts. The buildings were well designed to absorb jet impact stresses, and did so, the NIST found. That leaves fire as the principal cause. So if we ignore the cause of the fires and only look at data concerning fires, regardless of cause, we are back to odds of billions to one in favor of demolition by explosives.<br /><br />Is this fair? Well, we must separate the proposed causes.
If the impacts did not directly contribute significantly to the collapses, as the federal studies indicate (at least for the twin towers), then jet impact is immaterial as a cause and the issue is fire as a cause of collapse. Causes of the fires are ignored. Still, one might claim that fire cause could be a confounding factor, introducing bias into the result. Yet, I suspect such a reservation is without merit.<br /><br />Another point, however, is that the design of the twin towers was novel, meaning that they might justly be excluded from a set of data about skyscrapers. However, the NIST found that the towers handled the jet impacts well; still, there is a possibility the buildings were well-designed in one respect but poorly designed to withstand fire. Again, the NIST can use the disclaimer of fluke events by saying that there was no experience with fireproofing (reputedly) blown off steel supports prior to 9/11.<br /><br /><b>On Hilbert's sixth problem</b> (June 26, 2007)<br /><br /><i>The world of null-H post is the next post down.</i><br /><br />There is no consensus on whether Hilbert's sixth problem (<i>Can physics be axiomatized?</i>) has been answered.<br /><br />From Wikipedia, we have this statement attributed to Hilbert:<br /><br />6. Mathematical treatment of the axioms of physics. The investigations of the foundations of geometry suggest the problem: To treat in the same manner, by means of axioms, those physical sciences in which today mathematics plays an important part; in the first rank are the theory of probabilities and mechanics.<br /><br />Hilbert proposed his problems near the dawn of the Planck revolution, while the debate was raging about statistical methods and entropy, and the atomic hypothesis.
It would be another five years before Einstein provided decisive evidence for the existence of atoms.<br /><br />It would be another year before Russell discovered the set of all sets paradox, which is similar to Cantor's power set paradox. Though Cantor uncovered this paradox, or perhaps theorem, in the late 1890s, I am uncertain how cognizant of it Hilbert was.<br /> <br />Interestingly, by the 1920s, Zermelo, Fraenkel and Skolem had axiomatized set theory, specifically forbidding that a set could be an element of itself and hence getting rid of the annoying self-referencing issues that so challenged Russell and Whitehead. But, in the early 1930s, along came Goedel and proved that ZF set theory was either inconsistent or incomplete. His proof actually used Russell's <i>Principia Mathematica</i> as a basis, but generalizes to apply to all but very limited mathematical systems of deduction. Since mathematical systems can be defined in terms of ZF, it follows that ZF must contain true statements that cannot be traced back to its axioms. So the attempt to axiomatize ZF didn't completely succeed.<br /><br />In turn, it would seem that Goedel, who began his studies as a physicist, had knocked the wind out of Problem 6. Of course, many physicists have not accepted this point, arguing that Goedel's incompleteness theorem applies to only one essentially trivial matter.<br /><br />In a previous essay, I have discussed the impossibility of modeling the universe as a Turing machine. If that argument is correct, then it would seem that Hilbert's sixth problem has been answered. But I propose here to skip the Turing machine concept and use another idea.<br /><br />Conceptually, if a number is computable, a Turing machine can compute it. Then again Church's lambda calculus, a recursive method, can also compute any computable number. So are the general Turing machine and the lambda calculus equivalent?
The two formalisms were in fact proved equivalent; Church's thesis is the conjecture that they capture everything that is effectively computable (rationals or rational approximations to irrationals).<br /><br />But Boolean algebra is the real-world venue used by computer scientists. If an output can't be computed with a Boolean system, no one will bother with it. So it seems appropriate to define an <i>algorithm</i> as anything that can be modeled by an mxn truth table and its corresponding Boolean statement.<br /><br />The truth table has a Boolean statement where each element is above the relevant column. So a sequence of truth tables can be redrawn as a single truth table under a statement combined from the sub-statements. If a sequence of truth tables branches into parallel sequences, the parallel sequences can be placed consecutively and recombined with an appropriate connective.<br /><br />One may ask about more than one simultaneous output value. We regard this as a single output set with n output elements.<br /><br />So then, if something is computable, we expect that there is some finite mxn truth table and corresponding Boolean statement. Now we already know that Goedel proved that, for any sufficiently rich system, there is a Boolean statement that is true, but NOT provably so. That is, the statement is constructible using lawful combinations of Boolean symbols, but the statement cannot be derived from axioms without extension of the axioms, which in turn implies another statement that cannot be derived from the extended axioms, ad infinitum.<br /><br />Hence, not every truth table, and not every algorithm, can be reduced to axioms.
That is, there must always be an algorithm or truth table showing that a "scientific" system of deduction is either inconsistent or incomplete.<br /><br />Now suppose we ignore that point and assume that human minds are able to model the universe as an algorithm, perhaps as some mathematico-logical theory; i.e., a group of "cause-effect" logic gates, or specifically, as some mxn truth table. Obviously, we have to account for quantum uncertainty. Yet, suppose we can do that and also suppose that the truth table need only work with rational numbers, perhaps on grounds that continuous phenomena are a convenient fiction and that the universe operates in quantum spurts.<br /><br />Yet there is another proof of incompleteness. The algorithm, or its associated truth table, is an output value of the universe -- though some might argue that the algorithm is a Platonic idea that one discovers rather than constructs. Still, once scientists arrive at this table, we must agree that the laws of mechanics supposedly were at work so that the thoughts and actions of these scientists were part of a massively complex system of logic gate equivalents.<br /><br />So then the n-character, grammatically correct Boolean statement for the universe must have itself as an output value. Now, we can regard this statement as a unique number by simply assigning integer values to each element of the set of Boolean symbols. The assigned integers then follow a specific order, yielding a single corresponding integer.<br />(The number of symbols n may be regarded as corresponding to some finite time interval.)<br /><br />Now then, supposing the cosmos is a machine governed by the cosmic program, the cosmic number should be computable by this machine (again, the scientists involved acting as relays, logic gates and so forth). However, the machine needs to be instructed to compute this number. So the machine must compute the basis of the "choice."
So it must have a program to compute the program that selects which Boolean statement to use, which in turn implies another such program, ad infinitum.<br /><br />In fact, there are two related issues here: the Boolean algebra used to represent the cosmic physical system requires a set of axioms, such as Huntington's postulates, in order to be of service. But how does the program decide which axioms it needs for itself? Similarly, the specific Boolean statement requires its own set of axioms. Again, how does the program decide on the proper axioms?<br /><br />So then, the cosmos cannot be fully modeled according to normal scientific logic -- though one can use such logic to find intersections of sets of "events." Then one is left to wonder whether a different system of representation might also be valid, though the systems might not be fully equivalent.<br /><br />At any rate, the verdict is clear: what is normally regarded as the discipline of physics cannot be axiomatized without resort to infinite regression.<br /><br />*************<br />So, we now face the possibility that two scientific systems of representation may each be correct and yet not equivalent.<br /><br />To illustrate this idea, consider the base 10 and base 2 number systems. There are some non-integer rationals in base 10 that cannot be expressed with a terminating expansion in base 2 (decimal 0.1, for instance, has no finite binary expansion), although approximation can be made as close as we like. As systems of finite representation of the rationals, the two are not equivalent.<br /><br />(Cantor's algorithm for enumerating the rationals is usually presented in base 10; since a rational is a ratio of integers in any base, however, the enumeration itself does not depend on the base.)<br /><br /><b>The world of null-H</b> (June 25, 2007)<br /><br /><i>Some thoughts on data compression, implicate order and entropy (H):</i><br /><br />Consider a set of automated machines.
We represent each machine as a sequence of logic gates. Each gate is assigned an integer. In that case each machine can be represented as a unique number, to wit its sequence of logic gate numbers. If subroutines are done in parallel, we can still find some way to express the circuit as a single unique number, of course.<br /><br />In fact, while we're at it, let us <i>define</i> an algorithm as any procedure that can be modeled by a truth table. Each table is a constructible mxn matrix which clearly is unique and hence can be written as a sequence of 0's and 1's read off row by row with a precursor number indicating dimensions. In that case, the table has a bit value of more than mxn. On the other hand, each machine's table is unique and can be assigned an index number, which may have a considerably lower bit value than mxn.<br /><br />Now suppose, for convenience, we choose 10 machines with unique truth table numbers that happen to be n bits in length. We then number these machines from 1 to 10.<br /><br />Now, when we send a message about one of the machines to someone who has the list of machines and reference numbers stored, the message can be compressed as a number between 1 and 10 (speaking in base-10 for convenience). So the information value of, say, 7 is far lower than that of an n = 25 digit description in base 2. Suppose each of the 2<sup>25</sup> possible descriptions is equally probable. Then the probability of any one description is 2<sup>-25</sup>, and its information value is 25 bits.<br /><br />However, if one transmits the long-form description, it is no better than transmitting the three-bit representation of 7 (111). Clearly the information value of 3 bits for 7 is far lower than the 25 bits for the long description.<br /><br />Of course Shannon's memoryless channel is a convenient fiction, allowing a partial description which is often useful (he did some pioneering work on channels with memory also, but I haven't seen it).
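The comparison between the long description and the short index can be made concrete with the surprisal formula -log2(p) that reappears later in this post (a sketch; the 25-bit description length and the 10-machine list are the figures assumed above):

```python
import math

# Surprisal (information value) of a message is -log2 of its probability.
def surprisal_bits(p):
    return -math.log2(p)

# The long-form machine description: one of 2**25 equally likely bit strings.
long_bits = surprisal_bits(2 ** -25)
print(long_bits)  # 25 bits

# The compressed form: one of 10 equally likely index numbers.
index_bits = surprisal_bits(1 / 10)
print(round(index_bits, 2))  # about 3.32 bits

# The difference is carried by the stored look-up list, not by the channel.
```

To a receiver who holds the look-up list, the few bits of the index deliver everything the 25-bit description would; the remaining bits sit in storage as what the post calls dead information.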
These days, few channels are memoryless, since almost every computer system comes with look-up functions.<br /><br />So what we have is data compression. The 25 bits have been compressed into some low number of bits. But, we don't have information compression, or do we?<br /><br />If one transmits the long-form message to the receiver, that information is no more useful to the receiver than the information in the abbreviated form. I<sub>implicit</sub> will do. I<sub>explicit</sub> has no additional surprisal value to the receiver.<br /><br />So I<sub>explicit</sub> - I<sub>implicit</sub> might be considered the difference between the stored information value and the transmitted information value. But, once the system is up and running, this stored information is useless. It is dead information -- unless someone needs to examine the inner workings of the machine and needs to look it up. Otherwise the persons on each end can talk in compressed form about Machine X and Machine Y all the time without ever referring to the machine details. So I<sub>e</sub> - I<sub>i</sub> = -I<sub>i</sub> under those conditions.<br /><br />So stored information is dead information for much of the time. It has zero value unless resurrected by someone with an incomplete memory of the long integer string.<br /><br />Now we have not bothered with the issue of the average information (entropy) of the characters, which here is a minor point. But clearly the entropy of the messaging system increases with compression. Information is "lost" to the storage unit (memory).<br /><br />However, if someone consults the memory unit, stored information is recovered and the entropy declines.<br /><br />The point is that entropy in this sense seems to require an observer. Information doesn't really have a single objective value.<br /><br />Yes, but perhaps this doesn't apply to thermodynamics, you say. The entropy of the universe always increases.
Turned around, that statement really means that the most probable events will usually occur and the least probable usually won't. So many scientists seek to redefine sets of "events" in order to discover more intersections. They seek to reduce the number of necessary sets to a minimum.<br /><br />Note that each set might be treated as a machine with its elements denoted by numbers. In that case sets with long integer numbers can be represented by short numbers. Then, again, entropy seems to be observer-dependent.<br /><br />Of course one can still argue that the probability of the memory unit remaining intact decreases with time. Now we enter into the self-referencing arena -- as in Russell's set of all sets -- in that we can point out that the design of the memory unit may well require another look-up system, again implying that information and entropy are observer-dependent, not sometimes but always.<br /><br />Consider a quantum experiment such as a double-slit experiment run one photon at a time.<br />The emitted photon will land in a region of the interference pattern with a specific probability derived from the squared modulus of the photon's wave amplitude.<br /><br />If we regard the signal as corresponding to the coordinates of the detected photon, then the received signal carries an information value equal to -log(p), where p is the probability for those coordinates. (I am ignoring the unneeded term "qubit" here.)<br /><br />In that case, the entropy of the system is given by -p<sub>1</sub>log(p<sub>1</sub>) +...+ -p<sub>n</sub>log(p<sub>n</sub>).<br /><br />So the <i>entropy</i> corresponds to the wave function and the <i>information</i> corresponds to the collapse of the wave function -- and we see that information is observer-dependent. The observer has increased the information and decreased the general entropy.<br /><br />On the other hand, information is a fleeting thing in both the classical and quantum arenas.
"Shortly" after the information is received, it loses its surprisal value and dies -- until a new observer obtains it ("new observer" could mean the same observer who forgets the information).Pauliehttp://www.blogger.com/profile/06424849610575275694noreply@blogger.com0tag:blogger.com,1999:blog-34016406.post-61409388400733319212007-06-24T17:21:00.000-07:002007-07-04T18:36:20.560-07:00The Kalin cipher<i>Note: The Kalin cipher described below can of course be used in tandem with a public key system, or it can be done by hand with calculators. An appropriate software program for doing the specified operations would be helpful.</i><br /><br />The idea of the cipher is to counteract frequency analysis via matrix methods.<br /><br />Choose a message of n characters and divide n by some integer square. The remainder can be padded out with dummy numbers.<br /><br />Arrange the message into mxm matrices, as shown,<br /><br /><pre><br />H I H x<br />O W A y<br />R E U z<br /></pre><br /><br />We then augment the matrix with a fourth column. This column is the first key, which is a random number with no zeros. A second 4x3 matrix is filled with random integers. This is the second key. The keys needn't be random. Easily remembered combinations might do for some types of work because one would have to know the cipher method in order to use guessed keys.<br /><br />We then matrix multiply the two as MK and put MK into row canonical form.<br />This results in the nine numbers in M being reduced to three in [I|b], where b is the final column. <br /><br /><pre><br />8 9 8 x 7 9 2 1 0 0 a<br />8 15 1 y 5 5 4 0 1 0 b<br />18 5 21 z 3 2 3 = 0 0 1 c<br /> 5 2 3<br /><br /></pre><br />where (a, b, c) is a set of three rationals.<br /><br />The keys can be supplied by a pseudorandom number generator in step with a decoder program. A key can vary with each message matrix or remain constant for a period, depending on how complex one wishes to get. 
But, as said, if one is conducting an operation where it is advantageous for people to remember keys, this method might prove useful.<br /><br />By the way, if a row of the original message matrix repeats or if one row is a multiple of another, a dummy character is inserted to make sure no row is a multiple of another so that we can obtain the canonical form. Likewise, the key matrix has no repeating rows.<br /><br />In fact, on occasion other square matrices will not reduce to the I form. A case in point:<br /><pre><br />a+1 b+1 c+1<br />a b c<br />1 1 1<br /></pre><br /><br />The first row is the sum of the other two, so the determinant of this matrix is 0, meaning the I form can't be obtained.<br />In general, our software program should check the determinant of the draft message matrix to make sure it is non-zero. If it is zero, a dummy letter should be inserted, or two inconsequential characters should be swapped as long as the meaning isn't lost.<br /><br />But, if the program doesn't check the determinant it will give a null result for the compression attempt and hence would be instructed to vary the message slightly.<br /><br />Notice that it is not necessary to transmit I.
So the message is condensed into three numbers, tending to defeat frequency analysis.<br /><br />Here is a simple example, where, for convenience, I have used the smallest square matrix:<br /><br />We encrypt the word "abba" (Hebrew for "father") thus:<br /><br /><pre><br />1 2 3<br />2 1 1<br /></pre><br /><br />where the first two columns represent "abba" and the third column is the first key.<br />We now matrix multiply this 2x3 matrix by a 3x2 matrix which contains the second key.<br /><br /><pre><br />1 2 3 x 2 1 = 10 16<br />2 1 1 1 3 7 8<br /> 2 3<br /></pre><br />We then apply key 1 (or another key if we like) and reduce to canonical form:<br /><br /><pre><br />10 16 3<br />7 8 1<br /></pre><br />which is uniquely represented by<br /><br /><pre><br />1 0 -1/4<br />0 1 11/32<br /></pre><br /><br />By reversing the operations on (-1/4, 11/32), we can recover the word "abba." If we encode some other four characters, we can similarly reduce them to two numbers. Make a new matrix<br /><br /><pre><br />-1/4 x 3<br />11/32 y 1<br /></pre><br />which we can fold into another two-number set (u, v).<br /><br /><br />We see here an added countermeasure against frequency analysis, provided the message is long enough: gather the condensed column vectors into new matrices.<br /><br />We can do this repeatedly. All we need do is divide the output into groups of, in this case, four numbers and recombine. If we have 4<sup>2n</sup> matrices to begin with, it is possible to transmit the entire message as a set of two numbers, though usually more than one condensed set would be necessary. Of course we can always pad the message so that we have a block of x<sup>2n</sup>. Still, the numbers tend to grow as the operations are done and may become unwieldy after too many condensations. On the other hand, frequency analysis faces a tough obstacle, I would guess.<br /><br />So a third key is the number of enfoldments. Three enfoldments would mean three unfoldments.
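The "abba" walk-through above can be reproduced exactly with rational arithmetic (a sketch; the `rref` helper and the variable names are mine, not part of the cipher's specification, and the pivot step assumes a non-zero determinant, as the post requires):

```python
from fractions import Fraction

# Reproduce the "abba" example: multiply by the second key, augment with
# the first key's column, and reduce to row canonical form [I | b].

def rref(m):
    """Gauss-Jordan reduction of an n x (n+1) augmented matrix of Fractions."""
    n = len(m)
    for i in range(n):
        pivot = m[i][i]            # assumed non-zero (determinant checked upstream)
        m[i] = [x / pivot for x in m[i]]
        for j in range(n):
            if j != i:
                m[j] = [a - m[j][i] * b for a, b in zip(m[j], m[i])]
    return m

# Message "abba" (a=1, b=2) read down the columns, with key 1 as a third column.
M = [[1, 2, 3],
     [2, 1, 1]]
# Key 2.
K = [[2, 1],
     [1, 3],
     [2, 3]]

# MK, then augment with key 1's column (3, 1) and reduce.
MK = [[sum(M[i][k] * K[k][j] for k in range(3)) for j in range(2)]
      for i in range(2)]
aug = [[Fraction(MK[i][0]), Fraction(MK[i][1]), Fraction([3, 1][i])]
       for i in range(2)]
b = [row[-1] for row in rref(aug)]
print(MK)  # the product matrix from the text
print(b)   # the pair (-1/4, 11/32) from the text
```

Using `Fraction` rather than floating point keeps the reduction exact, which matters here because the transmitted numbers must be invertible back to the original integer codes.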
Suppose we use a 4x4 system on 64 characters with three enfoldments.<br />We write this on 16 matrices. After the transform, we throw away the I matrices and gather the remaining columns sequentially into a set of 4 matrices. We transform again and are left with a single column of four numbers. So if the adversary doesn't know the number of enfoldments, he must try them all, assuming he knows the method. Of course that number may be varied by some automated procedure linked to a public key system. <br /><br />Just as it is always possible to put an nxn matrix without a zero determinant [and containing no zeros] into canonical form, it is always possible to recover the original matrix from that form. The methods of linear algebra are used.<br /><br />Decryption is also hampered by the fact that in matrix multiplication AB does not usually equal BA.<br /><br />The use of canonical form and the "refolding" of the matrices is what makes the Kalin cipher unique, I suppose. When I say unique, I mean that the Kalin cipher has not appeared in the various popular accounts of ciphers.<br /><br />An additional possibility: when an nxn matrix is folded into an n column vector, we might use some "dummy" numbers to form a different dimension matrix. For example, suppose we end up with a 3-entry column vector. We add a dummy character to that string and form a 2x2 matrix, which can then be compressed into a 2-entry column vector. Of course, the receiver program would have to know that a 2-vector is compressed from a 3-vector. Also the key for the 2x2 matrix is so small that a decryption program could easily try all combinations.<br /><br />However, if a 100x100 matrix folds into a 10-entry column vector, six dummy characters can be added and a 4x4 matrix can be constructed, leaving a 4-vector, which can again be folded into a 2-vector. All sorts of such systems can be devised.<br /><br />Additionally, a "comma code" can be used to string the vectors together into one number. 
The decipherment program would read this bit string to mean a space between vector entries.<br /><br />Clearly the method of using bases or representations as keys and then transforming offers all sorts of complexities -- perhaps not all useful -- but the emphasis on matrix condensation seems to offer a practical antidote to frequency analysis.<br /><br />BTW, I have not bothered to decimalize the rational fractions. But presumably one would convert to the decimal equivalent in order to avoid drawing attention to the likelihood that the numbers represent canonical row vectors. And, of course, if one is using base 2, one would convert each rational to a linear digit string. Not all decimal fractions can be converted exactly into binary. However, supposing enough bits are used, the unfolded (deciphered) number will, with high probability, be very close to the correct integer representing the character.Pauliehttp://www.blogger.com/profile/06424849610575275694noreply@blogger.com0tag:blogger.com,1999:blog-34016406.post-70388351134036144672007-06-02T07:37:00.000-07:002007-06-02T08:27:40.258-07:00Thumbnail of NIST's 9/11 scenarioHere's a thumbnail of the scenario used by the National Institute of Standards and Technology to justify its computer modeling:<br /><br />When the plane collided with a tower, a spray of debris "sandblasted" the core columns, stripping them of fireproofing. 
Some fireproofing may have come off because of the vibrations on impact.<br /><br />Some jet fuel immediately burst into flame, in a fireball seen from the street, with some of the fiery fuel dropping down the core elevator shaft and spilling out onto random floors, setting a few afire.<br /><br />Jet fuel, whether fueling an explosion or an ordinary fire, cannot produce enough energy to critically weaken naked core columns in the time allotted, the NIST found.<br />Hence, the NIST presumed that the fires were accelerated by the jet fuel but got hot enough from office materials and furnishings, which the NIST variously put at five pounds per square foot or four pounds per square foot on average.<br /><br />Though the heat energy actually detected via infrared analysis and visual inspection of the photographic record was far too low to have caused "global collapse," the NIST conjectured that the fires were hotter and more intense closer to the core.<br /><br />The NIST's mechanism for collapse required sufficiently energetic fires to send heat to the ceiling -- which the NIST modeled as retaining its fireproofing -- and that heat was then convected along the ceiling and sideways into the load-bearing columns and connectors.<br /><br />Had the fireproofing also come off the ceiling, the heat would have dissipated upward through the floor slab and, the NIST's computer models showed, the building would have stood.<br /><br />Though there were fires on lower floors, the columns would have retained most of their fireproofing, and so critical weakening wasn't possible. No witnesses reported fireproofing off bare columns on lower floors.<br /><br />The NIST said collapse was initiated by the shortening of core columns -- whether from buckling or from severing. 
Presumably this shortening would have occurred on the impact/fire floors, since that's where the fireproofing would have come off.<br /><br />The NIST realized that blown-out windows supplied oxygen to keep the fire going, but also permitted much heat to dissipate out (with the smoke), making less available for the "blowtorch" scenario.<br /><br />The NIST said that the impact zone fires seemed to progress around a floor in fits and starts until they had come all the way around the core.<br /><br />Here are some concerns:<br /><br />In Tower 2, which fell first, the observed fire energy was very low, the NIST agreed. The NIST attributed Tower 2's fall to structural damage from plane impact with some input from the fires. The plane that struck Tower 2 clipped the corner, meaning the collision force with the core would have been lower than in Tower 1, not more. Yet it is the core that supported the weight of the structure, not the exterior.<br /><br />Tower 2's top block collapsed at a 23 degree angle. What this means is that Tower 2 had much less "hammer-down" force available for the lower part of the building than some have suggested.<br /><br />On the other hand, Tower 1's top block came essentially straight down. What this implies is that the core columns on the opposite side of the elevator shaft from the plane were damaged roughly equally with those on the impact side. Otherwise, the top would have tilted over, as if on a hinge. 
Hence, one is forced by the NIST to accept the idea that a lot of fireproofing was shaken off the opposed columns, but not off the ceiling.<br /><br />Also, in order for the fires -- which dimmed and flared as oxygen was depleted or became available when the heated air blew out windows -- to damage the core columns in a manner that initiates global collapse, not only must a specific percentage of the 47 columns be critically damaged, but the critical damage must be roughly evenly spaced.<br />That is, if a set of closely spaced columns lost strength, the building would be less likely to give way, since the load is, by design, transferred to the remaining columns.<br /><br />In the case of Tower 1, the damage to the core columns must be equivalent on the opposing side of the shaft, implying that the "blowtorch" acted symmetrically with respect to energy.<br /><br />The NIST, in order to get the simulation to work, must have required that on each floor the fire maintain a specific rate of progress and a specific average ceiling energy. That is, if the fire moved through the debris too slowly, it would burn out or simply fail to do enough damage to other columns. 
But if it moved too rapidly, the circling fire would fail to bring enough heat to bear on any particular column to bring about critical weakening.<br /><br />Though massive blasts were witnessed from outside just prior to collapse, the NIST felt obliged to discount these explosions as immediate collapse causes -- because the jet fuel simply wouldn't provide enough energy to do critical structural damage.Pauliehttp://www.blogger.com/profile/06424849610575275694noreply@blogger.com1tag:blogger.com,1999:blog-34016406.post-9134887611890159332007-04-17T14:15:00.000-07:002007-04-26T16:09:11.385-07:00The case of the missing energy<i>Just a note to amplify a previous post which takes a look at the energy deficit problem for the twin towers.</i><br /><br /><i>Correction (April 26, 2007): A mass estimate has been revised to 7 x 10<sup>8</sup> kilograms per building. This is quite a trivial matter, since the numerical mass is irrelevant. It is the energy ratio that is important.</i><br /><br />We may regard the energy associated with the buoyant force as the binding energy of the lower structure of the 417-floor WTC2. Most of this energy went into the construction such that the structure could bear the load above. We could think of this energy as internal energy.<br /><br />That is, if the entire structure collapses, how much energy should be released?<br />We feel safe to say that the energy must be at least as much as is required to raise a block to a specified height. That is, it must be at least mgy for a specific block and height.<br /><br /><i> Though there may be some justification for a discrete summation, which I used in a previous version of this post, I have decided that a routine integral suits our purposes nicely. 
So here goes.</i><br /><br />The mass supported at some height y we estimate as roughly<br /><br />M - (y/H)M = M(1-y/H)<br /><br />where H is the height of the building.<br /><br />The potential energy specific to that height is then<br /><br />gM(1-y/H)y = gM(y - y<sup>2</sup>/H)<br /><br />So, for the sum of all potential energies between ground level and y we have<br />(using <b>S</b> for the integral sign)<br /><br />gM[<b>S</b> y dy - (1/H) <b>S</b> y<sup>2</sup> dy]<br /><br />-- evaluated from ground level up to the height of story n, the story at which the upper block fell --<br /> <br />= gM(y<sup>2</sup>/2 - y<sup>3</sup>/(3H))<br /><br />Based on a stated 770,000 tons of steel per building, we estimate building mass at 7 x 10<sup>8</sup> kg. For WTC2, we put n at 81 and H at 417 meters, with about 3.79 meters between floors.<br /><br />Plugging in those numbers, we get an internal energy of at least<br /><br />1.65x10<sup>14</sup> Joules<br /><br />However, the kinetic energy from the top block's crash is given by<br /><br />1/2mv<sup>2</sup> = mgy = 0.25 x 7 x 10<sup>8</sup> x 9.8 x 3.79 = 6.5 x 10<sup>9</sup> J (taking m as the top quarter of the building mass and y as one story), which is more than four orders of magnitude below the opposing internal energy.<br /><br />For WTC1, we use H = 420 and n = 92, with 3.82 meters between floors.<br />Plugging in the numbers, we obtain<br /><br />1.873 x 10<sup>14</sup> J in internal energy for the lower structure, versus an impact energy of 6.5 x 10<sup>9</sup> J. <br /><br />Now it is conceivable that a small amount of energy can bring about the release of a much larger amount of energy -- <i> if it is well positioned.</i> However, the binding energy of the lower structure that concerns us is all vectored so as to resist gravitational collapse. 
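The arithmetic above is easy to reproduce. A quick numerical sketch, using only the estimates stated in the post (7 x 10<sup>8</sup> kg, 417 m, 81 stories, 3.79 m per story):

```python
# Internal energy E = gM(y^2/2 - y^3/(3H)) versus the kinetic energy
# of a one-story fall by the top quarter of the building (WTC2 figures).
g, M = 9.8, 7e8                 # m/s^2; estimated building mass in kg
H, n, story = 417.0, 81, 3.79   # height, collapse story, meters per story
y = n * story                   # height at which the upper block fell
E = g * M * (y**2 / 2 - y**3 / (3 * H))   # about 1.65e14 J
ke = 0.25 * M * g * story                 # about 6.5e9 J
print(E, ke, E / ke)
```

The ratio comes out above 2 x 10<sup>4</sup>, i.e., more than four orders of magnitude.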
Hence, it is plausible for such a small amount of energy to topple a tower -- <i>only if it is released near the base of the structure.</i><br /><br />The official idea that there was enough energy to hammer the lower structure down to near-ground level is simply not tenable.<br /><br />Addendum: I took a look at the kinetic energies provided by the impacts of the jets and found that they were minuscule with respect to the "buoyant force energy" of each building. The NIST was somewhat ambivalent about its view of the jet impact damage.Pauliehttp://www.blogger.com/profile/06424849610575275694noreply@blogger.com0tag:blogger.com,1999:blog-34016406.post-6228591207758260462007-03-19T11:41:00.000-07:002007-03-20T15:14:23.556-07:00G whiz<i>The conjecture here is that g is affected by a difference in the shape of the earth's core and its sea-level surface.</i><br /><br />One problem with my post below is that the polar radius is wrong. Reference books are not always reliable about such things. Another problem is that, at least once, I used a wrong value for G, indicating that one's memory is also not always terribly reliable.<br /><br />However, these wrong values don't tell the whole story about what is wrong.<br /> <br />Some have assumed that g differs from that expected of a sphere because the earth is non-spherical. But shape is unlikely to be the only issue.<br /><br />The barycenter (center of mass or gravity) of an object might be defined as the point where all internal gravitational forces cancel. 
That is, if we make wedges of equal mass with sides intersecting the barycenter, the gravitational force of each wedge cancels in pairs in a circle around the barycenter and also at the barycenter.<br />For an object of uniform mass, the barycenter is the centroid of the volume.<br /><br />Hence, as long as we know the linear distance to the barycenter, we can determine g at the geoid (surface of the object) -- that is, as long as the object has no concavities in the surface or projections (i.e., as long as a tangent line at a perimeter point does not intersect another perimeter point without also intersecting the interior). A depression in the surface means that g at the bottom of the concavity will be reduced by the y component of F<sub>g</sub> coming from the higher walls, and similarly for the surface at the base of a projection.<br /><br />However, the figure of the earth is very close to an ellipsoid, though reportedly it is somewhat pear-shaped, which I am guessing means that the semimajor or semiminor axis of one ellipsoid is tacked on to that of another ellipsoid; i.e., two halves of two different ellipsoids, which share one equal axis, are pasted together. If so, we would still have the situation that the perimeter is effectively a curve where a line never intersects two tangent points without intersecting the ellipsoid's interior. 
Hence, we would still be able to calculate g by angle (and, to be fussy, by altitude above sea level).<br /><br />Anyway, to find the distance to the geoid for an ellipsoid, given the angle K, we have<br />r = (cos<sup>2</sup>K/a<sup>2</sup> + sin<sup>2</sup>K/b<sup>2</sup>)<sup>-0.5</sup><br /><br />Now the value of g at latitude 45.5 degrees has been set at 9.80665.<br /><br />So we plug in the following values:<br /><br />Polar radius: 6357000 meters<br />Equatorial radius: 6378000 m<br />Earth mass: 5.9736 x 10<sup>24</sup> kg<br />G: 6.67259 x 10<sup>-11</sup><br /><br />Assuming confidence about the earth's mass, at 45.5 degrees, the radius of the earth's ellipsoid is 6367.29 km.<br />But, setting g = 9.80665, the accepted value for that latitude, r should be 6375.36 km, meaning that the ellipsoid radius is 8.07 km shorter than can be accounted for by g = GM<sub>earth</sub>/r<sup>2</sup>.<br /><br />Yes, it may be that the earth is a bit lumpy at the geoid, but another influence may be at work: the earth's interior is not a uniform body. In fact, it is believed that the mostly iron core (with a radius about the same as the moon's) is rotating separately from the remainder of the planet.<br /><br />If we suppose that the core's figure has far less eccentricity than that of the geoid, we can see that the mass distribution will vary by angle, with the highest density, on average, at the poles and the least at the equator. That is, g would decrease more rapidly from pole to equator than would be so for an ellipsoid of uniform mass.<br /><br />Well, I suppose we might also like to calculate the moon's influence, which should cause the effective g to be less when the moon is in the sky and more when it is behind the earth from the observer. However, I am sure that the official guardians of g have taken the moon's effect into account. 
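The two radius figures above can be checked in a few lines; a sketch using the constants just listed:

```python
import math

# Ellipsoid radius at latitude 45.5 degrees versus the radius implied
# by g = 9.80665 m/s^2 through g = GM/r^2.
a, b = 6378000.0, 6357000.0     # equatorial and polar radii, m
G, M = 6.67259e-11, 5.9736e24
K = math.radians(45.5)
r_ellipse = (math.cos(K)**2 / a**2 + math.sin(K)**2 / b**2) ** -0.5
r_from_g = math.sqrt(G * M / 9.80665)
print(r_ellipse / 1000)               # about 6367.29 km
print(r_from_g / 1000)                # about 6375.4 km
print((r_from_g - r_ellipse) / 1000)  # the roughly 8 km discrepancy
```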
In general, I would expect that the effect would on average cancel out, but then again, its effect on local g may well depend on the moon's orbit with respect to latitude.Pauliehttp://www.blogger.com/profile/06424849610575275694noreply@blogger.com0tag:blogger.com,1999:blog-34016406.post-3824400785788215042007-03-17T07:40:00.000-07:002007-03-17T08:12:21.912-07:00g whizI am somewhat curious as to why my method of determining local g seems not to come up with good values. I suppose it has to do with irregularities in distribution of the earth's mass and possibly with problems of measurement of big G and the earth's shape.<br /><br />For example, the standard value of g is given as 9.80665 m/s<sup>2</sup>, taken at sea level at latitude 45.5<sup>o</sup>.<br /><br />Here's what I get:<br /><br />The expression for an ellipse (the earth's shape) is<br /><br />(x/a)<sup>2</sup> + (y/b)<sup>2</sup> = 1,<br /><br />where a and b are the semimajor and semiminor axes. Hence, to determine the radius by angle, we have<br /><br />r = [((cos K)/a)<sup>2</sup> + ((sin K)/b)<sup>2</sup> ]<sup>-0.5</sup>.<br /><br />Letting the polar radius = 6364630 meters and the equatorial radius = 6378000, and setting earth mass at 5.98(10)<sup>24</sup> kg and G at 6.725(10<sup>-11</sup>), I get g = 9.907 at sea level.<br /><br />Of course, this seemingly obvious formula is not the one used. For details on the actual method of calculation, see Wikipedia article on "standard gravity."<br />Using somewhat more precise values of G and M, here are some other values of g to three decimal places (including altitude of the U.S. cities listed):<br /><br />Nashville TN: 9.891<br />Knoxville TN: 9.890<br />Albany NY: 9.896<br />New York NY: 9.894<br />London: 9.902<br />Jerusalem: 9.873<br />North or South Pole: 9.918<br />Equator: 9.877<br /><br />I suppose the real problem is that the earth isn't a perfect ellipsoid and that its mass is not quite uniformly distributed. 
Dunno.Pauliehttp://www.blogger.com/profile/06424849610575275694noreply@blogger.com0tag:blogger.com,1999:blog-34016406.post-81562715753425961502007-03-02T14:56:00.000-08:002007-03-03T07:49:46.675-08:00Energy sums for the twin towers<b>Are roof-to-ground collapses plausible?</b><br /><br /><b>WTC1</b><br />Height: 420 meters<br /><br />Mean distance per floor: 3.82m<br /><br />Collapse began at: floor 94<br /><br />Mass of top block:<br />0.145M or less, where M is the mass of the entire building (we have neglected the mass of the airliner)<br /><br />Energy required to keep top block in place:<br />mgy = 0.145M(9.8)(359.08)meters = (510.25M)Joules<br /><br />Energy inherent in fail-safe design to keep top block in place:<br />2mgy or more = (1020.5M)J or more<br /><br />Energy converted to entanglement or damage energy after the top block falls one story onto the bottom block:<br />Less than 1/2mv<sup>2</sup> = 0.5(0.145M)(8.65)<sup>2</sup> = (5.43M)J<br /><br />This represents less than 1 percent of the normal force energy of (510.25M)J.<br /><br />Remaining potential energy in top block at time of collision:<br />mgy - 1/2mv<sup>2</sup> = (504.82M)J.<br /><br />Considering that such a small amount of energy was available to inflict structural damage, it seems problematic that the damage energy was not dissipated rapidly near the top of the underlying block, implying a collapse of no more than a few floors.<br /><br />Yes, there remained (504.82M)J that could be converted into the kinetic energy of a fall to ground level, but that was counterposed by the normal force energy. Only if the damage energy resulted in large-scale and swift dissipation of the normal force could the observed collapse have occurred. 
But the amount of damage energy seems inconsistent with that result.<br /><br /><b>WTC2</b><br />Height: 417m<br /><br />Mean distance per floor: 3.79m<br /><br />Collapse began at: floor 82<br /><br />Mass of top block: 0.25M or less (again we neglect the airliner mass)<br /><br />Energy required to keep top block in place:<br />mgy = 0.25M(9.8)(310.78) = (761.4M)J<br /><br />Energy inherent in fail-safe design:<br />at least 2mgy = (1522.8M)J<br /><br />Energy converted into entanglement or damage energy after the top block falls one story:<br />Less than 1/2mv<sup>2</sup> = 0.5(0.25M)(8.62)<sup>2</sup> = (9.29M)J<br /><br />This represents 1.2 percent of the underlying structure's minimal normal force energy.<br /><br />Remaining potential energy in top block at time of collision:<br />mgy - 1/2mv<sup>2</sup> = (752.11M)J<br /><br />Again, it seems problematic that the damage energy wasn't rapidly dissipated near the top of the underlying block, implying collapse of no more than a few floors. In other words, the potential energy and the normal force energy are counterposed prior to impact. So, for the observed collapse to occur, the damage energy must have inflicted swift and large-scale dissipation of the normal force, a result that seems inconsistent with the low amount of damage energy available.<br /><br />The NIST doesn't really start the collapses as described but says "shortened" core columns dragged the floors inward, precipitating full collapse. However, once collapse is under way, a top-down scenario must ensue. The question is then, at what floor does gravitational collapse begin?<br /><br />But, this is a non-issue since the inequality 1/2mv<sup>2</sup> greater than or equal to mgy requires, in our case, y less than or equal to about 3.8 meters.<br />That is, collapse would have had to have begun near the foundation (which would imply explosives). 
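The table entries above, for both towers, can be reproduced in a few lines. A sketch, with energies in units of M joules as in the post:

```python
import math

def block_figures(frac, story, floor):
    """Energies, per unit of total building mass M, for a top block of
    mass frac*M whose fall begins 'floor' stories up."""
    g = 9.8
    y = floor * story                 # height of the collapse floor, m
    hold = frac * g * y               # mgy: energy holding the block in place
    v = math.sqrt(2 * g * story)      # speed after falling one story
    damage = 0.5 * frac * v * v       # 1/2 mv^2: available damage energy
    return hold, damage

wtc1 = block_figures(0.145, 3.82, 94)   # about (510.25M J, 5.43M J)
wtc2 = block_figures(0.25, 3.79, 82)    # about (761.4M J, 9.29M J)
print(wtc1, wtc2)
```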
On the other hand, if y greatly exceeds 3.8m, then the quantity of normal force energy overwhelms the quantity of damage energy.Pauliehttp://www.blogger.com/profile/06424849610575275694noreply@blogger.com0tag:blogger.com,1999:blog-34016406.post-18386368898411543022007-03-02T10:05:00.000-08:002007-03-02T10:10:01.805-08:00Drat!I hate being an idiot. Intuitively I feel that there should be an energy calculation means of getting at WTC fall times, but, I can't put my finger on it.<br /><br />I've had to scrap both my spring model and my most recent energy calculation idea.<br /><br />However, I still feel good about my earlier post titled <i>9/11 collapse issues</i> which gives a balance of forces argument about fall times.<br /><br />Sorry to bother anyone.Pauliehttp://www.blogger.com/profile/06424849610575275694noreply@blogger.com0tag:blogger.com,1999:blog-34016406.post-23796169362257486572007-02-12T07:33:00.000-08:002007-02-12T08:23:34.984-08:00Michelson-Morley over lightly<i>No grand revelations in this post. I'm just setting the record straight as to how I understand the Michelson-Morley experiment.</i><br /><br />It is often casually said that Michelson and Morley established that the velocity of light is a constant.<br /><br />This isn't quite correct. Their interferometer experiment tended to demonstrate that c was the maximum possible velocity in the ether, which, to be sure, was quite a shocking discovery.<br /><br />Basically, the experiment checked a beam of reflected light crossing the presumed ether wind (relative to the moving earth and interferometer) against a beam traveling into or away from the wind. So they sought a velocity magnitude that did not equal c2<sup>1/2</sup>, the magnitude for no ether flow. This difference would have been revealed by a difference in the interference pattern. That is, light crossing the ether wind would be reflected from a different part of the mirror than light going with the wind. 
This means that the interference pattern for a non-right angle of reflection will differ from the pattern for a right angle of reflection.<br /><br />So they were testing for galilean velocity addition, which applies to a mechanical wave crossing a moving medium.<br /><br />Another type of velocity addition is Doppler velocity addition.<br /><br />So let us call v the constant of propagation in the medium, which doesn't change, and u the velocity of the observer or the source.<br /><br />For galilean addition:<br /><br />v + u = kv<br /><br />so u = v(k-1)<br /><br />For nonrelativistic doppler addition:<br /><br />i. Observer moving toward source<br /><br />(v + u)/v = kv<br /><br />so u = v(kv-1)<br /><br />also: f' = f(v+u)/v<br /><br />indicating the change in frequency.<br /><br />j. Source moving toward observer<br /><br />v/(v-u) = kv<br /><br />so u = v-(1/k)<br /><br />also: f' = fv/(v-u)<br /><br />In a mechanical system the elasticity of the medium emerges when the relative tensions differ, where T' = (f'/s)<sup>1/2</sup>, with s being the distance unit.<br /><br />So the tension of the medium can be summarized by T'<sub>j</sub> - T'<sub>i</sub><br /><br />However, if the galilean vector c2<sup>1/2</sup> doesn't hold, then the medium effectively doesn't exist and one expects that v must be the top velocity in the "ether."<br /><br />In that case one expects zero tension as deduced from the Doppler effect and we get relativistic Doppler addition, thus:<br /><br />v<sub>i</sub> = v<sub>j</sub><br /><br />That is<br /><br />(v+u)(v-u) = v<sup>2</sup><br /><br />or, in the final analysis, u = 0.<br /><br />When u =/= 0, we have the nonrelativistic doppler effect, of course.<br /><br />In terms of proper time, relativistic velocity is<br /><br />v = c<sup>2</sup> - T<sub>p</sub><sup>1/2</sup>c<br /><br />and obviously v cannot exceed 
c.Pauliehttp://www.blogger.com/profile/06424849610575275694noreply@blogger.com0tag:blogger.com,1999:blog-34016406.post-1013034571836187932007-02-10T12:43:00.000-08:002007-02-10T12:46:03.207-08:00Michelson, Morley and bad scienceGeeze! Please disregard the previous post, which I have yanked. I see that I have been befuddled by too many pop sci explanations of the experiment.<br />I plan to publish something on the Michelson-Morley experiment soon.Pauliehttp://www.blogger.com/profile/06424849610575275694noreply@blogger.com0tag:blogger.com,1999:blog-34016406.post-64897584755007570792007-01-16T08:50:00.000-08:002007-02-28T07:28:58.591-08:009/11 collapse issues<i>Draft 3</i> includes some editorial changes, but no physical or mathematical changes.<br /><br /><b>Preface to Draft 2</b><br />This revised post includes data on elastic collisions and not only perfectly inelastic collisions.<br /><br />The reader may wonder whether my simplifications suffice, but I have attempted to ensure that real-world events could only have taken longer.<br /><br />In some extremely spare models, the collapse times are close enough to recorded times to make the reader wonder whether there is a case against the government story here.<br /><br />I suggest that these spare scenarios are too favorable to the government. But, then, there's a faint chance that they are not. So what is needed is further investigation of this matter.<br /><br />The NIST said it could find no evidence of the use of planted explosives and yet ducked a scientific analysis of, or even informed opinion about, fall times. 
Obviously such an analysis is potential evidence, meaning the government did not want to find evidence of explosives.<br /><br />Clearly, the public investigations thus far are woefully inadequate.<br /><br /><br /><i>See 'Scientists clash over 9/11 collapses' at http://911science.blogspot.com</i><br /><br /><br /><br />Two crucial issues confront those concerned with the collapses of each of the twin World Trade Center towers: the symmetry of collapse and how long the collapse took.<br /><br /><b>I. The issue of symmetry</b><br />Each World Trade Center tower had a core of 47 load-bearing columns which supported the weight of the floors and walls. Think of a square table with a square central pole for a support. The table is a floor, the pole is the core. Now imagine a double-decker table where the pole passes through a hole in the first deck. Make an n-decker table with n-1 holes. Now attach outer walls to the table tops (floors).<br />We see that the weight of the structure is borne by the pole.<br /><br />The 47 core columns, which surround the elevator and stairwell shaft, are lashed together by cross beams.<br /><br />The floor slabs are attached to the outer walls and core columns with joists that are intended to bear the load of the floor slab and the materials and persons on it (there are 4 slabs per story, as I recall) and a portion of the exterior wall.<br /><br />Now, back to our n-deck table. Drop a heavy object onto the top deck. What happens?<br />If the object is dropped dead center onto the center pole, either the pole buckles or it holds. If the object strikes off center it may crash through that section of the table, driving the table fragment ahead of it, until it hits the next deck and repeats the process. 
Possibly the object and its accreted table fragments may drive a piece of the bottom deck to the ground.<br /><br />Now if each table deck is composed of four sections, one would expect that a number of table sections would be unaffected by the sequence of collisions.<br /><br />So it is necessary for the pole to collapse also in order to have the table decks collapse symmetrically.<br /><br />Now let's modify the pole and say that it is composed of a group of spaced-apart steel rods that are composed of smaller rods strongly bound at endpoints. These composite rods are driven into the ground and braced with criss-cross steel supports.<br />Let's drop a group of loosely tied heavy objects onto the center of the top deck.<br />It's possible these objects break their loose bonds and disperse symmetrically, crashing through every deck pretty much symmetrically. But, how does the pole collapse? Let's consider a single rod. That rod can only lose altitude if compressed downward or if wrenched sideways and then knocked loose from braces and the connector to the rod segment below. In the first case, it seems unlikely there would be enough force to compress all rods. For most rods to fall, each rod component would have to be bent sideways and then snapped loose.<br /><br />So thinking of a single rod again, we envision a table segment driven down by falling debris but hanging onto the rod, which is wrenched sideways and yanked loose from its link so as to join the debris flow. It and all the other rods must behave this way, overcoming the resistance not only of connectors but of the criss-cross brace system.<br /><br />So it seems likely that sets of rods -- up to 11 at a time -- must be jerked sideways and driven down by falling debris. 
That is, a deck quadrant's joists must hold for collections of interlocked core columns long enough to pull those columns over.<br /><br />In our model, we arranged that the objects would fall symmetrically.<br />But in the case of each trade center tower, the top block fell asymmetrically: tilting off center.<br /><br />So it becomes rather problematic that no core columns remained standing above a few floors for either tower.<br /><br /><b>II. The issue of fall times</b><br />Though there has been a degree of official cageyness about the collapse times of the two main towers and building 7, no one suggests a collapse time of more than 15 seconds for either tower, and most records give shorter times. The 9/11 commission gave WTC2's fall time as 10 seconds but the National Institute of Standards and Technology reportedly came up with a fall time of 12 seconds. I reviewed an ABC videotape of WTC2's collapse and found that the onscreen timer yielded between 14 and 15 seconds -- though internet videos cannot be trusted because of issues of data compression and data reconstruction and also because one cannot be sure the video has not been tampered with. For purposes of calculation, then, we have the reported official limit of 12 seconds and a non-authoritative outside limit of 15 seconds.<br /><br />There has been debate as to whether the towers fell at near the free-fall rate. Critics say that such speeds are consistent with controlled demolition and not with top-down gravitational collapse. If one dropped a rock from the top of a trade center tower, the rock could not take less than 9.22 seconds to hit the ground. This calculation neglects air resistance, which in fact has a measurable effect, but which can be reasonably ignored in a simplified scenario [see footnote A].<br /><br />So a reputed fall time of about 10 seconds for WTC2 would be startlingly suspicious. However, because of problems with the official records, we cannot be sure of that 10-second figure. 
(See Trade center collapse times: omissions and disparities at http://www.angelfire.com/ult/znewz1/fallrates.html)<br /><br />So what fall time should be expected for a trade center tower that collapsed as the government alleges?<br /><br />Let us think of our model above. In that case, we might imagine, without doing the necessary computerized calculations, that the core supports could be snapped sideways rather quickly and that the gathering momentum of "snowballing" debris would result in a rather speedy fall time.<br /><br />However, this leaves unresolved the issue of how all core columns came to be leveled. A way out would be to suggest that the top block tilted enough so that a considerable amount of mass fell directly onto the core, causing a massive buckling and collapse, which reached down x number of meters. The block fell, with little obstruction through the x meters, before encountering a relatively secure set of core columns again. Upon impact, the scenario was repeated.<br /><br />This scenario would then explain how the core could have been symmetrically reduced.<br /><br />It turns out that we can use a simple trick to see whether fall times in this scenario work with respect to observed fall times.<br /><br />Here is the trick: the strength of the core along some arbitrary number of meters must be enough to hold the entire weight of the building above. Now we know that the weight is simply w = -gm. In that case, the core length in question must have a static force of AT LEAST f = mg. In fact, the core was designed to stand far more force, which is why we needn't worry about the added weight of the jetliner.<br /><br />Also, the core increases in strength -- in upward force -- as altitude decreases. This is useful, since it further justifies our little trick.<br /><br />By way of example, picture a standing structure composed of three one-kilogram bricks, A, B, C, of 1 cubic meter each. The normal force at ground zero is n = 3g. 
At one meter high, n = 2g. Here n is the reaction force of brick A (and the earth) pushing back against the weight of bricks B and C. At h = 2m, n = g and at 3m, n = 0.<br />Now how much weight could brick A actually hold? This depends on the strength of the material and, if the brick was carved into, say, an arch, the design of the load bearing structure. So we would have f = x(2mg), where 2mg is the weight of the two bricks above and x is some real number, presumably greater than 1.<br /><br />Now if, in general, f = xmg, then either the acceleration a = xg or there is a mass M such that M = xm. For purposes of calculation, it is immaterial which we choose. So, as we shall see, the choice M = xm is convenient for our purposes.<br /><br />The trick is to employ the formulas for collision and use the opposed STATIC forces as our initial values. That is |cmg| is greater than or equal to |-mg|, with c some low constant. Since mass means resistance to acceleration, the m represents either the mass of the supported block or the resistance to that block, as in an appropriately designed core segment resisting gravitational acceleration not only with mass but with load distribution effects.<br /><br /><b>Perfectly inelastic collision</b><br />Supposing that the static supporting force equals the weight (f = w), the formula for the inelastic collision of two blocks, A and B, is<br /><br />v<sub>a+b</sub> = (m<sub>a</sub>v<sub>a</sub> + m<sub>b</sub>v<sub>b</sub>)/(m<sub>a</sub> + m<sub>b</sub>).<br /><br />Realizing that we have m<sub>b</sub> = cm<sub>a</sub>, we can write, for c = 1,<br /><br />v<sub>a+b</sub> = m<sub>a</sub>(v<sub>a</sub> + 0)/2m<sub>a</sub><br /><br />= v<sub>a</sub>/2<br /><br />For f = 2w, we have<br /><br />v<sub>a+b</sub> = v<sub>a</sub>/3<br /><br />and generally, v<sub>a+b</sub> = v<sub>a</sub>/k, where k = c + 1.<br /><br />At this juncture, I'm going to cheat a bit in favor of the government. For one thing, I am essentially calculating how long it might take the core to collapse. 
I am including the mass of the remainder of the building in our considerations and am permitting it to fall relatively unimpeded, though it can be included as part of the force at a height where the core resists.<br /><br />I am also going to assume, for purposes of calculation, that once the core column segments buckle and lose strength on impact, their masses are all collected up to the height of impact. Though this can't happen in the real world, a more realistic scenario of distributing the mass down the x meters affected can only reduce impact force at the next resistance level and lengthen fall times.<br /><br />I estimate that WTC2's collapse began 320 meters high. I then posit the block of 100 remaining meters falling through x meters with little resistance until hitting a height where the resistance force is greater than or equal to the static force (weight) of the initial block plus its accreted floors and other building parts.<br />I have also decided upon making x an average of "free-fall" meters. In reality, such a scenario would yield an irregular number of meters per "free-fall" stage.<br /><br />For ease of calculation, I rounded off all numbers at the second decimal place. The round-off error then results in several instances where fall times for particular stages are given as equal when they differ slightly. However, since we are dealing with only 320 meters, the roundoff effect shouldn't affect the results enough to change the implications.<br /><br />Kinematic free-fall equations yield:<br /><br />v<sub>impact</sub> = (v<sub>o</sub><sup>2</sup> + 19.6x)<sup>1/2</sup><br /><br />t = (v<sub>i</sub> - v<sub>o</sub>)/9.8<br /><br />For two levels of resistance above ground, I use heights of 320-108, 320-108-106. 
There are another 106 meters to the ground.<br /><br />Below is a table for x = 108, 106, 106; k = 2<br /><br />Stage 1<br />v<sub>o</sub> = 0<br />v<sub>i</sub> = 46.01<br />t = 4.69<br /><br />Stage 2<br />v<sub>o</sub> = 23.01<br />v<sub>i</sub> = 51.06<br />t = 2.86<br /><br />Stage 3<br />v<sub>o</sub> = 25.53<br />v<sub>i</sub> = 52.24<br />t = 2.73<br /><br />Stage 4<br /><i>The final stage models the roof collapsing 100 meters to the ground with negligible resistance</i><br />v<sub>o</sub> = 52.24<br />t = 1.66<br /><br />Total fall time: 11.94 seconds<br /><br /><i> I omit the remaining tables</i><br />__________________________<br />x = 108, 106, 106, k = 3<br /><br />Total fall time: 13.09 seconds<br />__________________________<br />x = 108, 106, 106, k = 3.5<br /><br />Total fall time: 13.44 seconds<br />__________________________<br />x = 80, k = 2<br /><br />Total fall time: 13.08 seconds<br />__________________________<br />x = 80, k = 3<br /><br />Total fall time: 14.62 seconds<br />__________________________<br />x = 80, k = 3.5<br /><br />Total fall time: 14.92 seconds<br />__________________________<br />x = 40, k = 2<br /><br />Total fall time: 16.85 seconds<br />__________________________<br />x = 40, k = 3<br /><br />Total fall time: 19.5 seconds<br /><br />Now these estimates must be regarded as extreme lower bounds because serious sources of resistance have been ignored. 
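Carried through exactly as stated (impact speed from v<sub>i</sub> = (v<sub>o</sub><sup>2</sup> + 19.6x)<sup>1/2</sup>, speed divided by k at each resistance level, and a final unimpeded 100-meter roof drop), the staged recipe is a few lines of Python. This is my own sketch of that recipe, not an official calculation:

```python
import math

G = 9.8  # m/s^2, as used throughout the post

def stage(v0, x):
    """Fall x meters from entry speed v0.  Returns (impact speed, time),
    using v_i = sqrt(v0^2 + 2*G*x) and t = (v_i - v0)/G."""
    vi = math.sqrt(v0 * v0 + 2.0 * G * x)
    return vi, (vi - v0) / G

def total_fall_time(drops, k, roof_drop=100.0):
    """Staged collapse: after every drop except the last, the collision
    at the next resistance level divides the speed by k; the roof then
    falls the final stretch with negligible resistance."""
    v, t = 0.0, 0.0
    for x in drops[:-1]:
        vi, dt = stage(v, x)
        t += dt
        v = vi / k            # speed retained after the collision
    vi, dt = stage(v, drops[-1])
    t += dt                   # last drop before the roof stage
    _, dt = stage(vi, roof_drop)
    return t + dt

print(round(total_fall_time([80] * 4, k=2), 2))   # 13.08, as in the list above
```

For x = 40 with k = 2 the recipe returns the 16.85-second figure, and for x = 108, 106, 106 with k = 2 it gives a total near 11.9 seconds.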
Also, assuming symmetrical collapse and leveling of the core, the last calculation is the most reasonable of the set.<br /><br />Now if we accept NIST's reported time estimate for WTC2, then a new investigation into the cause of collapse is undeniably warranted.<br /><br />If we accept my non-authoritative 15-second estimate, the NIST theory still has only limited plausibility because their model could conceivably work only when our parameters are far too generous.<br /><br /><b>Elastic collision</b><br />When one mass is at rest, the elastic collision [see footnote B] formula, with i indicating pre-collision and f post-collision, is<br /><br />v<sub>1f</sub> = v<sub>1i</sub>(m<sub>1</sub> - m<sub>2</sub>)/(m<sub>1</sub> + m<sub>2</sub>)<br /><br />v<sub>2f</sub> = v<sub>1i</sub>(2m<sub>1</sub>)/(m<sub>1</sub> + m<sub>2</sub>)<br />_____________________________________<br />We now examine x = 80 meters and m<sub>1</sub> = m<sub>2</sub>.<br /><br />In this case the post-collision velocity of block 2 is simply that of block 1 prior to collision. The post-collision velocity of block 1 is 0 (in our free-fall frame of reference).<br /><br />So for stage one's drop of 80 meters, we get 4.04 seconds and for stages 2 through 4 we get 3(1.67) seconds, and for the final stage when we time the free-fall of the roof to the ground from 100 meters we have 1.57 seconds for a total of 10.62 seconds. This "billiard-ball" scenario is, of course, far too generous to the government.<br />_______________________________________<br />With m<sub>1</sub> = m<sub>2</sub> and x = 40, the total collapse time is 13.14 seconds. Again, this scenario seems much too generous.<br />_______________________________________<br />With m<sub>1</sub> = m<sub>2</sub> and x = 20, the total time is 17.11 seconds.<br />_______________________________________<br /><br /><br />Now suppose we posit m<sub>2</sub> = 2m<sub>1</sub>. 
In that case,<br /><br />v<sub>1f</sub> = v<sub>1i</sub>(m - 2m)/(m + 2m) = -(1/3)v<sub>1i</sub><br /><br />v<sub>2f</sub> = v<sub>1i</sub>(2m/3m) = (2/3)v<sub>1i</sub><br /><br />We see that block 2 moves off at twice the velocity of the entangled pair in the inelastic case where m<sub>2</sub> = 2m<sub>1</sub> (there, v = v<sub>1i</sub>/3), while block 1 recoils upward at a third of its impact speed. As the upward momentum of the recoiling upper block cannot increase the lower block's speed, we know that this model cannot produce total collapse times that are less than those for our inelastic model. [See footnote B]<br /><br />****************************<br /><b>Footnote A:</b><br />Since the distance is short, I calculated air resistance as kv for the upper block falling through 320 meters of air only. Obviously, this isn't realistic. For one thing, the "block" falls in various pieces, and smaller masses tend to be more affected by air resistance.<br /><br />But, anyway, for the record:<br /><br />WTC2 had 770,000 tons of steel. That weight plus, I guessed, 10 percent for other materials and content is 847,000 tons. These figures lead to an estimated weight for the top block of 3.81*10^8 Newtons.<br /><br />The linear differential equation I used was<br /><br />v = (mg/k)(1 - e<sup>-kt/m</sup>)<br /><br />For k = 2, 3 and 30, the values all round to 79.19 m/s, essentially indistinguishable from the free-fall value. So fall time for the bottom of the block rounds off to 8.08s in every case. The actual difference is about a thousandth of a second.<br /><br />***************************<br /><b>Footnote B</b><br />The coefficient of restitution e quantifies the elasticity of the system thus:<br /><br />v<sub>2f</sub> - v<sub>1f</sub> = -e(v<sub>2i</sub> - v<sub>1i</sub>)<br /><br />When e = 1, a collision has no inelasticity; when e = 0, a collision is perfectly inelastic. 
Values of e between 0 and 1 give real-world degrees of elasticity.<br />However, in our models, setting 0 < e < 1 cannot yield bounds lower than those listed.Pauliehttp://www.blogger.com/profile/06424849610575275694noreply@blogger.com0tag:blogger.com,1999:blog-34016406.post-1165005874518154042006-12-01T12:35:00.000-08:002007-01-03T09:16:15.903-08:00Sierpinski, phone home<b>Several other posts on 'intelligent design' are on this blog</b><br /><br />Randomness and form are acutely discussed in <i>Chaos and Fractals: new frontiers in science</i> by Peitgen, Jurgens and Saupe, Springer-Verlag, 1992. This book, a well-rounded introductory survey, makes the point that simple rules can yield complex results, anticipating Wolfram's chief conclusion in <i>A New Kind of Science</i> by about a decade. In fact, the PJS book incorporates some of Wolfram's early findings on cellular automata.<br /><br />A very interesting result of chaos theory is that rules governing a random walk (the "chaos game") yield a highly probable overall order. This order very strongly tends to become more precise as the n-step iterative algorithm goes to infinity.<br /><br />In particular, the chaos game has an attractor called the Sierpinski gasket, a form that shows up in deterministic processes as well, including some Wolfram diagrams.<br /><br />This result is highly reminiscent of the wave-particle duality feature of a quantum mechanical double slit experiment conducted photon by photon. 
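Before taking up the double-slit analogy, note that the chaos game itself takes only a few lines to simulate. A minimal sketch (the vertex positions and point count are arbitrary choices, and Python's pseudorandom generator stands in for a true random source):

```python
import random

def chaos_game(n, vertices=((0.0, 0.0), (1.0, 0.0), (0.5, 1.0))):
    """The chaos game: from an arbitrary start, repeatedly jump halfway
    toward a randomly chosen vertex of a triangle.  The points settle
    onto the game's attractor, the Sierpinski gasket."""
    x, y = random.random(), random.random()
    pts = []
    for _ in range(n):
        vx, vy = random.choice(vertices)
        x, y = (x + vx) / 2.0, (y + vy) / 2.0
        pts.append((x, y))
    return pts

pts = chaos_game(50_000)
# After a brief transient, no point lands inside the gasket's central
# "hole"; e.g. the region around the hole's centroid at (0.5, 1/3)
# stays empty, even though each individual step is random.
```

Plotting the points shows the gasket filling out uniformly when the vertex choices are equally likely, just as described above.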
The constraints (the two slits and whatever those two slits might imply) influence the overall probabilities of where photons land, so that the pattern becomes a diffraction image typical of a wave.<br /><br />Also, such a result tends to demonstrate that simple constraints on random behavior can yield sets associated with "order," a point of interest to network theorists and those interested in the emergence of orderly (low entropy) systems.<br /><br />After a sufficient number of iterations, we might apply a runs test to a chaos game's Sierpinski gasket by reading triangles via some "address" system that identifies a specific triangle by the route taken from the apex to reach it. I think we might linearize this by having an algorithm for formulating addresses. In general, sequences with very large or very small periods are considered very suggestive of nonrandomness. That is, ABABABABAB has a low probability of being randomly generated, as does AAAAABBBBB.<br /><br />By this, we would find that the triangles of a Sierpinski gasket have a periodicity consistent with low probability and low entropy; that is, consistent with order. But, using perhaps a box-count method, the order breaks down at the point level and we obtain a high entropy distribution. On the other hand, a careful assessment would give probable point densities per box and hence the consequent macro-structure. The order emerges as a consequence of the constraints that influence the probability distributions.<br /><br />[Note that Wolfram, rather than emphasizing relatively high periodicity to focus on order or complexity, puts the stress on lowness of periodicity.]<br /><br />In the chaos game, the attractor appears to be a consequence of conditional probabilities, but I need to do a bit more study to see exactly how that works.<br /><br />So suppose a SETI television receiver picked up a transmission that could be broken down as a Sierpinski gasket. 
That is, the receiver picks up a signal from a source and translates the data onto a digitized screen. Suppose that at time t, the screen contained randomly lit pixels but at time t+k, a Sierpinski gasket began to emerge, which was even more pronounced at time t+(k+j). Now suppose the SETI observer didn't get around to looking at the monitor until t+(k+j). He or she might easily think that a signal had been sent by an intelligent being. Yet, we know that the image could have resulted from random processes constrained by simple natural limits.<br /><br />Well, I'd like to know whether constraints exist for a random walk that has a Sierpinski carpet as an attractor. That's because an infinitely extended Sierpinski carpet contains every possible linear form (in the sense of topological equivalence). Hence, one might conceivably "discover" any symbolic message in a Sierpinski carpet. However, the fact that every message is implicit in this graph doesn't mean it is obvious. One must still use a winnowing process to find each message.<br /><br />ADDENDUM: If a chaos game initial point falls on a vertex of the initial triangle, subsequent points always fall on the attractor, the Sierpinski gasket. If an initial point falls outside the initial triangle, each subsequent point draws nearer to the attractor, so that points converge to the attractor. As long as probabilities are equal, the attractor, over time, tends to fill out uniformly; otherwise not.<br />In the case of a double-slit experiment, the single quantum of energy is never recorded in the area of "destructive interference." 
So the area of constructive interference behaves exactly like a chaos attractor with the added idea that the initial condition of the double-slit and single quantum of energy is equivalent to starting out on the attractor.Pauliehttp://www.blogger.com/profile/06424849610575275694noreply@blogger.com0tag:blogger.com,1999:blog-34016406.post-1163185400889269832006-11-10T10:56:00.000-08:002007-07-18T14:51:37.265-07:00Signal or noise?As suggested in the post <i>ET, phone home</i>, it is in principle possible to distinguish noise from a signal, though this may not always be practically feasible. Here we use a simple example to show how nonrandomness might be inferred (though not utterly proved).<br /><br />Suppose one tosses a fair coin 8 times (or we get finicky and use a quantum measurement device as described previously). What is the probability that exactly four heads will show up?<br /><br />We simply apply the binomial formula C(n,x)(p^x)(q^(n-x)), which in this case gives 70/2^8, or about 27 percent. The probability is not, you'll notice, 50 percent.<br /><br />The probability of a specific sequence of heads and tails is 2^(-8), which is less than 1 percent. That's also the probability for 0 heads (which is the specific sequence 8 tails).<br /><br />Probability for 1 head (as long as we don't care which toss it occurs in) is about 3%, for 2 heads is about 10%, and for 3 heads is about 22%.<br />As n increases, the probability of exactly n/2 heads decreases. 
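These figures can be checked directly with exact binomial coefficients, alongside de Moivre's Stirling-based approximation sqrt(2/(pi*n)) for the chance of exactly n/2 heads; a short sketch:

```python
import math

def p_heads(n, k):
    """Exact probability of exactly k heads in n fair-coin tosses."""
    return math.comb(n, k) / 2 ** n

def p_half_approx(n):
    """De Moivre's approximation, sqrt(2/(pi*n)), for the chance of
    exactly n/2 heads in n tosses."""
    return math.sqrt(2.0 / (math.pi * n))

print(p_heads(8, 4))         # 70/256, about 27 percent
print(p_half_approx(100))    # about 0.08: the 8 percent figure
print(p_half_approx(1000))   # about 0.025: the 2.5 percent figure
print(p_heads(100, 60))      # about 0.0108, close to the Stirling hand calculation in the post
```

The exact and approximate n/2 values track each other closely even for modest n, which is what makes the hand method workable.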
The probability of getting exactly 2 heads in 4 tosses is 37.5%; the probability of exactly 3 heads in 6 tosses is 31.25%.<br /><br />On the other hand, the ratio of heads to flips tends to approximate 1/2 as n increases, as is reflected by the fact that the case in which heads occurs n/2 times always carries the highest probability in the set of probabilities for n.<br /><br />That is, if there are a number of sets of 8 trials, and we guess prior to a trial that exactly 4 heads will come up, we will tend to be right about 27% of the time. If we are right substantially less than 27% of the time, we would suspect a loaded coin.<br /><br />Yet let us beware! The fact is that <i>some</i> ratio must occur, and n/2 is still most likely. So if n/2 heads occurs over, say, 100 tosses, we would not be entitled to suspect nonrandomness -- even though the likelihood of such an outcome is remote.<br /><br /><i>Note added July 2007:</i> As de Moivre showed, Stirling's formula can be used to cancel out the big numbers leaving (200pi)<sup>0.5</sup>/(100pi), which yields a probability of close to 8 percent. For 1000 tosses, the chance of exactly 500 heads is about 2.5 percent; for 1 million tosses, it's about 0.08 percent.<br /><br />(Suppose we don't have a normal table handy and we lack a statistics calculator. We can still easily arrive at various probabilities using a scientific calculator and Stirling's formula, which is<br /><br />n! ~ (n/e)<sup>n</sup>(2n * pi)<sup>0.5</sup><br /><br />Let us calculate the probability of exactly 60 heads in 100 tosses. 
We have a probability of<br /><br />(100/e)<sup>100</sup>(200pi)<sup>0.5</sup>/[2<sup>100</sup>(60/e)<sup>60</sup>(40/e)<sup>40</sup>2pi(2400)<sup>0.5</sup>]<br /><br />which reduces to<br /><br /><b>50<sup>100</sup>/(40<sup>40</sup> * 60<sup>60</sup>) * [(200pi)<sup>0.5</sup>/(2pi(2400)<sup>0.5</sup>)]</b><br /><br />We simply take logarithms:<br /><br />x = 100ln50 - (40ln40 + 60ln60) = -2.014<br /><br />We multiply e<sup>-2.014</sup> by 0.0814, which is the right-hand boldface ratio in brackets above, arriving at 0.01087, or 1.09 per cent.)<br /><br /><br /><b>Second thought</b><br />On the other hand, suppose we do a chi-square test to check the goodness-of-fit of the observed distribution to the binomial distribution. Since each observed value equals its expected value, the terms x<sup>2</sup>/500 and y<sup>2</sup>/500 (with deviations x = 0 and y = 0) vanish, and chi-square equals 0.<br />That is, the observed distribution perfectly fits the binomial curve.<br />But what is the probability of obtaining a zero value for a chi-square test? (I'll think about this some more.)<br /><i>Note added July 2007:</i> In these circumstances, this doesn't seem to be a fair question.<br /><br /><b>Back to the main issues</b><br />Suppose a trial of 20 tosses was reported to have had 14 heads. The probability is less than 4% and one would suspect a deterministic force -- though the probabilities alone are insufficient for one to definitively prove nonrandomness.<br /><br />Similarly, when (let's say digital) messages are transmitted, we can search for nonrandomness by the frequencies of 0 and 1.<br /><br />But as we see in the <i>ET</i> post, it may become difficult to distinguish deterministic chaos from randomness. However, chaotically deterministic sequences will almost always vary from truly random probabilities as n increases.<br /><br /><b>Addendum</b><p><br />Sometimes data sets can be discovered to be chaotic, but nonrandom, by appropriate mappings into phase space. 
That is, if an iterative function converges to a strange attractor -- a pattern of disconnected sets of points within a finite region -- that attractor can be replicated from the data set, even though an ordinary graph looks random.Pauliehttp://www.blogger.com/profile/06424849610575275694noreply@blogger.com0tag:blogger.com,1999:blog-34016406.post-1163110275524449392006-11-09T13:57:00.000-08:002006-11-09T14:11:15.536-08:00A slippery subjectProbability is a slippery subject in more ways than one.<br /><br />Here's an example (from Martin Sternstein's <i>Statistics</i>, Barron's, 1994) of a possibly counterintuitive result:<br /><br />Suppose that 6 percent of stories of major stock market success stem from illegal insider information passing. In some arbitrary group of seven highly successful stock traders, what would you think would be the chance that exactly one member of the group is a crook whose success stems from illegal insider activity?<br />The answer may surprise you.<br /><br />We apply the binomial formula C(n,x)(p^x)(q^(n-x)). That is, [7!/(6!1!)](0.06)(0.94)^6 = 0.2897. So, the chance exactly one member is a crook is close to 30 percent!<br />The chance that at least one member is a crook is 1-(0.94)^7 = 0.351, or about 35 percent. That is, the chance is greater than one in three that at least one member of some group of seven successful stock traders is a crook -- despite the overall low incidence of criminality!Pauliehttp://www.blogger.com/profile/06424849610575275694noreply@blogger.com0tag:blogger.com,1999:blog-34016406.post-1162766008377040322006-11-05T14:02:00.000-08:002006-11-05T15:06:21.036-08:00ET, phone home<i>This post also concerns the "intelligent design" controversy. See related posts below.</i><br /><br />Let's think about the signal detected by SETI researchers in the fictional <i>Contact</i> scenario, an example given by intelligent design proponent William A. 
Dembski.<br /><br />The signal from deep space was a sequence of sets of beeps, with each subsequent set holding the number of elements equal to a corresponding prime number. That is, the SETI team had detected a sequence of the first primes under the integer 100.<br />What are the chances of that? asks Dembski, a statistician. Obviously quite low. Detection of such a signal points to a sentient mind, he argues. Similarly, detection of a very low-probability pattern at the microbiotic level points to an intelligence behind the pattern, he says.<br /><br />Suppose we don't consider the fact that the prime sequence is a well-known discrete climbing curve and we wished to know whether it significantly differed from a true random discrete curve. How would we go about detecting a difference?<br /><br />Obviously, the SETI curve must differ from a fully random curve because it climbs from one value to the next, whereas a fully random curve would be likely to have an averaged out slope close to a horizontal line. So then, we must require that our candidate curve be random within constraints that require that the curve always climb. What would be the constraints? Clearly the constraints are the deterministic part of the curve.<br /><br />What we find is that any computable climbing curve will do for constraints, with a set of random choices set between two values. A recursively defined curve (including one with chaotic output) would also do for a constraint, with the end value being randomly selected between f(x) and f(x+1). Or we can use two computable climbing curves and randomly choose a value that falls between the two curves.<br /><br />For example, we might require that a value be chosen randomly between 2x and 3x. In that case, with enough values, the slope should tend to approximate D[(2x + 3x)/2], or 2.5.<br /><br />Now we might check the SETI curve (up to the value received) and see what the average slope is. 
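A sketch of such a constrained random curve, with each value drawn uniformly between the boundary curves 2x and 3x (the uniform draw is my own illustrative choice, and it enforces the envelope rather than strict monotonicity):

```python
import random

def constrained_climb(n):
    """A 'random climbing curve': the value at x is drawn uniformly
    between the two boundary curves 2x and 3x."""
    return [random.uniform(2 * x, 3 * x) for x in range(1, n + 1)]

curve = constrained_climb(10_000)
# The average of f(x)/x settles near 2.5, the slope of the mean
# curve (2x + 3x)/2, even though each individual value is random.
mean_ratio = sum(v / x for x, v in enumerate(curve, start=1)) / len(curve)
print(round(mean_ratio, 2))
```

Checking an unknown curve's average slope against simple boundary curves like these is exactly the kind of test described above.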
Even if the curve is actually composed of values of, say, p+7, the curve will map onto a known computable curve, of course. And we would suspect intelligence because we are unfamiliar with such a deterministically chaotic curve in nature.<br /><br />However, it is conceivable that energy might be continually pumped into some system that results in a deterministically chaotic radio emission that shows ever-increasing numbers of bursts. Hence, we could not claim a designer was behind the emission merely on the basis of the constraints, unless we knew more about them.<br /><br />Still, most processes in nature are deeply influenced by quantum phenomena, meaning a hefty degree of true randomness is injected into natural radio emissions.<br />So, if one could show that a radioed pattern was sufficiently deterministic, that would suffice to strongly indicate an intelligence behind the pattern. However, without knowledge of natural curves, we would have a tough time distinguishing between highly deterministic but chaotic climbing curves and truly random curves within constraints.<br /><br />But, if we kept the constraints quite simple, that decision would influence how we categorize the suspect radio pattern. We would check the suspect slope and see whether its constraints (boundary slopes) are simple. If not, we would suspect full determinism, with any randomness being simple noise during transmission.<br /><br />Now I don't suppose we expect constraint curves to be very complicated, probably following known climbing curves, such as supernova luminosity. That is, the slope average would conform to known natural phenomena, even if the specific patterns did not. 
So lacking such a slope, we could feel either that we had uncovered a new natural phenomenon or that we had encountered intelligence.Pauliehttp://www.blogger.com/profile/06424849610575275694noreply@blogger.com0tag:blogger.com,1999:blog-34016406.post-1162495154964728022006-11-02T11:09:00.000-08:002006-11-03T07:42:57.516-08:00Pseudorandom thoughts on complexity<i>Draft 2</i><br /><br /><br /><i>This post supplements the previous post "Does math back 'intelligent design'?"</i><br /><br />With respect to the general concept of evolution, or simply change over time, what do we mean by <i>complexity</i>?<br /><br />Consider Stephen Wolfram's cellular automata graphs. We might think of complexity as a measure of the entropy of the graph, which evolves row by row from an initial rule whereby change occurs only locally, in minimal sets of contiguous cells. Taken in totality, or after some row n, the graphs register different quantities of entropy. That is, "more complex" graphs convey higher average information than "less complex" ones. Some graphs become all black or all white after some row n, corresponding to 0 information after that row. There exists a significant set of graphs that achieve neither maximum nor minimum entropy, of course.<br /><br />How would we define information in a Wolfram cellular automaton graph? We can use several criteria. A row would have maximum entropy if the probability of the pattern of sequential cell colors is indistinguishable from random coloring. [<i>To be fussy, we might use a double-slit single photon detector to create a random sequence whereby a color chosen for a cell is a function of the number of the quadrant where a photon is detected at time t.</i>]<br /><br />Similarly for a column.<br /><br />Obviously, we can consider both column and row. And, we might also consider sets of rows and/or columns that occur as a simple period. 
Another possibility is to determine whether such sets recur in "smooth curve quasi-periods" such as every n^2. We may also want to know whether such sets zero out at some finite row.<p><br /><br />Another consideration is the appearance of "structures" over a two-dimensional region. This effectively means the visual perception of at least one border, whether closed or open. The border can display various levels of fuzziness. A linear feature implies at least one coloration period (cycle) appearing in every mth row or every nth column. The brain, in a Gestalt effect, collates the information in these periods as a "noteworthy structure." Such a structure may be defined geometrically or topologically (with constraints). That is, the periodic behavior may yield a sequence of congruent forms (that proliferate either symmetrically or asymmetrically) or of similar forms (as in "nested structures"), or of a set of forms each of which differs from the next incrementally by interior angle, creating the illusion of morphological change, as in cartoon animation.<br /><br />At this juncture we should point out that there are only 256 elementary cellular automata. However, the number of CA goes up exponentially with another color or two and when all possible initial conditions are considered.<br /><br />So what we are describing, with the aid of Wolfram's graphs, is deterministic complexity, which differs from the concept of chaos more on a philosophical plane than a mathematical one.<br /><br />We see that, depending on criteria chosen, CA graphs, after an evolution of n steps, differ in their maximum entropy and also differ at the infinite limit in their maximum entropy. Each graph is asymptotic toward some entropy quantity. By no means does every graph converge toward maximum entropy as defined by a truly random pattern.<br /><br />So we may conclude that, as Wolfram argues, simple instructions can yield highly complex fields. 
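That conclusion is easy to see concretely. A minimal elementary-CA evolver (rule 30 is Wolfram's standard example of a simple rule yielding a disorderly field; the width and step count here are arbitrary):

```python
def step(cells, rule):
    """One step of an elementary CA: the rule's i-th bit gives the new
    color for the three-cell neighborhood whose binary value is i
    (edges wrap around)."""
    n = len(cells)
    return [(rule >> (4 * cells[(i - 1) % n] + 2 * cells[i] + cells[(i + 1) % n])) & 1
            for i in range(n)]

def evolve(rule, width=63, steps=30):
    """Run an elementary CA from a single black cell in the middle."""
    row = [0] * width
    row[width // 2] = 1
    rows = [row]
    for _ in range(steps):
        rows.append(step(rows[-1], rule))
    return rows

for row in evolve(30, steps=15):   # rule 30: a simple rule, a disorderly field
    print("".join(".#"[c] for c in row))
```

Swapping in other rule numbers (0 through 255) reproduces the range of behaviors discussed above, from all-white or all-black fields to nested and chaotic ones.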
The measure of complexity is simply the quantity of information in a graph or subgraph defined by our basic criteria. And what do we mean in this context by information? If we went through all n steps of the rule and examined the sequence of colors in, for example, row n, the information content would be 0 because we have eliminated the uncertainty.<br /><br />If, however, we don't examine how row n's sequence was formed, then we can check the probability of such a sequence with the resulting information value. At this point we must beware: Complete aperiodicity of cell colors in row n is NOT identical with maximum entropy of row n. Think of asking a high school student to simulate flipping of a coin by haphazardly writing down 0 or 1 in 100 steps. If one then submits the sequence to an analyst, he or she is very likely to discover that the sequence was not produced randomly because most people avoid typical sub-sequences such as 0 recurring six times consecutively.<br /><br />So then, true randomness (again, we can use our quantum measuring device), which corresponds to maximum entropy, is very likely to differ significantly from computed chaos. This fact is easily seen if one realizes that the set of aperiodic computable irrational numbers is of a lower cardinality than the set of random digit sequences. Still, it must be said that the foregoing lemma doesn't mean there is always available a practical test to distinguish a pseudorandom sequence from a random sequence.<br /><br />We might also think of deterministic complexity via curves over standard axes, with any number of orthogonal axes we like. Suppose we have a curve y = x. Because there is no difference between x and y, there is effectively no information in curve f(x). No work is required to determine f(x) from x. The information in y = 2x is low because minimal work (as counted by number of simple steps in the most efficient algorithm known) is required to determine g(x) from x. 
Somewhat more information is found for values of h(x) = x^2 because the computation is slightly slower.<br /><br />A curve whose values hold maximum information -- implying the most work to arrive at an arbitrary value -- would be one whereby the best method of determining f(x+k) requires knowledge of the value f(x). Many recursive functions fit this category. In that case, we would say that a computed value whose computational work cannot be reduced from n steps of the recursive function or iterative algorithm holds maximum information (if we don't do the work).<br /><br />So let's say we have the best-arranged sieve of Eratosthenes to produce the sequence of primes. On an xyz grid, we map this discrete curve z = f(y) over y = x^2, using only integer values of x. Now suppose we perceived this system in some other way. We might conclude that a chaotic system shows some underlying symmetry.<br /><br />It is also possible to conceive of two maximally difficult functions mapped onto each other. But, there's a catch! There is no overall increase in complexity. That is, if f(x) is at maximum complexity, g(f(x)) cannot be more complex -- though it could conceivably be less so.<br /><br />This conforms to Wolfram's observation that adding complexity to rules does little to increase the complexity of a CA.<br /><br />Now what about the idea of "phase transitions" whereby order suddenly emerges from disorder? Various experiments with computer models of nonlinear differential equations seem to affirm such possibilities.<br /><br />Wolfram's <i>New Kind of Science</I> shows several, as I call them, catastrophic phase transitions, whereby high entropy rapidly follows a "tipping point" as defined by a small number of rows. Obviously one's perspective is important. 
A (notional) graph with 10^100 iterations could have a "tipping point" composed of millions of rows.<br /><br />Wolfram points out that minor asymmetries in a high entropy graph up to row n are very likely to amplify incrementally -- though the rate of change (which can be defined in several ways) can be quite rapid -- into a "complex" graph after row n. I estimate that these are low entropy graphs, again bearing in mind the difference between true randomness and deterministic chaos or complexity: the entropies in most cases differ.<br /><br />What we arrive at is the strong suggestion -- that I have not completely verified -- that a high information content in a particular graph could easily be indicative of a simple local rule and does not necessarily imply an externally imposed design [or substitution by another rule] inserted at some row n.<br /><br />However, as would be expected, the vast majority of Wolfram's graphs are high-entropy affairs -- no matter what criteria are used -- and this fact conforms to the anthropic observation that the cosmos is in toto a low entropy configuration, in that most sets of the constants of physical law yield dull, lifeless universes.<br /><br />I should note that <i>New Kind of Science</i> also analyzes the entropy issue, but with a different focus. In his discussion of entropy, Wolfram deploys graphs that are "reversible." That is, the rules are tweaked so that the graph mimics the behavior of reversible physical processes. He says that CA 37R shows that the trend of increasing entropy is not universal because the graph oscillates between higher and lower entropy eternally. However, one must be specific as to what information is being measured. If the entropy of the entire graph up to row n is measured, then the quantity can change with n. But the limiting value as n goes to infinity is a single number. 
It is true, of course, that this number can differ substantially from the limiting entropy value of another graph.<br /><br />Also, even though the graphs display entropy, the entropy displayed by physical systems assumes energy conservation. But Wolfram's graphs do not model energy conservation, though I have toyed with ways in which they might.<br /><br />The discussion above is all about classical models arranged discretely, an approach that appeals to the computer science crowd and to those who argue that quantum physics militates against continuous phenomena. However, I have deliberately avoided deep issues posed by the quantum measurement/interpretation problem that might raise questions as to the adequacy of any scientific theory for apprehending the deepest riddles of existence.<br /><br /><i>It should be noted that there is a wide range of literature on what the Santa Fe Institute calls "complexity science" and others sometimes call "emergence of order." I have not reviewed much of this material, though I am aware of some of the principal ideas.<br /><br />A big hope for the spontaneous order faction is network theory, which shows some surprising features as to how orderly systems come about. However, I think that Wolfram graphs suffice to help elucidate important ideas, even though I have not concerned myself here with <i>New Kind of Science</i>'s points about networks and cellular automata.</i>Pauliehttp://www.blogger.com/profile/06424849610575275694noreply@blogger.com0tag:blogger.com,1999:blog-34016406.post-1162416388953796112006-11-01T12:22:00.000-08:002006-11-01T13:33:00.130-08:00Does math back 'intelligent design'?Two of the main arguments favoring "intelligent design" of basic biotic machines:<br /><br />. Mathematician William A. 
Dembski (<i>Science and Evidence for Design in the Universe</i>) says that if a pattern is found to have an extraordinarily low probability of random occurrence -- variously 10^(-40) to 10^(-150) -- then it is reasonable to infer design by a conscious mind. He points out that forensics investigators typically employ such a standard, though heuristically.<br /><br />. Biochemist Michael J. Behe (<i>Darwin's Black Box</i>) says that a machine is irreducibly complex if its parts are so interdependent that removing any one of them destroys the machine's function. Before discussing intricate biological mechanisms, he cites a mousetrap as a machine composed of interdependent parts that could not reasonably be supposed to fall together randomly.<br /><br />Behe is aware of the work of Stuart Kauffman, but dismisses it because Kauffman does not deal with biological specifics. Kauffman's concept of self-organization via autocatalysis, however, lays the beginnings of a mathematical model demonstrating how systems can evolve toward complexity, including sudden phase transitions from one state -- which we might perceive as "primitive" -- to another state -- which we might perceive as "higher." 
(Like the word "complexity," the term "self-organization" is sometimes used rather loosely; I hope to write something on this soon.)<br /><br />Kauffman's thinking reflects the work of Ilya Prigogine, who made the reasonable point that systems far from equilibrium might sometimes become more sophisticated before degenerating in accordance with the "law of entropy."<br /><br />This is not to say that Behe's examples of "irreducible complexity" -- including cells propelled by the cilium "oar" and the extraordinarily complex basis of blood-clotting -- have been adequately dealt with by the strict materialists who sincerely believe that the human mind is within reach of grasping the essence of how the universe works via the elucidation of some basic rules.<br /><br />One such scientist is Stephen Wolfram, whose <i>New Kind of Science</i> examines "complexity" via iterative cellular automaton graphs. He dreams that the CA concept could lead to such a breakthrough. (But I argue that his hope, unless modified, is vain; see sidebar link on Turing machines.)<br /><br />Like Kauffman, Wolfram is a renegade on evolution theory and argues that his studies of cellular automata indicate that constraints -- and specifically the principle of natural selection -- have little impact on development of order or complexity. Complexity, he finds, is a normal outcome of even "simple" sets of instructions, especially when initial conditions are selected at random.<br /><br />Thus, he is not surprised that complex biological organisms might be a consequence of some simple program. 
And he makes a convincing case that some forms found in nature, such as fauna pigmentation patterns, are very close to patterns found according to one or another of his cellular automata.<br /><br />However, though he discusses n-dimensional automata, the findings are sketchy (the combinatorial complexity is far out of computer range) and so he cannot give a three-dimensional example of a complex dynamical system emerging gestalt-like from some simple algorithm.<br /><br />Nevertheless, Wolfram's basic point is strong: complexity (highly ordered patterns) can emerge from simple rules recursively applied.<br /><br />Another of his claims, which I have not examined in detail, is that at least one of his CA experiments produced a graph, which, after sufficient iterations, statistically replicated a random graph. That is, when parts of the graph were sampled, the outcome was statistically indistinguishable from a graph generated by computerized randomization. This claim isn't airtight, and analysis of specific cases needs to be done, but it indicates the possibility that some structures are somewhat more probable than a statistical sampling would indicate. However, this possibility is no disproof of Dembski's approach. (By the way, Wolfram implicitly argues that "pseudorandom" functions refer to a specific class of generators that his software <i>Mathematica</i> avoids when generating "random" numbers. 
Presumably, he thinks his particular CA does not fall into such a "pseudorandom" set, despite its being fully deterministic.)<br /><br />However, Wolfram also makes a very plausible case (I don't say <i>proof</i> because I have not examined the claim at that level of detail) that his cellular automata can be converted into logic languages, including ones that are sufficiently rich for Godel's incompleteness theorem to apply.<br /><br />As I understand Godel's proof, he has demonstrated that, if a system is logically consistent, then there is a class of statements that cannot be derived from the axioms. He did this through an encipherment system that permits self-referencing, and so some have taken his proof to refer only to an irrelevant semantical issue of self-referencing (akin to Russell's paradox). But my take is that the proof says that statements exist that cannot be proved or derived.<br /><br />So, in that case, if we model a microbiotic machine as a statement in some logic system, we see immediately that it could be a statement of the Godel type, meaning that the statement holds but cannot be derived from any rules specifying the evolution of biological systems. If such a statement indeed were found to be unprovable, then many would be inclined to infer that the machine specified by this unprovable statement must have been designed by a conscious mind. 
However, such an inference is a philosophical (which does not mean trivial) difficulty.Pauliehttp://www.blogger.com/profile/06424849610575275694noreply@blogger.com0tag:blogger.com,1999:blog-34016406.post-1161725786845619352006-10-24T14:28:00.000-07:002006-10-24T14:36:26.856-07:00A clarificationI have added a footnote to my Twin Paradox page (see link in sidebar) pointing out that among those who have correctly understood the paradox are mathematician Jeff Weeks, physicist Richard Wolfson and science writer Stan Gibilisco.<br /><br />All note that the paradox is resolved only by General Relativity, which Einstein promulgated a decade after his first relativity papers. Curiously, Einstein seems never to have directly conceded that his groundbreaking "special" theory contained a logical contradiction.Pauliehttp://www.blogger.com/profile/06424849610575275694noreply@blogger.com0tag:blogger.com,1999:blog-34016406.post-1161375854245452192006-10-20T12:57:00.000-07:002007-03-17T08:21:32.472-07:00Infinitely long statements (a proof)This post addresses a point raised in the previous post: Is there a set of noncomputable but grammatical strings that are inherently impossible to cryptanalyze?<br /><br />Again, we are assigning a digit to each symbol in some logic language (agreeing to first make sure we start out with a sufficiently high base number system). A string of digits then represents a string of symbols.<br /><br />A grammatical string of symbols is one whereby certain substrings are barred as ungrammatical. But this does not mean we rule out logical contradictions or "false" statements. For example, the string (A and not-A) is permitted. 
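The digit-encoding idea can be made concrete in a few lines. The seven-symbol language and the barred substrings below are illustrative assumptions, not taken from the post:

```python
# Toy version of the encoding described above: each symbol of a small logic
# language gets one digit (here base 7 suffices), so a symbol string becomes
# a digit string. "Grammatical" here just means avoiding a few barred
# substrings -- and note that a contradiction such as (A & ~A) still passes.

SYMBOLS = ['(', ')', 'A', 'B', '&', 'v', '~']   # illustrative symbol set
CODE = {s: i for i, s in enumerate(SYMBOLS)}    # symbol -> digit

BARRED = [')(', '&&']                           # illustrative ungrammatical substrings

def encode(tokens):
    """Represent a token list as a digit string, one digit per symbol."""
    return ''.join(str(CODE[t]) for t in tokens)

def is_grammatical(formula):
    """A string is grammatical iff it contains no barred substring."""
    return not any(b in formula for b in BARRED)

print(is_grammatical('(A & ~A)'))              # True: contradictory but grammatical
print(is_grammatical('A)(B'))                  # False: ')(' is barred
print(encode(['(', 'A', '&', '~', 'A', ')']))  # '024621'
```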
However, we see that the set of all proofs (defining <i>proof</i> as a statement verifying another statement) is a subset of our set of grammatical strings.<br /><br />Whether an infinitely long grammatical string represents a proof, or a true or false, or undecidable, statement is a matter of philosophical preference.<br /><br />But, to the matter at hand:<br /><br />Can an infinitely long string be noncomputable but grammatical? The answer depends on the "reasonableness" of the grammatical rules. Note that in routine first-order logic notation our biggest concerns as to grammar are the right and left parentheses. If we had a set of 30 symbols, we would still have nearly 28 random choices for step n+1. So let's be generous and suggest that for language L, half the symbol set is disallowed.<br /><br />[Note: I have been told that there is a proof that some such strings are satisfiable (have a truth value) and that others are undecidable.]<br /><br />Now, using Zermelo-Fraenkel set theory's infinity axiom to permit use of induction, we consider the set of all n-length strings of base K digits (there are K^n strings). By induction we see that we obtain the set of all possible strings, and this must be bijective with the set of reals.<br /><br />Now suppose we add the proviso that at any n, we permit only (K^n)/2 strings. Yet by the ZF infinity axiom and induction we obtain half the reals, which is still a nondenumerable infinity. 
Since the computables have a denumerable cardinality, there must be a nondenumerable set of noncomputable but grammatical strings.<br /><br />However, for grammatical rules that increasingly limit the number of choices for n, this theorem is not valid.<br /><br />Related pages by Conant:<br />http://www.angelfire.com/az3/nfold/diag.html<br />http://www.angelfire.com/az3/nfold/choice.html<br />http://www.angelfire.com/az3/nfold/qcomp.htmlPauliehttp://www.blogger.com/profile/06424849610575275694noreply@blogger.com0tag:blogger.com,1999:blog-34016406.post-1160668593122885622006-10-12T08:39:00.000-07:002006-10-16T07:58:07.683-07:00Information theory and intelligent designDraft 3<br /><br />Before his trail-blazing paper on information theory (or "communication theory"), Claude Shannon wrote a confidential precursor paper during World War II on the informational and transmission issues inherent in cryptography, an indication of how closely intertwined are information theory and cryptology.<br /><br />In this post, we digress from cryptology a bit to approach the issue of "meaning" in information theory, an issue Shannon quite properly avoided by ignoring it. We are going to avoid the philosophical depths of "meaning" also while addressing a continuing concern, the fact that some information is more useful or compelling or relevant than other information. We might think of Shannon's work as a complete generalization of communicative information, whereas our idea is to draw some distinctions. (I have only a modest familiarity with information theory and so I have no idea of whether any of what follows is original.)<br /><br />For convenience we limit ourselves to the lower case alphabet and assign equal probability to the occurrence of letters in a letter string. We also use an artificially short string length, n=4. In that case, the Shannon information content of the gibberish string <i>abbx</i> equals 18.8 bit. 
The Shannon information value of the word <i>goal</i> is likewise 18.8 bit.<br /><br />Now we ask the probability that a four-letter string is an English word. Let us suppose there are 3,000 four-letter English words (I haven't checked). In that case, the probability that a string belongs to the set of English words would be 3000/26^4, or about 0.0066, which we now characterize as equivalent to a structured information content of about 7.25 bit. Of course, the alphabet provides the primary (axiomatic?) structure. In this case, an English dictionary provides the secondary structure.<br /><br />The probability that a string is gibberish is then 1 - 0.0066, or 0.9934, which we say is equivalent to a structured information content of about 0.0095 bit. We see that these values are closer to our intuitive notion of information and also fit well with the Shannonist notion that a piece of information carries a surprisal value.<br /><br />Here we say that we are not particularly surprised at the string <i>abbx</i> because it is a member of a lawless set and because background noise is, in many circumstances, ubiquitous. We say that for our purposes the information value of any member of the lawless set is identical with the information value of the set, as is the case for any member of the structured set and the structured set. On the other hand, the surprisal value of the string <i>goal</i> is fairly high because of the likelihood that it was not generated randomly and hence stems from a structured or designed set. That is, the chances are fairly good that a mind originated the string <i>goal</i> but the chances that a mind originated a string such as <i>abbx</i> are harder to determine. 
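These back-of-envelope figures can be checked in a few lines, using (as above) 26 equiprobable lower-case letters and the guessed count of 3,000 four-letter words:

```python
import math

# Checking the figures above: 26 equiprobable lower-case letters, strings of
# length 4, and a guessed 3,000 four-letter English words.

def bits(p):
    """Shannon surprisal, in bits, of an event with probability p."""
    return -math.log2(p)

p_string    = 1 / 26**4       # any particular string, "abbx" or "goal" alike
p_word      = 3000 / 26**4    # the string happens to be an English word
p_gibberish = 1 - p_word      # the string is gibberish

print(round(bits(p_string), 1))      # 18.8 -- raw Shannon content of a 4-letter string
print(round(bits(p_word), 2))        # 7.25 -- "this string is a word" is fairly surprising
print(round(bits(p_gibberish), 4))   # 0.0095 -- "this string is gibberish" is no surprise
print(round(bits(1 / 3000), 2))      # 11.55 -- "goal" as one word among 3,000
```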
Clearly, our confidence tends to increase with length of string and with the number of set rules.<br /><br />We see how our concept of structured information fits well with cryptography, though we will not dwell on that here.<br /><br />Another way to deal with the structure issue here is to ignore the gibberish strings and simply say that <i>goal</i> has a probability of (say) 1/3000, with an equivalent information content of 11.55 bit.<br /><br />What we are doing here is getting at a principle. We are not bothering to assign exact probabilities to individual letters, letter pairs, letter triplets or letter quadruplets. We are not assigning an empirical frequency to the word <i>goal</i>.<br />Rather, what we are doing is closing in on the problem of assigning an alternative information value to patterns that show a specified structure.<br /><br />Above, we have used a streamlined alphabet. But a set of some logic language's symbols can be treated like an alphabet. Importantly, we assign only grammatical symbol strings to the structured set, using rules such as ")" cannot be used to begin a sentence. We can then use the process sketched above to assign a structured information value to any string.<br /><br />Clearly this method can be used for all sorts of sets divided into lawless and lawful subsets, where "law" is a pairing rule or relation. (For example, by this, we could arrange that a non-computable irrational number have a much lower information value than a computable irrational.)<br /><br />We see that the average information of a gibberish string (as defined via the structured set) is far less than that of the string matching elements of the structured set. For example, the string <i>abbx rsr</i> is a member of the gibberish set, and gibberish, in this case (assuming as a wild guess 5,000 three-letter English words), has a probability of (1 - 3,000/26^4)(1 - 5,000/26^3), or about 0.71, for a joint information content of about 0.49 bit. 
Compare the string <i>goal new</i> (disregarding word order), which is drawn from the structured sets, with probability (3,000/26^4)(5,000/26^3), or about 0.0019, for a joint information content of about 9.1 bit.<br /><br />Hence, if one saw the message <i>goal new</i> one could have a strong degree of confidence that the string was not random but stemmed from a designed set.<p><br /><br /><b>A design inference?</b><br />William Dembski, the scholar who advocates a rational basis for inferring design by intelligence, uses the SETI example to buttress his cause. The hunters of extraterrestrial intelligence, in a fictional account, are astounded by a sequence of radioed 'zeroes and ones' that matches the prime number sequence for the first 100 primes. One must assume that such a low entropy (and high average information) content must be by design, he says, and uses that as a basis for justifying the inference of an intelligent designer behind the creation of life.<br /><br />However, it should be noted that it seems imperative that in order to have a set of low entropy elements, there must be a human mind to organize that set (not the physical aspects, but the cognitively appreciated set). So, such a bizarre signal from the stars would be recognized as other than background noise because of a centuries-long human effort to distill certain mathematical relations into concise form. Hence, human receivers would recognize a similar intelligence behind the message.<br /><br />But does that mean one can detect a signal from amid noise without falling into the problem whereby one sees all sorts of "things" in an atmospheric cloud?<br />That is, when one says that the formation of the first life forms is highly improbable, what does one mean? Can we be sure that the designer set (using human assumptions) has been sufficiently defined? (I am not taking sides here, by the way.)<br /><br />However, as noted above, the question of computability enters the picture here. 
Following prevailing scientific opinion (Penrose being an exception), every organism can be viewed as a machine and every machine responds to a set of algorithms. Hence every machine can be assigned a unique and computable number. One simply assigns numbers to each element of the logic language in use and puts together an algorithm for the machine. The machine's integer number corresponds to some right-left or left-right sequence of symbols (to avoid confusion, the computation may require a high-base number system).<br /><br />So then, the first organic machines -- proto-cell organisms perhaps -- must be regarded as part of a larger machine, the largest machine of all being the cosmos. But, the cosmos cannot be modeled as a classical machine or computer. See link in sidebar.<br /><br /><b>A sea of unknowable 'designs'</b><br />Also, the set of algorithmic machines (the set of algorithms) is bijective with a subset of the computable reals (some algorithmic substrings are disallowed on grammatical grounds).<br /><br />Now a way to <i>possibly</i> obtain a noncomputable real is to use a random lottery for choice of the nth digit in the string and, notionally, to continue this process over denumerable infinity. Because the string is completely random we do not know whether it is a member of the computable or noncomputable reals (which set, following Cantor's diagonal proof and other proofs, has a higher infinite cardinality than the set of computable reals).<br /><br />So there is no reason to conclude that a grammatical string might not be a member of the noncomputables. In fact, there must be a nondenumerable infinity of such strings.<br />Nevertheless, a machine algorithm is normally defined as always finite. 
On the other hand, one could imagine that a machine with an eternal time frame might have an infinite-step algorithm.<br /><br />That is, what we have arrived at is the potential for machine algorithms that cannot possibly have been arrived at by human ken. Specifically, we have shown that there exists a nondenumerable infinity of grammatical statements of infinite length. One might then argue that there is a vast sea of infinite designs that the human mind cannot apprehend.Pauliehttp://www.blogger.com/profile/06424849610575275694noreply@blogger.com0tag:blogger.com,1999:blog-34016406.post-1158942641144001122006-09-22T09:05:00.001-07:002006-09-22T09:30:41.153-07:00Israeli army's radio security flawsHad the Israeli army used one of my encryption methods (see previous posts), Hezbollah's resistance might not have been so effective.<br /><br />Mohamad Bazzi, a reporter for Newsday, disclosed in a Sept. 18 report that Hezbollah intelligence had broken into Israeli army security -- with important military consequences.<br />Israeli military radio, which uses frequency hopping and supposed strong encryption, was targeted by Hezbollah analysts who may have been using Iranian equipment. It was speculated that some of the radio security breaches occurred because of encoding mistakes by radio operators. Experts can sometimes use such errors to break into an encryption system.<br /><br />http://newsday.com<br />(Search: Hezbollah cracked)<br /><br />However, even with human error, a onetime keyworm system is highly resistant to timely cryptanalysis. So this suggests the possibility that the Israeli military was using a commercial-grade system, perhaps believing that it had high-grade security. But, I suspect that numerous commercial systems have back-doors required by the various national security agencies. 
Crypto-communists in the dark hearts of western governments would never permit use of uncrackable encryption systems.<br /><br />On the other hand, Bazzi's story notes that Hezbollah made effective use of traffic analysis. Hence, Hezbollah likely used volume analysis and direction-finding triangulation to pinpoint command posts and areas of military concentration. In addition, analysts might have identified a few stock phrases appearing in transmissions and used these to identify units.<br /><br />Overconfidence played a significant role in Israel's failure to score a knockout blow, a general told Bazzi. A raid on a Hezbollah signals intelligence unit came up with the cell phone numbers of Israeli commanders, leading one to wonder whether the commanders were violating signal security through use of cell phones. They are handy and fast, after all.Pauliehttp://www.blogger.com/profile/06424849610575275694noreply@blogger.com1tag:blogger.com,1999:blog-34016406.post-1157829539342474832006-09-09T11:34:00.000-07:002006-09-09T13:33:28.463-07:00Power cipher IIThis next cipher (which is really a cipher/code hybrid) couples the pseudorandom generator with an old system, the dictionary method.<br /><br />During the U.S. Civil War, one method used was the dictionary system, whereby words were swapped according to some specific scheme. For example, the encoder would find his word at the mth position on page x of a copy of a dictionary used by both parties and then go backward or forward n pages and count down to the mth word on that page. The decoder would find the code word on page (x - n) and then find the decryption at position m on the xth page.<br /><br />Such systems are easy to crack. But, not when hitched to a computerized process!<br /><br />There are two big advantages to the system I propose.<br /><br />* On average, there is no data inflation. In many systems, nulls are an important feature for bringing noise into the message. 
Also, most systems inflate because they are perforce not as efficient as the most efficient Huffman-type codes used for the plaintext. However, this problem does not occur in our system.<br /><br />* Since the message contains all natural language words, the frequencies will reflect those of the language in the dictionary used. Hence, the fact of encryption won't be detected by simple automated programs that use standard frequency analysis.<br /><br />Here's the system:<br /><br />We use a public key method for arranging that the recipient computer receive an identical copy of some lexicon of n words. Then we send a key for a deterministic recursive pseudorandom number generator, plus a set of seeds (initial values).<br /><br />The pseudorandom function and seed set are automatically changed via public key after specified numbers of transmissions.<br /><br />Each word in the dictionary is assigned a number, which reflects its canonical (alphabetical) position. At every mth word of the message a new seed is used to generate a new permutation.<br /><br />The generator then constructs a permutation of the numbers [1, n]. The original lexicon is then put in a bijective relationship with the permuted lexicon. This substitution cipher/code permits the "random" possibility that a word will represent itself. This is helpful in that fixed-point-free (involutory) substitutions, which never let a word represent itself, can tell the adversary where a probable word isn't. The fact that the lexicon can have an n as high as 500,000 (approximate number of words in English) means that unscrambling can be put out of computational reach, especially in light of the fact that the permutation is changed so often as to make comparisons of any sort quite unlikely.<br /><br />This system, however, does not contain the super-encryption of the system below, which adds to security by masking the pseudo-random number generator. 
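A bare-bones model of the scheme just sketched, under stated assumptions: a tiny lexicon stands in for the 500,000-word one, a shared seed schedule stands in for the public-key exchange, and Python's `random` (which is NOT cryptographically strong) stands in for the real generator.

```python
import random

# Toy model of the dictionary-permutation cipher described above. Both parties
# hold the same lexicon, the same deterministic generator and the same seed
# schedule; every M words a fresh seed yields a fresh permutation of the
# lexicon, and words are swapped through that permutation.

LEXICON = sorted(["at", "attack", "bridge", "dawn", "fire", "hold", "north", "the"])
M = 3  # words enciphered per permutation before re-seeding

def tables(seed):
    """Build the encipher/decipher tables for one seed."""
    rng = random.Random(seed)
    shuffled = LEXICON[:]
    rng.shuffle(shuffled)                 # a word MAY map to itself -- by design
    return dict(zip(LEXICON, shuffled)), dict(zip(shuffled, LEXICON))

def encrypt(words, seeds):
    return [tables(seeds[i // M])[0][w] for i, w in enumerate(words)]

def decrypt(words, seeds):
    return [tables(seeds[i // M])[1][w] for i, w in enumerate(words)]

msg = ["attack", "the", "bridge", "at", "dawn"]
seeds = [101, 202]                        # toy shared seed schedule
ct = encrypt(msg, seeds)
assert decrypt(ct, seeds) == msg          # round-trips exactly
assert all(w in LEXICON for w in ct)      # ciphertext is all natural-language words
```

Note that the ciphertext is itself a string of dictionary words, which is the camouflage point: a simple frequency detector sees only natural language.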
So, in exchange for good camouflage and efficiency, the system sacrifices the extra security of Hill super-encipherment.<br /><br />A variant of this system: use two dictionaries (even if separate languages). Both must be transmitted to the recipient. The, say, English lexicon of n words is then coupled with, perhaps, a Russian dictionary of n words, which is permuted as above. So the English plaintext comes out as a mostly non-grammatical string of Russian words.<br /><br />Of course, this natural-language counterfeiter would face a countermeasure in the form of a detector that counts frequencies of common one-, two- and three-letter words, such as "a", "I", "of", "be", "and", and "the." A method could then be devised to outsmart the detector, but at the expense of efficiency (we'd have to introduce nulls).Pauliehttp://www.blogger.com/profile/06424849610575275694noreply@blogger.com0tag:blogger.com,1999:blog-34016406.post-1157642960769538632006-09-07T08:10:00.001-07:002006-09-08T14:55:24.713-07:00Exporting strong encryption?In the post-9/11 era, U.S. policy concerning the export of strong encryption technology is something of a mystery. The NSA has never liked strong encryption for the masses and has long fought to find ways to limit its availability.<br /><br />In the Clinton years, the burgeoning of computer-to-computer financial transactions put business on the side of the privacy advocates. An arms-export case against Phil Zimmermann, who made strong encryption available via internet download, was dropped.<br /><br />It has been reported, however, that an Iranian assassination plot was solved because Crypto AG sold the Iranians a strong encryption program that had a backdoor through which the U.S. could enter. So one must worry when downloading Zimmermann's PGP program that one is not receiving a lookalike program with a hidden key available to intelligence agencies. 
Considering the NSA's attitude in general, I'd say there is a fairly strong chance of such a Trojan Horse compromise.<br /><br />Obviously, if the NSA's no-warrant wiretaps are being used against encrypted phone calls from "terror suspects" they would be of little use unless the NSA could decipher the messages.<br />Yet, when one checks the FBI's cybercrime page, export of strong encryption is not listed as a concern. The State Dept. these days has little to say, other than the claim that it checks to make sure strong encryption products exported to China are used strictly for business and not for military purposes.<br /><br />Bush, without going into specifics, has signed an executive order continuing Clinton's executive order intended to limit the export of dual-use technology, including strong encryption.<br /><br />Additionally, Congress wants internet phone services to build in technology that makes it easy for the feds to wiretap.<br /><br />So I am going to provide an algorithm that a competent programmer could turn into a super-strong encryption system that can be used by anyone.<br /><br />But before doing so, a couple of remarks:<br /><br />* Slower public key cryptography may well be safer than courier for passing of keys to the faster symmetric key encryption systems. But, public key systems are particularly vulnerable to the cryptological blunder of encipherment of the same plaintext with a different keytext. Cryptanalysts can then compare the cryptotexts and divine the particular key.<br />So we can well imagine that the NSA has a "crib" exploitation unit that specifically works on this weakness of public key systems. 
It is quite likely the agency uses network theory and Ramsey theory in its search for significant patterns that would disclose such weak points.<br /><br />* A system, such as the so-called one-time pad, that uses a truly random key at least as long as the message is provably uncrackable in principle, though human error, such as occurred in the Venona affair, can lead to compromise.<br /><br />* The following algorithm can be adapted in such a way as to provide highly secure internet phone links that bypass wiretap backdoors.<br /><br />Actually, I am not going to give a formal algorithm but an outline that is sufficient to enable a computer programmer to set up the system. Making this knowledge public does not significantly affect the effectiveness of the algorithm, unlike the case of numerous other "strong" encryption programs.<br /><br />Our asymmetric system uses the public key method to initiate the efficient symmetric key method. That is, the private keys are provided during public-key communication.<br /><br />The primary private key is simply a recursive deterministic pseudo-random number generator with a specific initial value. The function is chosen with an eye to assuring that its complexity is such that there is no known way to arrive at the nth value without a computer doing about the same amount of work.<br /><br />The receiver then has that generator and is able to determine the key using it and the proper initial value.<br /><br />The number can be used to encrypt a message in two ways:<br /><br />1. The number can give the permutation of the binary digits representing a block of plaintext.<br /><br />This is similar to how DES and AES work. Even in the 1970s it was believed that NSA supercomputers could search the full space of a 56-digit binary number (DES's key length). These days, block lengths of 2^10 binary digits are recommended. 
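Method 1 above, keying a bit-permutation of each plaintext block, can be sketched as follows. This is a toy illustration, not the exact construction: the 16-bit block, the seed value and Python's built-in generator all stand in for choices the text leaves open.

```python
import random

BLOCK = 16  # toy block length in bits; real systems would use far larger blocks

def keyed_permutation(seed: int, n: int = BLOCK) -> list[int]:
    # Deterministically derive a permutation of bit positions from the seed,
    # standing in for the recursive pseudo-random generator in the text.
    rng = random.Random(seed)
    perm = list(range(n))
    rng.shuffle(perm)
    return perm

def permute_bits(bits: str, perm: list[int]) -> str:
    # Encipher a block by reading its bits in the keyed order.
    return "".join(bits[p] for p in perm)

def invert(perm: list[int]) -> list[int]:
    # The receiver, holding the same seed, builds the inverse permutation.
    inv = [0] * len(perm)
    for i, p in enumerate(perm):
        inv[p] = i
    return inv

perm = keyed_permutation(seed=2718)
block = "0110111110100101"
cipher = permute_bits(block, perm)
assert permute_bits(cipher, invert(perm)) == block  # receiver recovers the block
```

Because the permutation is derived deterministically from the shared seed, sender and receiver never need to transmit the permutation itself.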
Note that because of constraints of coding (not encipherment), cryptanalysts could weed out numerous permutations as no-go's, thus cutting computer time substantially.<br /><br />2. The number can be used with the ASCII character set prior to encodement. That is, the software cuts the message into specific block lengths. Each ASCII character tops a column composed of pieces of the long digit string (number). These smaller numbers represent the character. Once the program enciphers a character with one member of the column, it uses another member for the next appearance of that character. The probability of the same number appearing twice is very low.<br /><br />Example using artificially small strings:<br /><br />A B C<br />0110 1111 1010<br /><br />1111 1100 0101<br /><br />0011 1110 1100<br /><br />That is, the generator produces 011011111010111111000101001111101100 and breaks it up by following the standard ASCII character order.<br />"CAB" would then be enciphered 101001101111. A second encipherment of "CAB" would yield 010111111100.<br /><br />As long as the number is effectively random, messages so enciphered are uncrackable. However, it is left to the programmer to devise a subroutine that chooses pseudorandom numbers of some specific block length. This generator should change at specified intervals. Compromise of the generator is a significant concern.<br /><br />However, secondary encryption (superencryption) is the next step. Being aware that methods have long existed for stripping off simple multipliers or added sums, we resort to the classical Hill cipher. Once the pseudorandom number is generated, another program uses linear algebra to encrypt that number. That is, a noncommutative matrix multiplication is applied to a matrix composed of pieces of the digit string.<br /><br />A secondary key is the matrix A, which is used to multiply B, a matrix composed of some set of elements of the digit string. 
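A minimal sketch of this Hill-style superencryption step follows. The 2x2 matrix size, the modulus 10 (one cell per keystream digit) and the particular matrices are assumptions for illustration; the text fixes none of them.

```python
# Hill-style superencryption of the keystream digits: pack pieces of the
# digit string into a matrix B and multiply by a secret matrix A; the
# receiver applies A's inverse to recover B.
A     = [[1, 3], [2, 7]]   # secondary key; det(A) = 1, hence invertible mod 10
A_INV = [[7, 7], [8, 1]]   # inverse of A modulo 10

def matmul_mod10(X, Y):
    # 2x2 matrix product with entries reduced modulo 10 (single digits).
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) % 10
             for j in range(2)] for i in range(2)]

B = [[4, 1], [9, 2]]        # pieces of the pseudorandom digit string
C = matmul_mod10(A, B)      # superencrypted digits to transmit
assert matmul_mod10(A_INV, C) == B   # receiver recovers the keystream
```

Note that invertibility requires det(A) to be coprime to the modulus, which is the "property of invertibility" the selection of A must respect.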
With the property of invertibility included when selecting A, the receiver is able to recover the pseudorandom number.<br /><br />The purpose of superencryption is to disguise any possible telltale signs inherent in the pseudorandom number that might imply which recursive generator was used. That key should also be replaced regularly via the public key method.<br /><br />So, now the question here is: does this post, available internationally, violate U.S. strong encryption export laws? Also, is there any point to such laws, considering that competent programmers can presumably implement strong encryption rather easily?<br /><br />Also, software using this method can be adapted to internet phone links and downloaded into PCs.<br /><br />Of course, one must still be aware that terminals give off characteristic electromagnetic emissions that reveal which keys are being pressed. An eavesdropper with the right equipment can park in a van nearby and record the plaintext before it is encrypted. Material to absorb such emissions is on the market, but the government apparently makes its purchase difficult.<br /><br />By the way, this method can be adapted to stream encryption, which is encipherment one digit at a time. As long as both random number generators are in sync, each binary digit can be pseudo-randomly enciphered as either itself or its complement. Since stream encryption is usually chosen when speed is valued at the expense of airtight security, we may wish to dispense with the super-encipherment step.<br /><br />Otherwise, we can use a second pseudo-random number generator for that step. This provides no additional security for a truly random first step. However, pseudo-randoms have certain telltale traits, which might enable a cryptanalyst to discern the specific function used.<br /><br />Also, one might super-encrypt with a Hill cipher thus: A x B, with A constant and B having all matrix positions but one held constant. 
That matrix hole is filled with the particular pseudorandom digit.<br /><br /><br />UNDER-THE-RADAR ENCRYPTION<br />Suppose you'd rather that the authorities not know that your message is encrypted. You could use steganography, the art of concealing the message's existence.<br /><br />These days, however, there are a lot of messages, and so, we can surmise, automated programs are used to sniff out those messages that are encrypted and copy them for further analysis.<br />They detect encryption with frequency analysis. It's simple: each natural language (and technical-language variant) has a characteristic frequency distribution that is summed up in the index of coincidence. That is, on average, in English "e" appears about 12% of the time, and other letters have typical probabilities of occurrence. So if we sum the squares of these probabilities we get a value characteristic of the language (about 0.067 for English).<br /><br />Hence, a program can analyze the frequencies of characters showing up in a text and determine whether the sum is within the bounds for a known plaintext language. If not, the program will assess that the text is probably encrypted.<br /><br />To evade this detector, we simply devise a program requiring that every number that appears in the text be repeated according to its typical frequency. The decryption program knows to ignore all repeated numbers as "nulls." Of course, we must make sure that the ratio of keytext length to message text length is such that a number isn't repeated as a non-null.<br /><br />Now this preventive measure will slow down encryption-sniffing. But we can imagine that programs used for mechanical translation from one language to another can be adapted for use as encryption sniffers.<br /><br />The addition of these "nulls" does nothing to upgrade the security of the encryption itself. Their purpose is simply to make spotting of encrypted messages tougher for traffic-analysis programs.Paulie
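The frequency-based sniffer described above can be sketched as follows. The 0.055 decision threshold is an assumption chosen to sit between English text (index of coincidence near 0.067) and uniformly random letters (near 0.038).

```python
# Sketch of an encryption sniffer: compute the index of coincidence of a
# text and flag it as likely encrypted when the value is too flat for a
# known plaintext language.
from collections import Counter

def index_of_coincidence(text: str) -> float:
    # Probability that two letters drawn at random from the text match.
    letters = [c for c in text.lower() if c.isalpha()]
    n = len(letters)
    if n < 2:
        return 0.0
    counts = Counter(letters)
    return sum(f * (f - 1) for f in counts.values()) / (n * (n - 1))

def looks_encrypted(text: str, threshold: float = 0.055) -> bool:
    # Below the threshold, the letter distribution is too flat for English.
    return index_of_coincidence(text) < threshold

english = "it was the best of times it was the worst of times " * 10
assert not looks_encrypted(english)
```

The null-insertion countermeasure in the text works precisely by pushing a ciphertext's character frequencies back toward a natural language's distribution, so this test no longer fires.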