Thursday, July 12, 2007

The prosecutor's fallacy

There are various forms of the "prosecutor's paradox" or "the prosecutor's fallacy," in which probabilities are used to assign guilt to a defendant. But probability is a slippery subject.

For example, a set of circumstances may seem, or actually be, highly improbable. But defense attorneys might wish to avail themselves of something else: the more key facts there are in a string of facts, the higher the probability that at least one of them is false. Of course, that probability is difficult to establish unless one knows either the witnesses' rates of observational error or some standard rates -- say, the error rate typical of an untrained observer versus that of a trained police officer.

(For a non-rigorous but useful example of likelihood of critical misstatement, please see the post Enrico Fermi and a 9/11 plausibility test. In that post we are testing plausibility which is far different from ironclad guilt or innocence. Also, for a discussion of probabilities of wrongful execution, please search Fatal flaws at Znewz1.blogspot.com.)

Suppose an eyewitness is tested for quick recall and shows a success rate of 96 percent and a 4 percent error rate. If the witness is testifying to 7 things he saw or heard with no prior knowledge concerning these things, the likelihood that the testimony is completely accurate is about 75 percent. So does the 25 percent probability of error constitute reasonable doubt -- especially if no fact can be expunged without forcing a verdict of not guilty? (Of course, this is why the common thread by several witnesses tends to have more accuracy; errors tend to cancel out.)
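Here is a rough sketch in Python of that arithmetic, taking the 96 percent per-fact success rate and the seven facts assumed above:

# Chance that a witness with a 96 percent per-fact recall rate gets all 7 facts right.
p_correct = 0.96
n_facts = 7

p_all_correct = p_correct ** n_facts            # about 0.75
p_at_least_one_error = 1 - p_all_correct        # about 0.25

print(f"all facts correct:     {p_all_correct:.3f}")
print(f"at least one in error: {p_at_least_one_error:.3f}")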

The prosecutor's paradox is well illustrated by People v. Collins, a case from 1964 in which independent probabilities were incorrectly used, the consequence being that the conviction was overturned on appeal.

To summarize, a woman was shoved to the ground and her purse snatched. She and a nearby witness gave a description to Los Angeles police which resulted in the arrest of a white woman and a black man. I do not intend to treat the specifics of this case, but rather just to look at the probability argument.

The prosecutor told the jury that the arrested persons matched the description given to police so closely that the probability of their innocence was about 1 in 12 million.

The prosecutor gave these probabilities:

Yellow auto, 1/10; mustached man, 1/4; woman with ponytail, 1/10; woman with blonde hair, 1/3; black man with beard, 1/10; interracial couple in car, 1/1000. With a math professor serving as an expert witness, these probabilities were multiplied together and the result was the astoundingly high probability of "guilt."

However, the prosecutor did not conduct a comparable test of witness error rate. Suppose the witnesses had an average observational error rate of 5 percent. The probability that at least one fact is wrong is about 26 percent. Even so, if one fact is wrong, the computed probability of a correct match remains very high. Yet, if that fact was essential to the case, then a not guilty verdict is still forced, probability or no.
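As a back-of-the-envelope sketch in Python, here are both the prosecutor's naive multiplication and the at-least-one-error figure just mentioned (the 5 percent error rate is the hypothetical value assumed above):

from functools import reduce

# The six trait probabilities the prosecutor cited in People v. Collins.
traits = [1/10, 1/4, 1/10, 1/3, 1/10, 1/1000]
p_random_match = reduce(lambda a, b: a * b, traits)
print(f"naive chance of a random match: 1 in {1/p_random_match:,.0f}")   # 1 in 12,000,000

# Chance that at least one of the six reported facts is wrong,
# assuming a hypothetical 5 percent observational error rate per fact.
error_rate = 0.05
p_some_fact_wrong = 1 - (1 - error_rate) ** len(traits)
print(f"chance at least one fact is wrong: {p_some_fact_wrong:.0%}")     # about 26%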

But this is not the only problem with the prosecutor's argument. As the appellate court wrote, there seems to be little or no justification for the cited statistics, several of which appear imprecise. On the other hand, the notion that the reasoning is never useful in a legal matter doesn't tell the whole story.

Among criticisms leveled at the Los Angeles prosecutor's reasoning was that conditional probabilities weren't taken into account. However, I would say that conditional probabilities need not be taken into account if a method is found to randomize the collection of traits or facts and to limit the intrusion of confounding bias.

But also the circumstances of arrest are critical in such a probability assessment. If the couple was stopped in a yellow car within minutes and blocks of the robbery, a probability assessment might make sense (though of course jurors would then use their internalized probability calculators, or "common sense"). However, if the couple is picked up on suspicion miles away and hours later, the probability of a match may still be high. But the probability of error increases with time and distance.

Here we run into the issue of false positives. A test can be 99 percent accurate, and yet the probability that a particular positive result is a true match can be very low. Take an example given by mathematician John Allen Paulos. Suppose a terrorist profiling program is 99 percent accurate, and let's say that 1 in a million Americans is a terrorist. That makes 300 terrorists among 300 million Americans. The program would be expected to catch 297 of those terrorists. However, the program has an error rate of 1 percent, and one percent of 300 million Americans is 3 million people. So a data-mining operation would turn up some 3 million "suspects" who fit the terrorist profile but are innocent nonetheless. So the probability that a positive result identifies a real terrorist is 297 divided by roughly 3 million, or about one in 10,000 -- a very low likelihood.
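A minimal Python sketch of that base-rate arithmetic, using the figures quoted above (99 percent accuracy, 1-in-a-million prevalence, 300 million people):

# False-positive arithmetic for the hypothetical profiling program.
population = 300_000_000
terrorists = population // 1_000_000          # 300
accuracy = 0.99                               # hit rate; false-positive rate is 1 - accuracy

caught = accuracy * terrorists                                # about 297 real terrorists flagged
false_positives = (1 - accuracy) * (population - terrorists)  # about 3 million innocents flagged

p_real = caught / (caught + false_positives)
print(f"a flagged person is actually a terrorist about 1 time in {1/p_real:,.0f}")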

But data mining isn't the only issue. Consider biometric markers, such as a set of facial features, fingerprints or DNA patterns. The same rule applies. It may be that if a person was involved in a specific crime or other event, the biometric "print" will finger him or her with 99 percent accuracy. Yet context is all important. If that's all the cops have got, it isn't much. Without other information, the odds are still tens of thousands to one that the cops or Border Patrol have the wrong person.

The probabilities change drastically however if the suspect is connected to the crime scene by other evidence. But weighing those probabilities, if they can be weighed, requires a case-by-case approach. Best to beware of some general method.

Turning back to People v. Collins: if the police stopped an interracial couple in a yellow car near the crime scene within a few minutes of the crime, we might be able to come up with a fair probability assessment. It seems certain that statistics were available, or could have been gathered, about hair color, facial hair, car color, hair style, and race. (Presumably the bandits would have had the presence of mind to toss the rifled purse immediately after the robbery.)

So let us grant the probabilities for yellow car at 0.1; woman with ponytail, 0.1; and woman with blonde hair, 0.333. Further, let us replace the "interracial couple in car" event with an event that might be easier to quantify. Instead we estimate the probability of two people of different races being paired. We'd need to know the racial composition of the neighborhood in which they were arrested. Let's suppose it's 60 percent white, 30 percent black, 10 percent other. If we were to check man-woman pairs in such a neighborhood at random, the probability of a white woman paired with a black man is 0.6 x 0.3 = 0.18, or 18 percent. Not a big chance, but certainly not negligible either.

Also, we'll replace the two facial hair events with a single event: Man with facial hair, using a 20 percent estimate (obviously, the actual statistic should be easy to obtain from published data or experimentally).

So, the probability that the police stopped the wrong couple near the crime scene shortly after the crime would be 0.1 x 0.1 x 0.333 x 0.18 x 0.2 = about 1.2 x 10^-4, or about 1 chance in 8300 of a misidentification. Again, this probability requires that all the facts given to police were correct.
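A quick Python sketch of that revised multiplication; the trait figures are the illustrative estimates proposed above, not measured statistics:

# Revised trait probabilities for the Collins-style estimate.
p_yellow_car  = 0.1
p_ponytail    = 0.1
p_blonde      = 1/3
p_mixed_pair  = 0.6 * 0.3      # white woman paired with black man, 60/30/10 neighborhood
p_facial_hair = 0.2

p_match = p_yellow_car * p_ponytail * p_blonde * p_mixed_pair * p_facial_hair
print(f"chance a random nearby couple matches: {p_match:.2e}")   # about 1.2e-04
print(f"roughly 1 chance in {1/p_match:,.0f}")                   # about 1 in 8,300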

But even here, we must beware the possibility of a fluke. Suppose one of the arrestees had an enemy who used lookalikes to carry out the crime near a point where he knew his adversary would be. Things like that happen. So even in a strong case, the use of probabilities is a dicey proposition.

However, suppose the police picked up the pair an hour later. In that situation, probability of guilt may still be high -- but perhaps that probability is based in part on inadmissible evidence. Possibly the cops know the suspects' modus operandi and other traits and so their profiling made sense to them. But if for some reason the suspects' past behavior is inadmissible, then the profile is open to a strong challenge.

Suppose that a test is done of the witnesses and their averaged error rate is used. Suppose they are remarkably keen observers and their rate of observational error is an amazingly low 1 percent. Let us, for the sake of argument, say that 2 million people live within an hour's drive of the crime scene. How many people are there who could be mistakenly identified as fitting the profile of one of the assailants? One percent of 2 million is 20,000. So, absent other evidence, the odds are in the ballpark of 20,000 to 1 in favor of wrongful prosecution.

It's possible, of course, that one member of the arrested pair is guilty while the other is merely an innocent associate.

It's possible the crime was committed by two people who did not normally associate, which again throws off probability analysis. But let's assume that for some reason the witnesses had reason to believe that the two assailants were well known to each other. We would then look at the number of heterosexual couples among the 2 million. Let's put it at 500,000. A one percent error rate gives odds in the vicinity of 5000 to 1 in favor of wrong identification of the pair. Even supposing interracial couples occur at a rate of 1 in 1000 among the 2 million, that's roughly 2000 interracial couples. A one percent error rate turns up roughly 20 couples wrongly identified as suspects.
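A short Python sketch of those head counts, using the working assumptions above (2 million people, 500,000 couples, 1-in-1000 interracial pairs, a 1 percent error rate):

# How many innocent candidates does a 1 percent observational error rate produce?
population  = 2_000_000
couples     = 500_000
error_rate  = 0.01

false_individuals = error_rate * population        # about 20,000 people wrongly fitting a profile
false_couples     = error_rate * couples           # about 5,000 couples wrongly fitting

interracial       = population / 1000              # roughly 2,000, the rough count used above
false_interracial = error_rate * interracial       # roughly 20 couples wrongly identified

print(false_individuals, false_couples, false_interracial)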

Things can get complicated here. What about "fluke" couples passing through the area? Any statistics about them would be shaky indeed, tossing all probabilities out the window, even if we were to find two people among the 20 who fit the profile perfectly and went on to multiply the individual probabilities. The astoundingly low probability number may be highly misleading -- because there is no way to know whether the real culprits escaped to San Diego.

If you think that sounds unreasonable, you may be figuring in the notion that police don't arrest suspects at random. But we are only using what is admissible here.

On the other hand, if the profile is exacting enough -- police have enough specific details of which they are confident -- then a probability assessment might work. However, these specific details have to be somehow related to random sampling.
After all, fluke events really happen and are the bane of statistical experiments everywhere. Not all probability distributions conform to the normal curve (bell curve) approximation. Some data sets contain extraordinarily improbable "outliers." These flukes may be improbable, but for certain kinds of data they are known to occur.

Also, not all events belong to a set containing ample statistical information. In such cases, an event may intuitively seem wonderfully unlikely, but the data are insufficient to do a statistical analysis. For example, the probability that three World Trade Center buildings -- designed to withstand very high stresses -- would collapse on the same day intuitively seems unlikely. In fact, if we only consider fire as the cause of collapse, we can gather all recorded cases of U.S. skyscraper collapses and all recorded cases of U.S. skyscraper fires. Suppose that in the 20th Century, there were 2,500 skyscraper fires in the United States. Prior to 9/11 essentially none collapsed from top to bottom as a result of fire. So the probability that three trade center buildings would collapse as a result of fire would be on the order of (1/2,500)^3, or about one chance in 15.6 billion.
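In Python, that rough calculation runs as follows, taking the hypothetical 2,500 fires as the base:

# If at most 1 skyscraper fire in 2,500 had ever produced a total collapse,
# the naive chance of three independent fire-driven collapses in one day:
fires = 2_500
p_three = (1 / fires) ** 3
print(f"about 1 chance in {1/p_three:,.0f}")    # about 1 in 15,625,000,000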

Government scientists escape this harsh number by saying that the buildings collapsed as a result of a combination of structural damage and fire. Since few steel frame buildings have caught fire after being struck by aircraft, the collapses can be considered as flukes and proposed probabilities discounted.

Nevertheless, the NIST found specifically that fire caused the principal structural damage, and not the jet impacts. The buildings were well designed to absorb jet impact stresses, and did so, the NIST found. That leaves fire as the principal cause. So if we ignore the cause of the fires and only look at data concerning fires, regardless of cause, we are back to odds of billions to one in favor of demolition by explosives.

Is this fair? Well, we must separate the proposed causes. If the impacts did not directly contribute significantly to the collapses, as the federal studies indicate (at least for the twin towers), then jet impact is immaterial as a cause and the issue is fire as a cause of collapse. Causes of the fires are ignored. Still, one might claim that fire cause could be a confounding factor, introducing bias into the result. Yet, I suspect such a reservation is without merit.

Another point, however, is that the design of the twin towers was novel, meaning that they might justly be excluded from a set of data about skyscrapers. However, the NIST found that the towers handled the jet impacts well; still, there is a possibility the buildings were well-designed in one respect but poorly designed to withstand fire. Again, the NIST can use the disclaimer of fluke events by saying that there was no experience with fireproofing (reputedly) blown off steel supports prior to 9/11.

Tuesday, June 26, 2007

On Hilbert's sixth problem

The world of null-H post is the next post down.

There is no consensus on whether Hilbert's sixth problem -- can physics be axiomatized? -- has been answered.

From Wikipedia, we have this statement attributed to Hilbert:

6. Mathematical treatment of the axioms of physics. The investigations of the foundations of geometry suggest the problem: To treat in the same manner, by means of axioms, those physical sciences in which today mathematics plays an important part; in the first rank are the theory of probabilities and mechanics.

Hilbert proposed his problems near the dawn of the Planck revolution, while the debate was raging about statistical methods and entropy, and the atomic hypothesis. It would be another five years before Einstein conclusively proved the existence of atoms.

It would be another year before Russell discovered his paradox of the set of all sets that are not members of themselves, which is closely related to Cantor's power set paradox. Though Cantor uncovered his paradox, or perhaps theorem, in the late 1890s, I am uncertain how cognizant of it Hilbert was.

Interestingly, by the 1920s, Zermelo, Fraenkel and Skolem had axiomatized set theory, specifically forbidding that a set could be an element of itself and hence getting rid of the annoying self-referencing issues that so challenged Russell and Whitehead. But in the early 1930s along came Goedel, who proved that ZF set theory is either inconsistent or incomplete. His proof actually used Russell's Principia Mathematica as a basis, but it generalizes to all but very limited mathematical systems of deduction. Since mathematical systems can be defined in terms of ZF, it follows that ZF must contain true statements that cannot be traced back to the axioms. So the attempt to axiomatize set theory didn't completely succeed.

In turn, it would seem that Goedel, who began his university studies in physics, had knocked the wind out of Problem 6. Of course, many physicists have not accepted this point, arguing that Goedel's incompleteness theorem applies to only one essentially trivial matter.

In a previous essay, I have discussed the impossibility of modeling the universe as a Turing machine. If that argument is correct, then it would seem that Hilbert's sixth problem has been answered. But I propose here to skip the Turing machine concept and use another idea.

Conceptually, if a number is computable, a Turing machine can compute it. Then again, Church's lambda calculus, a recursive method, can also compute any computable number. The general Turing machine and the lambda calculus turn out to be equivalent in power; Church's thesis conjectures that they capture everything that is effectively computable, but since that thesis cannot be proved, it remains unknown whether either formalism misses some computables (rationals or rational approximations to irrationals).

But Boolean algebra is the real-world venue used by computer scientists. If an output can't be computed with a Boolean system, no one will bother with it. So it seems appropriate to define an algorithm as anything that can be modeled by an mxn truth table and its corresponding Boolean statement.

The truth table has a Boolean statement where each element is above the relevant column. So a sequence of truth tables can be redrawn as a single truth table under a statement combined from the sub-statements. If a sequence of truth tables branches into parallel sequences, the parallel sequences can be placed consecutively and recombined with an appropriate connective.

One may ask about more than one simultaneous output value. We regard this as a single output set with n output elements.
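As an illustrative sketch in Python, here is a tiny algorithm rendered as a truth table, with two sub-statements merged under a single connective (the particular statements are arbitrary):

from itertools import product

# A tiny "algorithm" as a Boolean statement over three inputs.
def stmt1(a, b, c):
    return (a and b) or (not c)

# A second sub-statement, as if from a following truth table in the sequence.
def stmt2(a, b, c):
    return a or c

# Two consecutive sub-tables merged into one table under a single connective (AND).
def combined(a, b, c):
    return stmt1(a, b, c) and stmt2(a, b, c)

# The mxn truth table: one row per input combination, with the output column.
for a, b, c in product([False, True], repeat=3):
    print(int(a), int(b), int(c), "->", int(combined(a, b, c)))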

So then, if something is computable, we expect that there is some finite mxn truth table and corresponding Boolean statement. Now we already know that Goedel has proved that, for any sufficiently rich system, there is a Boolean statement that is true, but NOT provably so. That is, the statement is constructible using lawful combinations of Boolean symbols, but the statement cannot be derived from axioms without extension of the axioms, which in turn implies another statement that cannot be derived from the extended axioms, ad infinitum.

Hence, not every truth table, and not every algorithm, can be reduced to axioms. That is, there must always be some algorithm or truth table showing that a "scientific" system of deduction is either inconsistent or incomplete.

Now suppose we ignore that point and assume that human minds are able to model the universe as an algorithm, perhaps as some mathematico-logical theory; i.e., a group of "cause-effect" logic gates, or specifically, as some mxn truth table. Obviously, we have to account for quantum uncertainty. Yet, suppose we can do that and also suppose that the truth table need only work with rational numbers, perhaps on grounds that continuous phenomena are a convenient fiction and that the universe operates in quantum spurts.

Yet there is another proof of incompleteness. The algorithm, or its associated truth table, is an output value of the universe -- though some might argue that the algorithm is a Platonic idea that one discovers rather than constructs. Still, once scientists arrive at this table, we must agree that the laws of mechanics supposedly were at work so that the thoughts and actions of these scientists were part of a massively complex system of logic gate equivalents.

So then the n-character, grammatically correct Boolean statement for the universe must have itself as an output value. Now, we can regard this statement as a unique number by simply assigning integer values to each element of the set of Boolean symbols. The integers then follow a specific order, yielding a corresponding integer.
(The number of symbols n may be regarded as corresponding to some finite time interval.)
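A minimal sketch of that numbering idea in Python; the symbol set and the positional encoding are arbitrary choices of mine, but any scheme that preserves the order of symbols will do:

# Assign an integer to each Boolean symbol and read a statement as one number.
symbols = {'(': 1, ')': 2, 'A': 3, 'B': 4, 'C': 5, '&': 6, '|': 7, '~': 8}
base = len(symbols) + 1

def statement_number(stmt):
    n = 0
    for ch in stmt:
        n = n * base + symbols[ch]   # positional, so the mapping is reversible
    return n

print(statement_number("(A&B)|~C"))  # one unique integer per grammatical statement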

Now then, supposing the cosmos is a machine governed by the cosmic program, the cosmic number should be computable by this machine (again the scientists involved acting as relays, logic gates and so forth). However, the machine needs to be instructed to compute this number. So the machine must compute the basis of the "choice." So it must have a program to compute the program that selects which Boolean statement to use, which in turn implies another such program, ad infinitum.

In fact, there are two related issues here: the Boolean algebra used to represent the cosmic physical system requires a set of axioms, such as Huntington's postulates, in order to be of service. But how does the program decide which axioms it needs for itself? Similarly, the specific Boolean statement requires its own set of axioms. Again, how does the program decide on the proper axioms?

So then, the cosmos cannot be fully modeled according to normal scientific logic -- though one can use such logic to find intersections of sets of "events." Then one is left to wonder whether a different system of representation might also be valid, though the systems might not be fully equivalent.

At any rate, the verdict is clear: what is normally regarded as the discipline of physics cannot be axiomatized without resort to infinite regression.

*************
So, we now face the possibility that two scientific systems of representation may each be correct and yet not equivalent.

To illustrate this idea, consider the base 10 and base 2 number systems. There are some non-integer rationals that have a terminating expansion in base 10 but no terminating expansion in base 2, although the approximation can be made as close as we like. In that sense, these two systems of representation of rationals are not equivalent.

(Cantor's algorithm to compute all the rationals uses the base 10 system. However, he did not show that all base n rationals appear in the base 10 system.)

Monday, June 25, 2007

The world of null-H

Some thoughts on data compression, implicate order and entropy (H):

Consider a set of automated machines. We represent each machine as a sequence of logic gates. Each gate is assigned an integer. In that case each machine can be represented as a unique number, to wit its sequence of logic gate numbers. If subroutines are done in parallel, we can still find some way to express the circuit as a single unique number, of course.

In fact, while we're at it, let us define an algorithm as any procedure that can be modeled by a truth table. Each table is a constructible mxn matrix which clearly is unique and hence can be written as a sequence of 0's and 1's read off row by row with a precursor number indicating dimensions. In that case, the table has a bit value of more than nxm. On the other hand, each machine's table is unique and can be assigned an index number, which may have a considerably lower bit value than nxm.

Now suppose, for convenience, we choose 10 machines with unique truth table numbers that happen to be n bits in length. We then number these machines from 1 to 10.

Now, when we send a message about one of the machines to someone who has the list of machines and reference numbers stored, the message can be compressed as a number between 1 and 10 (speaking in base-10 for convenience). So the information value of, say, 7 is far lower than that for n=25 base-2 digits. Suppose that it is equally probable that any machine description will be sent. In that case the probability, in base 2, is 2^-25, and the information value is 25 bits.

However, if one transmits the long-form description, it is no better than transmitting the three-bit representation of 7 (111). Clearly the information value of 3 bits for 7 is far lower than the 25 bits for the long description.
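A small Python sketch of those figures, assuming the 25-bit long form and ten equally likely machines (so the exact index cost is log2(10), about 3.3 bits):

import math

# Information value of the full description versus an index into a shared list.
bits_long_form = 25                    # -log2(2**-25)
bits_index     = math.log2(10)         # about 3.32 bits for 1 of 10 equiprobable machines

print(f"long-form description: {bits_long_form} bits")
print(f"shared-list index:     {bits_index:.2f} bits")
print(f"difference left in storage: {bits_long_form - bits_index:.2f} bits")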

Of course Shannon's memoryless channel is a convenient fiction, allowing a partial description which is often useful (he did some pioneering work on channels with memory also, but I haven't seen it). These days, few channels are memoryless, since almost every computer system comes with look-up functions.

So what we have is data compression. The 25 bits have been compressed into some low number of bits. But, we don't have information compression, or do we?

If one transmits the long form message to the receiver, that information is no more useful to the receiver than the information in the abbreviated form. I_implicit will do. I_explicit has no additional surprisal value to the receiver.

So I_explicit - I_implicit might be considered the difference between the stored information value and the transmitted information value. But, once the system is up and running, this stored information is useless. It is dead information -- unless someone needs to examine the inner workings of the machine and needs to look it up. Otherwise the persons on each end can talk in compressed form about Machine X and Machine Y all the time without ever referring to the machine details. So I_e - I_i = -I_i under those conditions.

So stored information is dead information for much of the time. It has zero value unless resurrected by someone with an incomplete memory of the long integer string.

Now we have not bothered with the issue of the average information (entropy) of the characters, which here is a minor point. But clearly the entropy of the messaging system increases with compression. Information is "lost" to the storage unit (memory).

However, if someone consults the memory unit, stored information is recovered and the entropy declines.

The point is that entropy in this sense seems to require an observer. Information doesn't really have a single objective value.

Yes, but perhaps this doesn't apply to thermodynamics, you say. The entropy of the universe always increases. Turned around, that statement really means that the most probable events will usually occur and the least probable usually won't. So many scientists seek to redefine sets of "events" in order to discover more intersections. They seek to reduce the number of necessary sets to a minimum.

Note that each set might be treated as a machine with its set language elements denoted by numbers. In that case sets with long integer numbers can be represented by short numbers. In that case, again, entropy seems to be observer-dependent.

Of course one can still argue that the probability of the memory unit remaining intact decreases with time. Now we enter into the self-referencing arena -- as in Russell's set of all sets -- in that we can point out that the design of the memory unit may well require another look-up system, again implying that information and entropy are observer-dependent, not sometimes but always.

Consider a quantum experiment such as a double-slit experiment one photon at a time.
The emitted photon will land in a region of the interference pattern with a specific probability derived from the square of the photon's wave amplitude.

If we regard the signal as corresponding to the coordinates of the detected photon, then the received signal carries an information value equal to -log(p), where p is the probability for those coordinates. (I am ignoring the unneeded term "qubit" here.)

In that case, the entropy of the system is found by -p_1 log(p_1) - ... - p_n log(p_n).

So the entropy corresponds to the wave function and the information corresponds to the collapse of the wave function -- and we see that information is observer-dependent. The observer has increased the information and decreased the general entropy.
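A minimal Python sketch of that bookkeeping; the probabilities here are made-up stand-ins for squared wave amplitudes at a handful of detector positions:

import math

# Hypothetical detection probabilities over four positions (they sum to 1).
p = [0.4, 0.3, 0.2, 0.1]

surprisal = [-math.log2(q) for q in p]              # information value of each outcome
entropy   = -sum(q * math.log2(q) for q in p)       # H = -p1*log(p1) - ... - pn*log(pn)

print("surprisal (bits):", [round(s, 2) for s in surprisal])
print("entropy (bits):  ", round(entropy, 3))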

On the other hand, information is a fleeting thing in both the classical and quantum arenas. "Shortly" after the information is received, it loses its surprisal value and dies -- until a new observer obtains it ("new observer" could mean the same observer who forgets the information).

Sunday, June 24, 2007

The Kalin cipher

Note: The Kalin cipher described below can of course be used in tandem with a public key system, or it can be done by hand with calculators. An appropriate software program for doing the specified operations would be helpful.

The idea of the cipher is to counteract frequency analysis via matrix methods.

Choose a message of n characters and divide it into blocks whose length is some integer square; a final partial block can be padded out with dummy characters.

Arrange the message into mxm matrices, as shown,


H I H x
O W A y
R E U z


We then augment the matrix with a fourth column. This column is the first key, which is a random number with no zeros. A second 4x3 matrix is filled with random integers. This is the second key. The keys needn't be random. Easily remembered combinations might do for some types of work because one would have to know the cipher method in order to use guessed keys.

We then matrix multiply the two as MK, which yields a 3x3 product. That product, augmented once more with the first key column, is put into row canonical form.
This results in the nine numbers in M being reduced to the three numbers in [I|b], where b is the final column.


 8  9  8  x     7 9 2
15 23  1  y  x  5 5 4  =  P
18  5 21  z     3 2 3
                5 2 3

and [P | first key] row-reduces to

1 0 0 a
0 1 0 b
0 0 1 c


where (a, b, c) is a set of three rationals.

The keys can be supplied by a pseudorandom number generator in step with a decoder program. A key can vary with each message matrix or remain constant for a period, depending on how complex one wishes to get. But, as said, if one is conducting an operation where it is advantageous for people to remember keys, this method might prove useful.

By the way, if a row of the original message matrix repeats or if one row is a multiple of another, a dummy character is inserted to make sure no row is a multiple of another so that we can obtain the canonical form. Likewise, the key matrix has no repeating rows.

In fact, on occasion other square matrices will not reduce to the I form. A case in point:

a+1 b+1 c+1
a   b   c
1   1   1


The first row is the sum of the other two, so the determinant of this matrix is 0, meaning the I form can't be obtained.
In general, our software program should check the determinant of the draft message matrix to make sure it is non-zero. If the determinant is zero, a dummy letter should be inserted, or two inconsequential characters should be swapped as long as the meaning isn't lost.

But, if the program doesn't check the determinant it will give a null result for the compression attempt and hence would be instructed to vary the message slightly.
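A small Python sketch of that determinant check; the singular example is the one just given, with a = 1, b = 2, c = 3:

def det3(M):
    # Determinant of a 3x3 matrix by cofactor expansion along the first row.
    return (M[0][0] * (M[1][1]*M[2][2] - M[1][2]*M[2][1])
          - M[0][1] * (M[1][0]*M[2][2] - M[1][2]*M[2][0])
          + M[0][2] * (M[1][0]*M[2][1] - M[1][1]*M[2][0]))

a, b, c = 1, 2, 3
draft = [[a+1, b+1, c+1], [a, b, c], [1, 1, 1]]   # first row = sum of the other two
print(det3(draft))   # 0, so pad or tweak the message before encrypting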

Notice that it is not necessary to transmit I. So the message is condensed into three numbers, tending to defeat frequency analysis.

Here is a simple example, where for my convenience, I have used the smallest square matrix:

We encrypt the word "abba" (Hebrew for "father") thus:


1 2 3
2 1 1


where the first two columns represent "abba" and the third column is the first key.
We now matrix multiply this 2x3 matrix by a 3x2 matrix which contains the second key.


1 2 3     2 1     10 16
2 1 1  x  1 3  =   7  8
          2 3

We then apply key 1 (or another key if we like) and reduce to canonical form:


10 16 3
7 8 1

which is uniquely represented by


1 0 -1/4
0 1 11/32


By reversing the operations on (-1/4, 11/32), we can recover the word "abba." If we encode some other four characters, we can similarly reduce them to two numbers. Make a new matrix


-1/4 x 3
11/32 y 1

which we can fold into another two number set (u, v).
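For the curious, here is a rough Python sketch of the encryption side of the 2x2 example above, working in exact rationals; the row-reduction routine and the a=1, b=2 letter numbering are my own conveniences, and decryption, as described, would reverse these steps using the shared keys:

from fractions import Fraction

def letters_to_matrix(word):
    # a=1, b=2, ... as in the example; four letters fill a 2x2 matrix row by row.
    v = [Fraction(ord(ch) - ord('a') + 1) for ch in word]
    return [[v[0], v[1]], [v[2], v[3]]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def rref(M):
    # Gauss-Jordan elimination over exact rationals.
    M = [row[:] for row in M]
    rows, cols = len(M), len(M[0])
    r = 0
    for c in range(cols):
        if r == rows:
            break
        pivot = next((i for i in range(r, rows) if M[i][c] != 0), None)
        if pivot is None:
            continue
        M[r], M[pivot] = M[pivot], M[r]
        pv = M[r][c]
        M[r] = [x / pv for x in M[r]]
        for i in range(rows):
            if i != r and M[i][c] != 0:
                f = M[i][c]
                M[i] = [M[i][j] - f * M[r][j] for j in range(cols)]
        r += 1
    return M

key1 = [Fraction(3), Fraction(1)]                      # first key (augmenting column)
key2 = [[Fraction(2), Fraction(1)],                    # second key (3x2)
        [Fraction(1), Fraction(3)],
        [Fraction(2), Fraction(3)]]

M = letters_to_matrix("abba")                          # [[1, 2], [2, 1]]
augmented = [row + [k] for row, k in zip(M, key1)]     # [[1, 2, 3], [2, 1, 1]]
P = matmul(augmented, key2)                            # [[10, 16], [7, 8]]
reduced = rref([row + [k] for row, k in zip(P, key1)]) # [[1, 0, -1/4], [0, 1, 11/32]]

ciphertext = [row[-1] for row in reduced]
print(ciphertext)                                      # [Fraction(-1, 4), Fraction(11, 32)]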


We see here an added countermeasure against frequency analysis, provided the message is long enough: gather the condensed column vectors into new matrices.

We can do this repeatedly. All we need do is group the condensed numbers -- here in blocks of four -- and recombine. If we have 4^(2n) matrices to begin with, it is possible to transmit the entire message as a set of two numbers, though usually more than one condensed set would be necessary. Of course we can always pad the message so that we have a block of x^(2n) characters. Still, the numbers tend to grow as the operations are done and may become unwieldy after too many condensations. On the other hand, frequency analysis faces a tough obstacle, I would guess.

So a third key is the number of enfoldments. Three enfoldments would mean three unfoldments. Suppose we use a 4x4 system on 256 characters with three enfoldments.
We write this on 16 matrices. After the first transform, we throw away the I matrices and gather the remaining 16 columns sequentially into a set of 4 matrices. We transform again, gather the 4 resulting columns into a single matrix, and transform once more, leaving a single column of four numbers. So if the adversary doesn't know the number of enfoldments, he must try them all, assuming he knows the method. Of course that number may be varied by some automated procedure linked to a public key system.

Just as it is always possible to put an nxn matrix without a zero determinant [and containing no zeros] into canonical form, it is always possible to recover the original matrix from that form. The methods of linear algebra are used.

Unauthorized decryption is also hampered by the fact that, in matrix multiplication, AB does not usually equal BA.

The use of canonical form and the "refolding" of the matrices is what makes the Kalin cipher unique, I suppose. When I say unique, I mean that the Kalin cipher has not appeared in the various popular accounts of ciphers.

An additional possibility: when an nxn matrix is folded into an n column vector, we might use some "dummy" numbers to form a different dimension matrix. For example, suppose we end up with a 3-entry column vector. We add a dummy character to that string and form a 2x2 matrix, which can then be compressed into a 2-entry column vector. Of course, the receiver program would have to know that a 2-vector is compressed from a 3-vector. Also the key for the 2x2 matrix is so small that a decryption program could easily try all combinations.

However, if a 10x10 matrix folds into a 10-entry column vector, six dummy characters can be added and a 4x4 matrix can be constructed, leaving a 4-vector, which can again be folded into a 2-vector. All sorts of such systems can be devised.

Additionally, a "comma code" can be used to string the vectors together into one number. The decipherment program would read this bit string to mean a space between vector entries.

Clearly the method of using bases or representations as keys and then transforming offers all sorts of complexities -- perhaps not all useful -- but the emphasis on matrix condensation seems to offer a practical antidote to frequency analysis.

BTW, I have not bothered to decimalize the rational fractions. But presumably one would convert to the decimal equivalent in order to avoid drawing attention to the likelihood that the numbers represent canonical row vectors. And, of course, if one is using base 2, one would convert each rational to a linear digit string. Not all decimal fractions can be converted exactly into binary. However, supposing enough bits are used, the unfolded (deciphered) number will, with high probability, be very close to the correct integer representing the character.

Saturday, June 02, 2007

Thumbnail of NIST's 9/11 scenario

Here's a thumbnail of the scenario used by the National Institute of Standards and Technology to justify its computer modeling:

When the plane collided with a tower, a spray of debris "sandblasted" the core columns, stripping them of fireproofing. Some fireproofing may have come off because of the vibrations on impact.

Some jet fuel immediately burst into flame, in a fireball seen from the street, with some of the fiery fuel dropping down the core elevator shaft and spilling out onto random floors, setting a few afire.

Jet fuel, whether fueling an explosion or an ordinary fire, cannot produce enough energy to critically weaken naked core columns in the time allotted, the NIST found.
Hence, the NIST presumed that fires were accelerated by the jet fuel but got hot enough from office materials and furnishings, which the NIST variously put at five pounds per square foot or four pounds per square foot on average.

Though the heat energy actually detected via infrared analysis and visual inspection of the photographic record was far too low to have caused "global collapse," the NIST conjectured that the fires were hotter and more intense closer to the core.

The NIST's mechanism for collapse required sufficiently energetic fires to send heat to the ceiling -- which the NIST modeled as retaining its fireproofing -- and that heat was then convected along the ceiling and sideways into the load-bearing columns and connectors.

Had the fireproofing also come off the ceiling, the heat would have dissipated upward through the floor slab and, the NIST's computer models showed, the building would have stood.

Though there were fires on lower floors, the columns would have retained most of their fireproofing, and so critical weakening wasn't possible. No witnesses reported fireproofing off bare columns on lower floors.

The NIST said collapse was initiated by the shortening of core columns -- whether from buckling or from severing. Presumably this shortening would have occurred on the impact/fire floors, since that's where the fireproofing would have come off.

The NIST realized that blown-out windows supplied oxygen to keep the fire going, but also permitted much heat to dissipate out (with the smoke), making less available for the "blowtorch" scenario.

The NIST said that the impact zone fires seemed to progress around a floor in fits and starts until they had come all the way around the core.

Here are some concerns:

In Tower 2, which fell first, the observed fire energy was very low, the NIST agreed. The NIST attributed Tower 2's fall to structural damage from plane impact with some input from the fires. The plane that struck Tower 2 clipped the corner, meaning the collision force with the core would have been lower than in Tower 1, not more. Yet it is the core that supported the weight of the structure, not the exterior.

Tower 2's top block collapsed at a 23 degree angle. What this means is that Tower 2 had much less "hammer-down" force available for the lower part of the building than some have suggested.

On the other hand, Tower 1's top block came essentially straight down. What this implies is that the core columns on the opposite side of the elevator shaft from the plane were damaged roughly equally with those on the impact side. Otherwise, the top would have tilted over, as if on a hinge. Hence, one is forced by the NIST to accept the idea that a lot of fireproofing was shaken off the opposed columns, but not off the ceiling.

Also, in order for the fires -- which dimmed and flared as oxygen ran low or became available when the heated air blew out windows -- to damage the core columns in a manner that initiates global collapse, not only must a specific percentage of the 47 columns be critically damaged, but the critical damage must be roughly evenly spaced.
That is, if a set of closely spaced columns lost strength, the building is less likely to give way, since the load is, by design, transferred to the remaining columns.

In the case of Tower 1, the damage to the core columns must be equivalent on the opposing side of the shaft, implying that the "blowtorch" acted symmetrically with respect to energy.

The NIST, in order to get the simulation to work, must have required that on each floor the fire maintain a specific rate of progress and a specific average ceiling energy. That is, if the fire moved through the debris too slowly, it would burn out or simply fail to do enough damage to other columns. But if it moved too rapidly, the circling fire would fail to bring enough heat to bear on any particular column to bring about critical weakening.

Though massive blasts were witnessed from outside just prior to collapse, the NIST felt obliged to discount these explosions as immediate collapse causes -- because the jet fuel simply wouldn't provide enough energy to do critical structural damage.

Tuesday, April 17, 2007

The case of the missing energy

Just a note to amplify a previous post which takes a look at the energy deficit problem for the twin towers.

Correction (April 26, 2007): A mass estimate has been revised to 7 x 10^8 kilograms per building. This is quite a trivial matter, since the numerical mass is irrelevant. It is the energy ratio that is important.

We may regard the energy associated with the buoyant force as the binding energy of the lower structure of the 417-meter WTC2. Most of this energy went into the construction such that the structure could bear the load above. We could think of this energy as internal energy.

That is, if the entire structure collapses, how much energy should be released?
We feel safe to say that the energy must be at least as much as is required to raise a block to a specified height. That is, it must be at least mgy for a specific block and height.

Though there may be some justification for a discrete summation, which I used in a previous version of this post, I have decided that a routine integral suits our purposes nicely. So here goes.

The mass supported at some height y we estimate as roughly

M - (y/H)M = M(1-y/H)

where H is the height of the building.

The potential energy specific to that height is then

gM(1 - y/H)y = gM(y - y^2/H)

So, for the sum of all potential energies between ground level and y we have
(using S for the integral sign)

gM [ S y dy - (1/H) S y^2 dy ]

-- for the interval between height 0 and the height of story n, where the upper block fell --

= gM(y^2/2 - y^3/(3H))

Based on a stated 770,000 tons of steel per building, we estimate building mass at 7 x 10^8 kg. For WTC2, we put n at 81 and H at 417 meters, with about 3.79 meters between floors.

Plugging in those numbers, we get an internal energy of at least

1.65 x 10^14 Joules

However, the kinetic energy from the top block's one-story fall is given by

(1/2)mv^2 = mgy = 0.25 x 7 x 10^8 x 9.8 x 3.79 = 6.5 x 10^9 J, which is more than four orders of magnitude below the opposing internal energy.

For WTC1, we use H = 420 and n = 92, with 3.82 meters between floors.
Plugging in the numbers, we obtain

1.873 x 10^14 J in internal energy for the lower structure, versus an impact energy of 6.5 x 10^9 J.
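A short Python sketch of the arithmetic, using the estimates above (M = 7 x 10^8 kg, a quarter-mass upper block, a one-story drop):

g = 9.8        # m/s^2
M = 7e8        # estimated building mass, kg

def internal_energy(n_floor, floor_height, H):
    # gM(y^2/2 - y^3/(3H)), the integral of gM(1 - y/H) y dy from 0 to y
    y = n_floor * floor_height
    return g * M * (y**2 / 2 - y**3 / (3 * H))

def impact_energy(floor_height, block_fraction=0.25):
    # Kinetic energy of the upper block after a one-story fall: (1/2)mv^2 = mgy
    return block_fraction * M * g * floor_height

for name, n, dh, H in [("WTC2", 81, 3.79, 417), ("WTC1", 92, 3.82, 420)]:
    e_int, e_imp = internal_energy(n, dh, H), impact_energy(dh)
    print(f"{name}: internal ~{e_int:.3g} J, impact ~{e_imp:.3g} J, ratio ~{e_int/e_imp:,.0f}")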

Now it is conceivable that a small amount of energy can bring about the release of a much larger amount of energy -- if it is well positioned. However, the binding energy of the lower structure that concerns us is all vectored so as to resist gravitational collapse. Hence, such a small amount of energy could plausibly topple a tower only if it were released near the base of the structure.

The official idea that there was enough energy to hammer the lower structure down to near-ground level is simply not tenable.

Addendum: I took a look at the kinetic energies provided by the impacts of the jets and found that they were minuscule with respect to the "buoyant force energy" of each building. The NIST was somewhat ambivalent about its view of the jet impact damage.

Monday, March 19, 2007

G whiz

The conjecture here is that g is affected by a difference in the shape of the earth's core and its sea-level surface.

One problem with my post below is that the polar radius is wrong. Reference books are not always reliable about such things. Another problem is that, at least once, I used a wrong value for G, indicating that one's memory is also not always terribly reliable.

However, these wrong values don't tell the whole story about what is wrong.

Some have assumed that g differs from that expected of a sphere because the earth is non-spherical. But shape is unlikely to be the only issue.

The barycenter (center of mass or gravity) of an object might be defined as the point where all internal gravitational forces cancel. That is, if we make wedges of equal mass with sides intersecting the barycenter, the gravitational force of each wedge cancels in pairs in a circle around the barycenter and also at the barycenter.
For an object of uniform mass, the barycenter is the centroid of the volume.

Hence, as long as we know the linear distance to the barycenter, we can determine g at the geoid (surface of the object) -- that is, as long as the object has no concavities in the surface or projections (i.e., as long as a tangent line at a perimeter point does not intersect another perimeter point without also intersecting the interior). A depression in the surface means that g at the bottom of the concavity will be reduced by the y component of F_g coming from the higher walls, and similarly for the surface at the base of a projection.

However, the figure of the earth is very close to an ellipsoid, though reportedly it is somewhat pear-shaped, which I am guessing means that the semimajor or semiminor axis of one ellipsoid is tacked on to that of another ellipsoid; i.e., two halves of two different ellipsoids, which share one equal axis, are pasted together. If so, we would still have the situation that the perimeter is effectively a curve where a line never intersects two tangent points without intersecting the ellipsoid's interior. Hence, we would still be able to calculate g by angle (and, to be fussy, by altitude above sea level).

Anyway, to find the distance to the geoid for an ellipsoid, given the angle, we have
r = (cos^2(L)/a^2 + sin^2(L)/b^2)^(-0.5), where L is the latitude, a the equatorial radius and b the polar radius.

Now the value of g at latitude 45.5 degrees has been set at 9.80665 m/s^2.

So we plug in the following values:

Polar radius: 6357000 meters
Equatorial radius: 6378000 m
Earth mass: 5.9736 x 10^24 kg
G: 6.67259 x 10^-11 m^3 kg^-1 s^-2

Assuming confidence about the earth's mass, at 45.5 degrees the radius of the earth's ellipsoid is 6367.29 km.
But, setting g = 9.80665, the accepted value for that latitude, g = GM_earth/r^2 requires r to be about 6375.36 km, meaning the ellipsoid radius is some 8.07 km shorter than that spherical-model relation can account for.
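A Python sketch of that comparison, replicating the formula and values above (the 45.5-degree angle is treated as geocentric latitude in the ellipse formula):

import math

G     = 6.67259e-11     # m^3 kg^-1 s^-2
M_e   = 5.9736e24       # kg
a     = 6378000.0       # equatorial radius, m
b     = 6357000.0       # polar radius, m
g_std = 9.80665         # standard g at 45.5 degrees, m/s^2

lat = math.radians(45.5)

# Distance from the earth's center to the ellipsoid at that latitude.
r_ellipsoid = (math.cos(lat)**2 / a**2 + math.sin(lat)**2 / b**2) ** -0.5

# Radius a uniform spherical model g = GM/r^2 would need to give g_std.
r_spherical = math.sqrt(G * M_e / g_std)

print(f"ellipsoid radius at 45.5 deg: {r_ellipsoid/1000:.2f} km")
print(f"radius implied by g = GM/r^2: {r_spherical/1000:.2f} km")
print(f"difference: {(r_spherical - r_ellipsoid)/1000:.2f} km")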

Yes, it may be that the earth is a bit lumpy at the geoid, but another influence may be at work: the earth's interior is not a uniform body. In fact, it is believed that the mostly iron core (with a radius about the same as the moon's) is rotating separately from the remainder of the planet.

If we suppose that the core's figure has far less eccentricity than that of the geoid, we can see that the mass distribution will vary by angle, with the highest density, on average, at the poles and the least at the equator. That is, g would decrease more rapidly from pole to equator than would be so for an ellipsoid of uniform mass.

Well, I suppose we might also like to calculate the moon's influence, which should cause the effective g to be less when the moon is in the sky and more when it is behind the earth from the observer. However, I am sure that the official guardians of g have taken the moon's effect into account. In general, I would expect that the effect would on average cancel out, but then again, its effect on local g may well depend on the moon's orbit with respect to latitude.