Friday, November 10, 2006

Signal or noise?

As suggested in the post "ET, phone home," it is in principle possible to distinguish noise from a signal, though this may not always be practically feasible. Here we use a simple example to show how nonrandomness might be inferred (though not utterly proved).

Suppose one tosses a fair coin 8 times (or we get finicky and use a quantum measurement device as described previously). What is the probability that exactly four heads will show up?

We simply apply the binomial formula C(n,x)p^x q^(n-x), which in this case gives C(8,4)/2^8 = 70/256, or about 27%. The probability is not, you'll notice, 50 percent.
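
A quick check of these figures, sketched here with Python as the calculator (the choice of tool is mine, not the post's):

    # Binomial check: P(exactly k heads in 8 fair tosses) = C(8,k)/2^8
    from math import comb

    print(comb(8, 4) / 2**8)              # 0.2734... -- about 27%, not 50%

    for k in range(5):                    # 0 through 4 heads
        print(k, round(comb(8, k) / 2**8, 4))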

The probability of a specific sequence of heads and tails is 2^(-8), which is less than 1 percent. That's also the probability for 0 heads (which is the specific sequence 8 tails).

Probability for 1 head (as long as we don't care which toss it occurs in) is about 3%, for 2 heads is about 10%, and for 3 heads is about 22%.
As n increases, the probability of exactly n/2 heads decreases. The probability of getting exactly 2 heads in 4 tosses is 37.5%; the probability of exactly 3 heads in 6 tosses is 31.25%.

On the other hand, the ratio of heads to flips tends to approximate 1/2 as n increases, as is reflected by the fact that the outcome of exactly n/2 heads always carries the highest probability among the possible head counts for n tosses.

That is, if there are a number of sets of 8 trials, and we guess prior to a trial that exactly 4 heads will come up, we will tend to be right about 27% of the time. If we are right substantially less than 27% of the time, we would suspect a loaded coin.

Yet let us beware! The fact is that some ratio must occur, and n/2 is still most likely. So if n/2 heads occurs over, say, 100 tosses, we would not be entitled to suspect nonrandomness -- even though the likelihood of such an outcome is remote.

Note added July 2007: As de Moivre showed, Stirling's formula can be used to cancel out the big numbers, leaving (200pi)^0.5/(100pi), which yields a probability of close to 8 percent. For 1000 tosses, the chance of exactly 500 heads is about 2.5 percent; for 1 million tosses, it's about 0.08 percent.
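
As a check on the note above, here is a sketch (again assuming Python; log-gamma keeps the huge factorials manageable) comparing the exact probability of exactly n/2 heads with de Moivre's approximation (2/(pi*n))^0.5:

    from math import lgamma, log, exp, pi, sqrt

    def p_half_heads(n):
        """Exact P(exactly n/2 heads in n fair tosses), n even, via log-gamma."""
        log_p = lgamma(n + 1) - 2 * lgamma(n / 2 + 1) - n * log(2)
        return exp(log_p)

    for n in (100, 1000, 1_000_000):
        print(n, round(p_half_heads(n), 6), round(sqrt(2 / (pi * n)), 6))
    # 100       -> about 0.0796 (8 percent)
    # 1000      -> about 0.0252 (2.5 percent)
    # 1,000,000 -> about 0.0008 (0.08 percent)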

(Suppose we don't have a normal table handy and we lack a statistics calculator. We can still easily arrive at various probabilities using a scientific calculator and Stirling's formula, which is

n! ~ (n/e)^n (2n*pi)^0.5

Let us calculate the probability of exactly 60 heads in 100 tosses. We have a probability of

(100/e)^100 (200pi)^0.5 / [2^100 (60/e)^60 (40/e)^40 * 2pi(2400)^0.5]

which reduces to

50^100/(40^40 * 60^60) * [(200pi)^0.5/(2pi(2400)^0.5)]

We simply take logarithms:

x = 100ln50 - (40ln40 + 60ln60) = -2.014

We multiply e^(-2.014), or about 0.1335, by 0.0814, which is the right-hand bracketed ratio above, arriving at 0.01087, or about 1.09 percent.)
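
The arithmetic above can be double-checked in a few lines (a sketch assuming Python; the exact value needs no tables at all):

    from math import comb, log, exp, pi, sqrt

    exact = comb(100, 60) / 2**100                       # C(100,60)/2^100

    # The route taken in the text: logs of 50^100/(40^40 * 60^60),
    # times the bracketed ratio (200pi)^0.5 / (2pi(2400)^0.5)
    x = 100 * log(50) - (40 * log(40) + 60 * log(60))    # about -2.014
    bracket = sqrt(200 * pi) / (2 * pi * sqrt(2400))     # about 0.0814

    print(round(exact, 5))                               # 0.01084
    print(round(exp(x) * bracket, 5))                    # 0.01087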


Second thought
On the other hand, suppose we do a chi-square test to check the goodness of fit of the observed distribution to the binomial distribution. Since the observed counts equal the expected counts, the statistic x^2/500 + y^2/500 (where x and y are the deviations from the expected 500 heads and 500 tails, here both 0) equals 0.
That is, the observed distribution perfectly fits the binomial curve.
But what is the probability of obtaining a zero value for a chi-square test? (I'll think about this some more.)
Note added July 2007: In these circumstances, this doesn't seem to be a fair question.
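
For what it is worth, a minimal sketch of the two-category chi-square statistic (the 1000-toss, 500-expected split is an assumed example) shows it vanishing exactly when observed equals expected:

    def chi_square(observed, expected):
        return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

    print(chi_square([500, 500], [500, 500]))   # 0.0 -- a "perfect" fit
    print(chi_square([514, 486], [500, 500]))   # 0.784 for a mild deviation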

Back to the main issues
Suppose a trial of 20 tosses was reported to have had 14 heads. The probability of exactly 14 heads is less than 4%, and one would suspect a deterministic force -- though the probabilities alone are insufficient for one to definitively prove nonrandomness.
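
Checking the figure (a sketch assuming Python; the "at least 14" tail is added for comparison, since that is what one would often test in practice):

    from math import comb

    p_exactly_14 = comb(20, 14) / 2**20
    p_at_least_14 = sum(comb(20, k) for k in range(14, 21)) / 2**20
    print(round(p_exactly_14, 4))    # 0.037 -- under 4%
    print(round(p_at_least_14, 4))   # 0.0577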

Similarly, when (let's say digital) messages are transmitted, we can search for nonrandomness by the frequencies of 0 and 1.

But as we see in the ET post, it may become difficult to distinguish deterministic chaos from randomness. However, chaotically deterministic sequences will almost always vary from truly random probabilities as n increases.

Addendum


Sometimes data sets can be discovered to be chaotic, but nonrandom, by appropriate mappings into phase space. That is, if an iterative function converges to a strange attractor -- a pattern of disconnected sets of points within a finite region -- that attractor can be replicated from the data set, even though an ordinary graph looks random.
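
A toy version of that idea, assuming the logistic map at r = 4 as a stand-in chaotic source (not a strange attractor proper, but it shows how a phase-space plot exposes determinism that the raw sequence hides):

    import random

    def logistic_series(n, x0=0.1234, r=4.0):
        """Iterate x -> r*x*(1-x); the output looks noisy as a time series."""
        xs = [x0]
        for _ in range(n - 1):
            xs.append(r * xs[-1] * (1 - xs[-1]))
        return xs

    chaotic = logistic_series(2000)
    noise = [random.random() for _ in range(2000)]

    # Plotting x[n+1] against x[n] would show a clean parabola for the chaotic
    # data and a featureless cloud for the noise; numerically, the chaotic
    # pairs satisfy y = 4x(1-x) exactly (by construction), the noise does not.
    fit = lambda xs: max(abs(b - 4 * a * (1 - a)) for a, b in zip(xs, xs[1:]))
    print(fit(chaotic))   # 0.0
    print(fit(noise))     # order 1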

Thursday, November 09, 2006

A slippery subject

Probability is a slippery subject in more ways than one.

Here's an example (from Martin Sternstein's Statistics, Barron's, 1994) of a possibly counterintuitive result:

Suppose that 6 percent of stories of major stock market success stem from illegal insider information passing. In some arbitrary group of seven highly successful stock traders, what would you think would be the chance that only one member of the group is a crook whose success stems from illegal insider activity?
The answer may surprise you.

We apply the binomial formula C(n,x)p^x q^(n-x). That is, C(7,1)(0.06)(0.94)^6 = 7(0.06)(0.94)^6 = 0.2897. So, the chance that exactly one member is a crook is close to 30 percent!
The chance that at least one member is a crook is 1-(0.94)^7 = 0.351, or about 35 percent. That is, the chance is greater than one in three that at least one member of some group of seven successful stock traders is a crook -- despite the overall low incidence of criminality!
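
Both figures are easy to verify (a sketch assuming Python; no statistics package needed):

    from math import comb

    p, n = 0.06, 7
    print(round(comb(n, 1) * p * (1 - p) ** (n - 1), 4))   # 0.2897 -- exactly one crook
    print(round(1 - (1 - p) ** n, 4))                      # 0.3515 -- at least one crook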

Sunday, November 05, 2006

ET, phone home

This post also concerns the "intelligent design" controversy. See related posts below.

Let's think about the signal detected by SETI researchers in the fictional Contact scenario, an example given by intelligent design proponent William A. Dembski.

The signal from deep space was a sequence of sets of beeps, with each successive set holding a number of beeps equal to the next prime number. That is, the SETI team had detected the sequence of the first primes under 100.
What are the chances of that? Dembski asks. Obviously quite low. Detection of such a signal points to a sentient mind, he argues. Similarly, detection of a very low-probability pattern at the microbiotic level points to an intelligence behind the pattern, he says.

Suppose we set aside the fact that the prime sequence is a well-known discrete climbing curve and wish to know whether it differs significantly from a truly random discrete curve. How would we go about detecting a difference?

Obviously, the SETI curve must differ from a fully random curve because it climbs from one value to the next, whereas a fully random curve would be likely to have an averaged-out slope close to horizontal. So then, we must require that our candidate curve be random within constraints that force it always to climb. What would those constraints be? Clearly the constraints are the deterministic part of the curve.

What we find is that any computable climbing curve will do for constraints, with each random choice made between two values. A recursively defined curve (including one with chaotic output) would also do as a constraint, with each value being randomly selected between f(x) and f(x+1). Or we can use two computable climbing curves and randomly choose a value that falls between them.

For example, we might require that a value be chosen randomly between 2x and 3x. In that case, with enough values, the average slope should tend toward the derivative of (2x + 3x)/2, which is 2.5.
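
A quick simulation of that constrained-random climbing curve (a sketch with assumed details: uniform choices and a least-squares slope through the origin):

    import random

    xs = range(1, 10001)
    ys = [random.uniform(2 * x, 3 * x) for x in xs]   # random value between 2x and 3x

    # Least-squares slope through the origin: sum(x*y) / sum(x*x)
    slope = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
    print(round(slope, 3))   # tends toward 2.5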

Now we might check the SETI curve (up to the value received) and see what the average slope is. Even if the curve is actually composed of values of, say, p+7, the curve will map onto a known computable curve, of course. And we would suspect intelligence because we are unfamiliar with such a deterministically chaotic curve in nature.

However, it is conceivable that energy might be continually pumped into some system that results in a deterministically chaotic radio emission that shows ever-increasing numbers of bursts. Hence, we could not claim a designer was behind the emission merely on the basis of the constraints, unless we knew more about them.

Still, most processes in nature are deeply influenced by quantum phenomena, meaning a hefty degree of true randomness is injected into natural radio emissions.
So, if one could show that a radioed pattern was sufficiently deterministic, that would suffice to strongly indicate an intelligence behind the pattern. However, without knowledge of natural curves, we would have a tough time distinguishing between highly deterministic but chaotic climbing curves and truly random curves within constraints.

But, if we kept the constraints quite simple, that decision would influence how we categorize the suspect radio pattern. We would check the suspect slope and see whether its constraints (boundary slopes) are simple. If not, we would suspect full determinism, with any randomness being simple noise during transmission.

Now I don't suppose we expect constraint curves to be very complicated, probably following known climbing curves, such as supernova luminosity. That is, the slope average would conform to known natural phenomena, even if the specific patterns did not. So lacking such a slope, we could feel that we had either uncovered a new natural phenomenon or that we had encountered intelligence.

Thursday, November 02, 2006

Pseudorandom thoughts on complexity

Draft 2


This post supplements the previous post "Does math back 'intelligent design'?"

With respect to the general concept of evolution, or simply change over time, what do we mean by complexity?

Consider Stephen Wolfram's cellular automata graphs. We might think of complexity as a measure of the entropy of the graph, which evolves row by row from an initial row according to a rule whereby change occurs only locally, in minimal sets of contiguous cells. Taken in totality, or after some row n, the graphs register different quantities of entropy. That is, "more complex" graphs convey higher average information than "less complex" ones. Some graphs become all black or all white after some row n, corresponding to 0 information after that row. There exists a significant set of graphs that achieve neither maximum nor minimum entropy, of course.

How would we define information in a Wolfram cellular automaton graph? We can use several criteria. A row would have maximum entropy if the probability of the pattern of sequential cell colors is indistinguishable from random coloring. [To be fussy, we might use a double-slit single photon detector to create a random sequence whereby a color chosen for a cell is a function of the number of the quadrant where a photon is detected at time t.]

Similarly for a column.

Obviously, we can consider both column and row. And we might also consider sets of rows and/or columns that occur as a simple period. Another possibility is to determine whether such sets recur in "smooth curve quasi-periods," such as every n^2 rows. We may also want to know whether such sets zero out at some finite row.
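
A sketch of the row criterion for one elementary CA, assuming rule 30 and the crudest measure (single-cell frequencies); block frequencies or column statistics could be substituted as discussed above:

    from math import log2

    def step(row, rule=30):
        """One CA step with wrap-around; new cell = bit (4*left + 2*center + right) of the rule number."""
        n = len(row)
        return [(rule >> (4 * row[(i - 1) % n] + 2 * row[i] + row[(i + 1) % n])) & 1
                for i in range(n)]

    def row_entropy(row):
        """Shannon entropy in bits per cell, from the frequency of 1s in the row."""
        p1 = sum(row) / len(row)
        if p1 in (0, 1):
            return 0.0
        return -(p1 * log2(p1) + (1 - p1) * log2(1 - p1))

    row = [0] * 200
    row[100] = 1                        # single black cell
    for t in range(100):
        row = step(row)
    print(round(row_entropy(row), 3))   # close to 1 bit per cell for rule 30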



Another consideration is the appearance of "structures" over a two-dimensional region. This effectively means the visual perception of at least one border, whether closed or open. The border can display various levels of fuzziness. A linear feature implies at least one coloration period (cycle) appearing in every mth row or every nth column. The brain, in a Gestalt effect, collates the information in these periods as a "noteworthy structure." Such a structure may be defined geometrically or topologically (with constraints). That is, the periodic behavior may yield a sequence of congruent forms (that proliferate either symmetrically or asymmetrically) or of similar forms (as in "nested structures"), or of a set of forms each of which differs from the next incrementally by interior angle, creating the illusion of morphological change, as in cartoon animation.

At this juncture we should point out that there are only 256 elementary cellular automata. However, the number of rules climbs steeply with another color or two, and the number of possible evolutions climbs further when all possible initial conditions are considered.

So what we are describing, with the aid of Wolfram's graphs, is deterministic complexity, which differs from the concept of chaos more on a philosophical plane than a mathematical one.

We see that, depending on the criteria chosen, CA graphs differ in their entropy after an evolution of n steps and also in their entropy at the infinite limit. Each graph is asymptotic toward some entropy quantity. By no means does every graph converge toward the maximum entropy defined by a truly random pattern.

So we may conclude that, as Wolfram argues, simple instructions can yield highly complex fields. The measure of complexity is simply the quantity of information in a graph or subgraph defined by our basic criteria. And what do we mean in this context by information? If we went through all n steps of the rule and examined the sequence of colors in, for example, row n, the information content would be 0 because we have eliminated the uncertainty.

If, however, we don't examine how row n's sequence was formed, then we can check the probability of such a sequence and compute the resulting information value. At this point we must beware: complete aperiodicity of cell colors in row n is NOT identical with maximum entropy of row n. Think of asking a high school student to simulate the flipping of a coin by haphazardly writing down 0 or 1 for 100 steps. If one then submits the sequence to an analyst, he or she is very likely to discover that the sequence was not produced randomly, because most people avoid typical sub-sequences such as 0 recurring six times consecutively.
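
Even a single crude statistic often exposes this. A sketch (assuming Python; the run-avoiding sequence below is an invented stand-in for a hand-typed one):

    import random

    def longest_run(bits):
        """Length of the longest run of identical consecutive symbols."""
        best = run = 1
        for a, b in zip(bits, bits[1:]):
            run = run + 1 if a == b else 1
            best = max(best, run)
        return best

    avoider = [int(c) for c in "0110" * 25]            # 100 symbols, never 3 in a row
    fair = [random.randint(0, 1) for _ in range(100)]

    print(longest_run(avoider))   # 2 -- suspiciously short
    print(longest_run(fair))      # usually 5 to 9 for 100 fair flips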

So then, true randomness (again, we can use our quantum measuring device), which corresponds to maximum entropy, is very likely to differ significantly from computed chaos. This fact is easily seen if one realizes that the set of aperiodic computable irrational numbers is of a lower cardinality than the set of random digit sequences. Still, it must be said that the foregoing lemma doesn't mean there is always available a practical test to distinguish a pseudorandom sequence from a random sequence.

We might also think of deterministic complexity via curves over standard axes, with any number of orthogonal axes we like. Suppose we have the curve y = x. Because there is no difference between x and y, there is effectively no information in the curve: no work is required to determine y from x. The information in y = 2x is low because minimal work (counted as the number of simple steps in the most efficient known algorithm) is required to determine the value from x. Somewhat more information is found in the values of y = x^2 because the computation is slightly slower.

A curve whose values hold maximum information -- implying the most work to arrive at an arbitrary value -- would be one whereby the best method of determining f(x+k) requires knowledge of the value f(x). Many recursive functions fit this category. In that case, we would say that a computed value whose computational work cannot be reduced from n steps of the recursive function or iterative algorithm holds maximum information (if we don't do the work).
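
One way to picture that "must do the work" notion is an iterated one-way function, used here as an assumed stand-in for such a recursion: no shortcut is known for jumping from the value at step x to the value at step x + k without performing the k intervening steps.

    import hashlib

    def chain_value(seed: bytes, n: int) -> bytes:
        """Value after n iterations of x -> sha256(x); the cost grows linearly with n."""
        x = seed
        for _ in range(n):
            x = hashlib.sha256(x).digest()
        return x

    print(chain_value(b"seed", 10_000).hex()[:16])   # no known cheaper route than 10,000 hashes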

So let's say we have the best-arranged sieve of Eratosthenes to produce the sequence of primes. On an xyz grid, we map this discrete curve z = f(y) over y = x^2, using only integer values of x. Now suppose we perceive this system in some other way. We might conclude that a chaotic system shows some underlying symmetry.

It is also possible to conceive of two maximally difficult functions mapped onto each other. But, there's a catch! There is no overall increase in complexity. That is, if f(x) is at maximum complexity, g(f(x)) cannot be more complex -- though it could conceivably be less so.

This conforms to Wolfram's observation that adding complexity to rules does little to increase the complexity of a CA.

Now what about the idea of "phase transitions" whereby order suddenly emerges from disorder? Various experiments with computer models of nonlinear differential equations seem to affirm such possibilities.

Wolfram's New Kind of Science shows several, as I call them, catastrophic phase transitions, whereby high entropy rapidly follows a "tipping point" as defined by a small number of rows. Obviously one's perspective is important. A (notional) graph with 10^100 iterations could have a "tipping point" composed of millions of rows.

Wolfram points out that minor asymmetries in a high entropy graph up to row n are very likely to amplify incrementally -- though the rate of change (which can be defined in several ways) can be quite rapid -- into a "complex" graph after row n. I estimate that these are low entropy graphs, again bearing in mind the difference between true randomness and deterministic chaos or complexity: the entropies in most cases differ.

What we arrive at is the strong suggestion -- that I have not completely verified -- that a high information content in a particular graph could easily be indicative of a simple local rule and does not necessarily imply an externally imposed design [or substitution by another rule] inserted at some row n.

However, as would be expected, the vast majority of Wolfram's graphs are high-entropy affairs -- no matter what criteria are used -- and this fact conforms to the anthropic observation that the cosmos is in toto a low entropy configuration, in that most sets of the constants of physical law yield dull, lifeless universes.

I should note that New Kind of Science also analyzes the entropy issue, but with a different focus. In his discussion of entropy, Wolfram deploys graphs that are "reversible." That is, the rules are tweaked so that the graph mimics the behavior of reversible physical processes. He says that CA 37R shows that the trend of increasing entropy is not universal because the graph oscillates between higher and lower entropy eternally. However, one must be specific as to what information is being measured. If the entropy of the entire graph up to row n is measured, then the quantity can change with n. But the limiting value as n goes to infinity is a single number. It is true, of course, that this number can differ substantially from the limiting value entropy of another graph.

Also, the entropy displayed by physical systems assumes energy conservation, which Wolfram's graphs do not model, though I have toyed with ways in which they might.

The discussion above is all about classical models arranged discretely, an approach that appeals to the computer science crowd and to those who argue that quantum physics militates against continuous phenomena. However, I have deliberately avoided deep issues posed by the quantum measurement/interpretation problem that might raise questions as to the adequacy of any scientific theory for apprehending the deepest riddles of existence.

It should be noted that there is a wide range of literature on what the Santa Fe Institute calls "complexity science" and others sometimes call "emergence of order." I have not reviewed much of this material, though I am aware of some of the principal ideas.

A big hope for the spontaneous order faction is network theory, which shows some surprising features as to how orderly systems come about. However, I think that Wolfram graphs suffice to help elucidate important ideas, even though I have not concerned myself here with New Kind of Science's points about networks and cellular automata.

Wednesday, November 01, 2006

Does math back 'intelligent design'?

Two of the main arguments favoring "intelligent design" of basic biotic machines:

. Mathematician William A. Dembski (Science and Evidence for Design in the Universe) says that if a pattern is found to have an extraordinarily low probability of random occurrence -- variously 10^(-40) to 10^(-150) -- then it is reasonable to infer design by a conscious mind. He points out that forensics investigators typically employ such a standard, though heuristically.

. Biochemist Michael J. Behe (Darwin's Black Box) says that a machine is irreducibly complex if its parts are so interdependent that removing any one of them destroys the machine's function. Before discussing intricate biological mechanisms, he cites a mousetrap as a machine composed of interdependent parts that could not reasonably be supposed to fall together randomly.

Behe is aware of the work of Stuart Kauffman, but dismisses it because Kauffman does not deal with biological specifics. Kauffman's concept of self-organization via autocatalysis, however, lays the beginnings of a mathematical model showing how systems can evolve toward complexity, including sudden phase transitions from one state -- which we might perceive as "primitive" -- to another -- which we might perceive as "higher." (Like the word "complexity," the term "self-organization" is sometimes used rather loosely; I hope to write something on this soon.)

Kauffman's thinking reflects the work of Ilya Prigogine, who made the reasonable point that systems far from equilibrium might sometimes become more sophisticated before degenerating in accordance with the "law of entropy."

This is not to say that Behe's examples of "irreducible complexity" -- including cells propelled by the cilium "oar" and the extraordinarily complex basis of blood-clotting -- have been adequately dealt with by the strict materialists, who sincerely believe that the human mind is within reach of grasping the essence of how the universe works via the elucidation of some basic rules.

One such scientist is Stephen Wolfram, whose New Kind of Science examines "complexity" via iterative cellular automaton graphs. He dreams that the CA concept could lead to such a breakthrough. (But I argue that his hope, unless modified, is vain; see sidebar link on Turing machines.)

Like Kauffman, Wolfram is a renegade on evolution theory and argues that his studies of cellular automata indicate that constraints -- and specifically the principle of natural selection -- have little impact on the development of order or complexity. Complexity, he finds, is a normal outcome of even "simple" sets of instructions, especially when initial conditions are selected at random.

Thus, he is not surprised that complex biological organisms might be a consequence of some simple program. And he makes a convincing case that some forms found in nature, such as fauna pigmentation patterns, are very close to patterns found according to one or another of his cellular automatons.

However, though he discusses n-dimensional automata, the findings are sketchy (the combinatorial complexity is far out of computer range), and so he cannot give a three-dimensional example of a complex dynamical system emerging gestalt-like from some simple algorithm.

Nevertheless, Wolfram's basic point is strong: complexity (highly ordered patterns) can emerge from simple rules recursively applied.

Another of his claims, which I have not examined in detail, is that at least one of his CA experiments produced a graph, which, after sufficient iterations, statistically replicated a random graph. That is, when parts of the graph were sampled, the outcome was statistically indistinguishable from a graph generated by computerized randomization. This claim isn't airtight, and analysis of specific cases needs to be done, but it indicates the possibility that some structures are somewhat more probable than a statistical sampling would indicate. However, this possibility is no disproof of Dembski's approach. (By the way, Wolfram implicitly argues that "pseudorandom" functions refer to a specific class of generators that his software Mathematica avoids when generating "random" numbers. Presumably, he thinks his particular CA does not fall into such a "pseudorandom" set, despite its being fully deterministic.)

However, Wolfram also makes a very plausible case (I don't say proof because I have not examined the claim at that level of detail) that his cellular automata can be converted into logic languages, including ones that are sufficiently rich for Godel's incompleteness theorem to apply.

As I understand Godel's proof, he demonstrated that, if a formal system rich enough to express arithmetic is logically consistent, then there is a class of statements expressible in the system that cannot be derived from its axioms. He did this through an encipherment scheme that permits self-reference, and so some have taken his proof to refer only to an irrelevant semantic issue of self-reference (akin to Russell's paradox). But my take is that the proof says that statements exist that cannot be proved or derived.

So, in that case, if we model a microbiotic machine as a statement in some logic system, we see immediately that it could be a statement of the Godel type, meaning that the statement holds but cannot be derived from any rules specifying the evolution of biological systems. If such a statement indeed were found to be unprovable, then many would be inclined to infer that the machine specified by this unprovable statement must have been designed by a conscious mind. However, such an inference raises a philosophical (which does not mean trivial) difficulty.