Signal or noise?
As suggested in the post "ET, phone home," it is in principle possible to distinguish a signal from noise, though this may not always be practically feasible. Here we use a simple example to show how nonrandomness might be inferred (though not conclusively proved).
Suppose one tosses a fair coin 8 times (or we get finicky and use a quantum measurement device as described previously). What is the probability that exactly four heads will show up?
We simply apply the binomial formula C(n,x) p^x q^(n-x), which in this case works out to 70/2^8, or about 27%. The probability is not, you'll notice, 50 percent.
The probability of a specific sequence of heads and tails is 2^(-8), which is less than 1 percent. That's also the probability for 0 heads (which is the specific sequence 8 tails).
Probability for 1 head (as long as we don't care which toss it occurs in) is about 3%, for 2 heads is about 10%, and for 3 heads is about 22%.
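These figures are easy to verify. A minimal sketch in Python (not part of the original note), using math.comb from the standard library for the binomial coefficient:

```python
from math import comb

def p_heads(k, n):
    # Probability of exactly k heads in n tosses of a fair coin: C(n,k)/2^n
    return comb(n, k) / 2**n

for k in range(5):
    print(k, "heads in 8 tosses:", round(p_heads(k, 8), 4))
# 4 heads gives about 0.27; 0 heads gives 1/256, under 1 percent
```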
As n increases, the probability of exactly n/2 heads decreases. The probability of getting exactly 2 heads in 4 tosses is 37.5%; the probability of exactly 3 heads in 6 tosses is 31.25%.
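The decline is quick to confirm with the same binomial formula (a sketch, assuming a fair coin):

```python
from math import comb

def p_half(n):
    # Probability of exactly n/2 heads in n tosses (n even)
    return comb(n, n // 2) / 2**n

for n in (4, 6, 8, 100):
    print(n, "tosses:", round(p_half(n), 4))
# 0.375 at n=4, 0.3125 at n=6, about 0.27 at n=8, about 0.08 by n=100
```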
On the other hand, the ratio of heads to flips tends toward 1/2 as n increases, as is reflected by the fact that the case in which heads occurs n/2 times always carries the highest probability among the possible outcomes for n tosses.
That is, if we run a number of sets of 8 tosses, and we guess prior to each set that exactly 4 heads will come up, we will tend to be right about 27% of the time. If we are right substantially less often than that, we would suspect a loaded coin.
Yet let us beware! The fact is that some ratio must occur, and n/2 heads is still the most likely count. So if n/2 heads occurs over, say, 100 tosses, we would not be entitled to suspect nonrandomness -- even though the likelihood of such an outcome is remote.
Note added July 2007: As de Moivre showed, Stirling's formula can be used to cancel out the big numbers, leaving (200pi)^0.5/(100pi), which yields a probability of close to 8 percent. For 1000 tosses, the chance of exactly 500 heads is about 2.5 percent; for 1 million tosses, it's about 0.08 percent.
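De Moivre's simplification amounts to approximating the probability of exactly n/2 heads by (2/(pi*n))^0.5. A quick comparison against the exact binomial value (my own check, not part of the original note):

```python
from math import comb, pi, sqrt

def exact(n):
    # C(n, n/2) / 2^n, the exact probability of n/2 heads
    return comb(n, n // 2) / 2**n

def de_moivre(n):
    # Stirling-based approximation: sqrt(2 / (pi * n))
    return sqrt(2 / (pi * n))

for n in (100, 1000):
    print(n, round(exact(n), 5), round(de_moivre(n), 5))
print(10**6, round(de_moivre(10**6), 5))
# roughly 0.08 at n=100, 0.025 at n=1000, 0.0008 at n=1,000,000
```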
(Suppose we don't have a normal table handy and we lack a statistics calculator. We can still easily arrive at various probabilities using a scientific calculator and Stirling's formula, which is
n! ~ (n/e)^n * (2n * pi)^0.5
Let us calculate the probability of exactly 60 heads in 100 tosses. We have a probability of

100!/(60! * 40! * 2^100)

which, after applying Stirling's formula, reduces to
50^100/(40^40 * 60^60) * [(200pi)^0.5/(2pi * (2400)^0.5)]
We simply take logarithms:
x = 100ln50 - (40ln40 + 60ln60) = -2.014
We multiply e^(-2.014) = 0.1335 by 0.0814, which is the right-hand ratio in brackets above, arriving at 0.01087, or about 1.09 percent.)
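The whole logarithmic shortcut can be reproduced in a few lines (a sketch; the variable names are mine) and checked against the exact binomial value:

```python
from math import log, exp, pi, sqrt, comb

# Stirling reduction of C(100,60)/2^100, following the text
x = 100 * log(50) - (40 * log(40) + 60 * log(60))   # about -2.014
ratio = sqrt(200 * pi) / (2 * pi * sqrt(2400))      # about 0.0814
approx = exp(x) * ratio

exact = comb(100, 60) / 2**100
print(round(approx, 5), round(exact, 5))
# both come out near 0.011, i.e. roughly 1.09 percent
```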
On the other hand, suppose we do a chi-square test to check the goodness-of-fit of the observed distribution to the binomial distribution. Since each observed count equals its expected count, the statistic x^2/500 + y^2/500 (where x and y are the deviations from the expected 500 heads and 500 tails; here x = 0 and y = 0) equals 0.
That is, the observed distribution perfectly fits the binomial curve.
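As a concrete (hypothetical) illustration of the statistic: with 1000 tosses of a fair coin, the expected counts are 500 heads and 500 tails, and chi-square sums the squared deviations divided by the expected counts:

```python
def chi_square(observed, expected):
    # Goodness-of-fit statistic: sum of (O - E)^2 / E over the categories
    return sum((o - e)**2 / e for o, e in zip(observed, expected))

expected = [500, 500]                     # fair coin, 1000 tosses
print(chi_square([500, 500], expected))   # 0.0 -- the "perfect" fit above
print(chi_square([530, 470], expected))   # 3.6 -- a mildly suspicious fit
```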
But what is the probability of obtaining a zero value for a chi-square test? (I'll think about this some more.)
Note added July 2007: In these circumstances, this doesn't seem to be a fair question.
Back to the main issues
Suppose a trial of 20 tosses was reported to have yielded exactly 14 heads. The probability of that count is less than 4 percent, and one would suspect a deterministic force -- though the probabilities alone are insufficient to definitively prove nonrandomness.
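That figure is straightforward to check (a sketch, assuming a fair coin):

```python
from math import comb

# Probability of exactly 14 heads in 20 tosses of a fair coin
p = comb(20, 14) / 2**20
print(round(p, 4))   # about 0.037, i.e. just under 4 percent
```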
Similarly, when (let's say digital) messages are transmitted, we can search for nonrandomness by checking the frequencies of 0s and 1s.
But as we see in the ET post, it may become difficult to distinguish deterministic chaos from randomness. However, chaotically deterministic sequences will almost always deviate from truly random frequencies as n increases.
Sometimes data sets can be discovered to be chaotic, but nonrandom, by appropriate mappings into phase space. That is, if an iterative function converges to a strange attractor -- a pattern of disconnected sets of points within a finite region -- that attractor can be replicated from the data set, even though an ordinary graph looks random.
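A minimal illustration of that idea (my own sketch, using the logistic map as the deterministic source): in the delay "phase space" of pairs (x_n, x_n+1), every point of the chaotic series lies exactly on the map's parabola, while a genuinely random series scatters across the whole square, even though both raw sequences look equally patternless in an ordinary graph.

```python
import random

def logistic_series(x0=0.4, r=4.0, n=500):
    # Deterministic but chaotic: x_{k+1} = r * x_k * (1 - x_k)
    xs = [x0]
    for _ in range(n - 1):
        xs.append(r * xs[-1] * (1 - xs[-1]))
    return xs

xs = logistic_series()
# Maximum distance of the delay pairs (x_n, x_{n+1}) from the parabola
chaos_err = max(abs(b - 4 * a * (1 - a)) for a, b in zip(xs, xs[1:]))

random.seed(0)  # fixed seed so the sketch is reproducible
rs = [random.random() for _ in range(500)]
rand_err = max(abs(b - 4 * a * (1 - a)) for a, b in zip(rs, rs[1:]))

print(chaos_err, rand_err)  # chaos_err is essentially 0; rand_err is order 1
```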