Good Math/Bad Math

Tuesday, March 14, 2006

Studying Bad Probability: the Creation Science Research Center

So, with a handy-dandy taxonomy of probability errors, let's take a look at an example document. This comes from a very peculiar site called parentcompany.com, which hosts the "Creation Science Research Center".

There's a wealth of really bad stuff there. For today, I'm looking at essay 44, which purports to discuss probability and the origin of life.

It starts bad:


For roughly fifty years secular scientists who have faith in the power of dumb atoms to do anything have been carrying on scientific research aimed at finding out how the dumb atoms could have initiated life without any outside help. Since they believe that this really happened, they believe that it was inevitable that the properties of atoms, the laws of physics, and the earth's early environment should bring forth life. More sober minds, however, have realized the immense improbability of the spontaneous origin of life (called "abiogenesis"). Some have made careful investigations and mathematical calculations to estimate what the probability is for abiogenesis to occur. Their calculations show that life's probability is extremely small, essentially zero.


Right off the bat, we've got a lot of confusion. It's mostly unrelated to the math, but they make the frequent error of conflating evolution and abiogenesis, which are two separate issues. Evolution is a process that operates on reproducing entities; abiogenesis is the process by which those reproducing entities first come into existence.

But even mathwise, they can't get through their first paragraph without screwing up. They start right off with a cheap "big numbers" claim: no actual numbers, just a blanket assertion that the probability is so small that it's "essentially zero".


To understand these results let us explain what we mean by probability. What, for example, is the probability of tossing a coin and getting "heads"? There are two possible outcomes of tossing a coin, either the head side or the tail side will be up. The sum of the probabilities of these two outcomes is 100% or 1, unity. Then, since for a perfectly balanced coin the two probabilities must be equal, and their sum is 1, the probability of either heads or tails in one flip of the coin is ½, and the sum of the two probabilities is ½ + ½ = 1. Simple. Now you understand probability!?

Now let's ask what the probability is for flipping the coin twice and getting two heads in a row. It is the product of the two probabilities of getting heads both the first time and the second time. That is, P2H = ½ x ½ = ¼. Now you understand how to calculate the probability that both of two independent events will happen. It is the product of the probabilities of the two events.


Two whole paragraphs without any egregious errors! Shame they can't keep it up.
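For reference, the product rule they describe is easy to check. Here's a minimal Python sketch (the function name and the simulation setup are my own illustration, not anything from the essay) that computes the exact value with fractions and then estimates it by simulating pairs of fair flips:

```python
import random
from fractions import Fraction

# Exact product rule: two independent fair flips.
P_HEADS = Fraction(1, 2)
P_TWO_HEADS = P_HEADS * P_HEADS  # Fraction(1, 4)


def estimate_two_heads(trials: int, seed: int = 0) -> float:
    """Monte Carlo estimate of P(two heads in a row) from simulated fair flips."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        first = rng.random() < 0.5   # first flip comes up heads
        second = rng.random() < 0.5  # second flip, drawn independently
        if first and second:
            hits += 1
    return hits / trials


print(P_TWO_HEADS, estimate_two_heads(100_000))
```

The simulated frequency should land close to 1/4. Note that the exact product only holds because the two flips really are independent, which is precisely the assumption the rest of the essay abuses.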


Next we will calculate a probability for the chance production of a single small protein molecule. A protein molecule consists of one or more chains made up of amino acid molecules linked together. There are 20 different amino acid molecules which the cells use to construct the protein molecules needed for the life of cells. We will think about a small protein molecule with only 100 amino acid molecules in its chain. Assume we have a reaction pot containing a mixture of the 20 different amino acid molecules, and they are reacting at random to form chains. What is the probability, when a chain with 100 amino acids is formed, that it will by chance have the sequence of amino acids needed to form a particular working protein molecule?


Here, we can see the beginning of a misshapen search space. If you want to talk about the probability of something, you need to make sure that you've got a decent model of the process, or any number you generate will be nothing more than gibberish. They're not doing that. Amino acids don't randomly attach to each other in totally arbitrary ways - there's a very complicated geometry to protein molecules, and they can only fit together in certain ways. So right off the bat, they're trying to create a flat search space, because it will give them worse-looking probability numbers, even though the real problem that they're modeling is far from flat.



There are 100 positions along the chain. What is the probability that a particular one of the 20 different natural amino acid molecules will by chance be placed at position number 1 in the chain? It will be P1 = 1/20. When the complete chain has formed, what is the probability that the necessary particular amino acids will be placed at each of the 100 positions in the chain? It will be the product of the probabilities at the 100 positions. Thus the probability will be the fraction 1/20 multiplied by itself 100 times. So P100 = (1/20)x(1/20)x(1/20)x...x(1/20) = (1/20)^100 = (1/10)^130 = 1/10^130. This is an extremely small fraction. It is the fraction formed by the number 1 divided by the number formed by 1 followed by 130 zeros!


Wow! What a small number. Why, it's only about 36 orders of magnitude more probable than any one particular outcome of shuffling two decks of cards together! And it's bad combinatorics: the probability of a particular protein is not the product of the probabilities at the one hundred positions. Those 100 positions are not independent; there's a combinatorial relationship between them.
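The arithmetic itself is easy to check in log space. Here's a short Python sketch, assuming the essay's flat model (20 equally likely amino acids at each of 100 independent positions) and treating a "shuffle" as one particular ordering of the cards:

```python
import math


def log10_one_in(n: int) -> float:
    """log10 of n, for odds of 'one chance in n'; math.log10 accepts big ints."""
    return math.log10(n)


protein_odds = 20 ** 100         # the essay's flat, independent model
one_deck = math.factorial(52)    # orderings of a single 52-card deck
two_decks = math.factorial(104)  # orderings of two decks shuffled together

p = log10_one_in(protein_odds)
d1 = log10_one_in(one_deck)
d2 = log10_one_in(two_decks)

print(f"(1/20)^100 is about 1 in 10^{p:.1f}")
print(f"one deck: 1 in 10^{d1:.1f}; two decks together: 1 in 10^{d2:.1f}")
print(f"gap to the two-deck shuffle: about {d2 - p:.0f} orders of magnitude")
```

Because Python integers have arbitrary precision, nothing overflows here even though 104! has 167 digits.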



But we have oversimplified a little bit. In actual fact a protein molecule can have a substantial variability at many of the positions on its amino acid chain. In 1975 I examined the data for a particular protein molecule called cytochrome a which has about 100 amino acids in its chain. This is an important enzyme molecule in all living cells, and the sequence of amino acids has been determined for cytochrome a molecules in about a hundred different species. From the quantitative data I made a rough estimate that on the average up to five different amino acids could fill a particular position on the chain of the enzyme molecule. Thus the probability that an acceptable amino acid would be found by chance at a particular position would be 5/20 = ¼. So the probability for a working enzyme molecule to be formed by chance would be (¼)^100 = 1/10^60. This is still a very, very small probability. It is the fraction formed by 1 divided by the number 1 followed by 60 zeros.


Oh, gosh, lookie there. A false independence argument. Sorry guys, but a protein isn't like a train sitting on a railroad track, where you can just put any old car in any old position and have it fit together. You start with two amino acids stuck together randomly; the probability of a third amino acid being able to attach to that chain is not the same as the probability of that same amino acid being able to attach to another single, free amino acid. So in addition to inflating the denominators by using a bad search space, they're also cooking the numbers by treating something highly dependent as if it were independent.
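To see why independence matters, here's a toy Python sketch. Everything in it is hypothetical: a two-letter "alphabet" and made-up attachment probabilities, not real chemistry. On its own, each position is still A or B half the time, but because what attaches next depends on the end of the chain, the probability of a whole sequence can be wildly different from the independent product:

```python
from fractions import Fraction

# HYPOTHETICAL toy model: a two-letter "alphabet" and invented attachment
# probabilities. Not real chemistry; it only shows what dependence does.
START = {"A": Fraction(1, 2), "B": Fraction(1, 2)}
NEXT = {  # NEXT[end][new] = P(new attaches | chain currently ends in `end`)
    "A": {"A": Fraction(1, 10), "B": Fraction(9, 10)},
    "B": {"A": Fraction(9, 10), "B": Fraction(1, 10)},
}


def prob_independent(seq: str) -> Fraction:
    """Essay-style model: every position is an independent 1/2 choice."""
    return Fraction(1, 2) ** len(seq)


def prob_dependent(seq: str) -> Fraction:
    """Each new letter depends on the one before it (a first-order Markov chain)."""
    p = START[seq[0]]
    for end, new in zip(seq, seq[1:]):
        p *= NEXT[end][new]
    return p


for seq in ("ABABABABAB", "AAAAAAAAAA"):
    print(seq, float(prob_independent(seq)), float(prob_dependent(seq)))
```

Under these invented rules, the alternating chain comes out about 200 times more likely than the independent model says, and the uniform chain far less likely. The specific numbers don't matter; the point is that multiplying per-position probabilities only gives the right answer when the positions really are independent.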

And golly, there's another nifty coincidence there. 1/10^60... Guess what number that's amazingly close to? The probability of one particular shuffle of a deck of cards: 1 in 52!, or about 1 in 8E67. That makes the working enzyme more probable than a single card shuffle by a factor of only about 80 million.
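The same kind of check works for the 1/10^60 figure. This is a minimal sketch that simply takes the essay's numbers at face value (5 acceptable residues out of 20, 100 independent positions):

```python
import math
from fractions import Fraction

# The essay's revised model, taken at face value: 5 acceptable residues out of
# 20 at each of 100 positions, still treated as independent.
p_position = Fraction(5, 20)   # reduces to 1/4
p_enzyme = p_position ** 100   # exactly 1 / 4**100

one_in = p_enzyme.denominator  # 4**100, about 1.6 x 10^60
shuffle = math.factorial(52)   # orderings of one 52-card deck, about 8.07 x 10^67

print(f"(1/4)^100 is 1 in {one_in:.3e}")
print(f"52! is {shuffle:.3e}")
print(f"factor using the essay's rounded 10^60: {shuffle / 10**60:.2e}")
print(f"factor using the exact 4^100:           {shuffle / one_in:.2e}")
```

The 80 million comes from the essay's rounded 1/10^60; with the exact (1/4)^100 the factor is closer to 50 million. Either way, it's squarely in card-shuffle territory.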


In 1977 Prof. Hubert Yockey, a specialist in applying information theory to biological problems, studied the data for cytochrome a in great detail.[1] His calculated value for the probability in a single trial construction of a chain of 100 amino acid molecules of obtaining by chance a working copy of the enzyme molecule is 1/10^65, or the fraction 1 divided by 1 followed by 65 zeros. This is a probability 100,000 times smaller than my very rough estimate published two years earlier. Prof. Harold Morowitz estimated that the simplest theoretically conceivable living organism would have to possess a minimum of 124 different protein molecules. A rough estimate of the probability of all of these protein molecules to be formed by chance in a single chance happening would be P124P = (1/10^65)^124 = 1/10^8060, the fraction 1 divided by the number 1 followed by 8060 zeros. Truly these are extremely small probabilities calculated through a statistical approach. They tell us that the probabilities for the chance formation of a single working protein molecule or of a living cell are effectively zero.

Prof. Morowitz made a careful study of the energy content of living cells and of the building block molecules of which the cells are constructed. From this thermodynamic information he was able to calculate the probability that an ocean full of chemical "soup" containing the necessary amino acids and other building block molecules would react in a year to produce by chance just one copy of a simple living cell.[2] He arrived at the astronomically small probability of Pcell = 1/10^340,000,000, the fraction 1 divided by 1 followed by 340 million zeros! Yet he still believed in abiogenesis. Back in the 1970s Prof. Morowitz admitted in a public debate at a teachers' convention in Honolulu that in order to explain abiogenesis, it would be necessary to discover some new law of physics. At that time he still believed in abiogenesis, the spontaneous formation of the original living cells on the primeval earth. However, some ten years later he finally stated that in his opinion some intelligent creative power was necessary to explain the origin of life.


This paragraph is a total train wreck. It starts with fake numbers pulled out of thin air; piles on bad combinatorics and false independence, combining the numbers as if they were independent; and then tries to use the presence of big numbers to assert that something is impossible.
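Here's one way to see how a "probability" like 1/10^8060 gets manufactured: it's just the essay's 1/10^65 raised to the 124th power, which is only legitimate if the 124 proteins form independently. A minimal Python sketch using the essay's own figures (taken as given, not endorsed) also shows how careless arithmetic turns numbers like this into "zero": the direct floating-point product underflows, while a log10 calculation keeps the exponent.

```python
import math

# The essay's own figures, used only to show the arithmetic (not endorsed):
P_ONE_PROTEIN = 1e-65  # Yockey's per-protein estimate as quoted
N_PROTEINS = 124       # Morowitz's minimum protein count as quoted

# Multiplying 124 probabilities like this is valid only if they're independent.
naive = P_ONE_PROTEIN ** N_PROTEINS  # underflows: far below the smallest float

# In log space, multiplying independent probabilities just adds exponents.
log10_joint = N_PROTEINS * math.log10(P_ONE_PROTEIN)

print(naive)
print(f"10^{log10_joint:.0f}")
```

The 0.0 is a limitation of 64-bit floats, whose smallest positive value is around 5×10^-324; it says nothing about whether the event can actually happen.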


There are yet more mysteries in life's probability (or improbability) which science has not plumbed. One mystery is how one virus has DNA which codes for more proteins than it has space to store the necessary coded information. A gene is a portion of the long DNA molecule which carries the code for the sequence of amino acids in a chain that folds up to produce a particular protein molecule. The DNA molecule is itself made up of four code letter molecules called nucleotides. These provide the four-letter alphabet of genetics. Their names are abbreviated by the letters A, C, G and T. A three-letter "word" called a codon codes for a particular one of the twenty amino acids used to build protein chains.


The mystery arose when scientists counted the number of three-letter codons in the DNA of the virus, φX174. They found that the proteins produced by the virus required many more code words than the DNA in the chromosome contains. How could this be? Careful research revealed the amazing answer. A portion of a chain of code letters in the gene, say -A-C-T-G-T-C-C-A-G-, could contain three three-letter genetic words as follows: -A-C-T*G-T-C*C-A-G-. But if the reading frame is shifted to the right one or two letters, two other genetic words are found in the middle of this portion, as follows: -A*C-T-G*T-C-C*A-G- and -A-C*T-G-T*C-C-A*G-. And this is just what the virus does. A string of 390 code letters in its DNA is read in two different reading frames to get two different proteins from the same portion of DNA. Could this have happened by chance? Try to compose an English sentence of 390 letters from which you can get another good sentence by shifting the framing of the words one letter to the right. It simply can't be done. The probability of getting sense is effectively zero.


The above is purest gibberish. It's bad information theory combined with bad probability: take the bad probability from the previous paragraph, and combine it with a transparently bad IT argument, and you get the above mess.

For the IT part, remember, information content is related to compressibility. If you've got two sequences with a common section, a compression that takes advantage of the overlap to reduce the sequence length does not reduce the information content; in fact, it's the simplest form of lossless compression. And with the way that genetics works, as long as you can set up the start and stop codons right, there's no problem with the overlapping segments from a chemical point of view. It's not an unlikely thing at all.
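For concreteness, here's the essay's own frame-shifting example as a minimal Python sketch. It only splits the letters into complete codons; it doesn't translate them into amino acids or look for start and stop codons:

```python
def codons(seq: str, frame: int) -> list[str]:
    """Split seq into complete three-letter codons, starting `frame` letters in (0, 1 or 2)."""
    return [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]


segment = "ACTGTCCAG"  # the essay's example stretch of DNA
for frame in range(3):
    print(frame, codons(segment, frame))
```

The same nine letters yield a different codon list in each frame (ACT GTC CAG, then CTG TCC, then TGT CCA), so one stretch of sequence can carry two overlapping readings without any extra storage.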


Reasoning from these and other mathematical probability calculations, we can conclude that, without God the Creator, life's probability is zero.


And so they conclude with a bald "big numbers" assertion: gosh, the denominators are so big that the probabilities are essentially zero.
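The jump from "tiny" to "zero" also ignores how many tries there are. Here's a minimal Python sketch of the standard formula for the chance of at least one success in N independent trials, 1 - (1 - p)^N, with purely hypothetical values of p and N:

```python
import math


def p_at_least_one(p: float, trials: float) -> float:
    """P(at least one success in `trials` independent tries, each succeeding with probability p).

    Computes 1 - (1 - p)**trials via log1p/expm1 so a tiny p doesn't round away.
    """
    return -math.expm1(trials * math.log1p(-p))


# Hypothetical numbers, chosen only for illustration.
print(p_at_least_one(1e-9, 1e9))     # one-in-a-billion odds, a billion tries
print(p_at_least_one(1e-9, 1e18))    # same odds, a billion times more tries
print(p_at_least_one(1e-130, 1e40))  # tiny but nonzero odds stay nonzero
print(1 - (1 - 1e-130) ** 1e40)      # naive formula: rounds to exactly 0
```

The naive version returns 0 because 1 - 10^-130 rounds to 1.0 in floating point; log1p and expm1 avoid that rounding. This formula still assumes independent trials, so it isn't a model of abiogenesis; it just shows that a nonzero probability, however small, is not zero.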

15 Comments:

  • Amino acids don't randomly attach to each other in totally arbitrary ways - there's a very complicated geometry to protein molecules, and they can only fit together in certain ways.

    I'd note, however, that the sequence is probably random with respect to fitness, so their assertion isn't complete bull (although the fact that they didn't provide such a justification doesn't say much for their rigour). Their error here seems to be (as you later mention) that they're assuming the protein appears ex nihilo, rather than that they've got the expectation of a given protein being randomly produced wrong.

    By Blogger Lifewish, at 9:01 PM  

  • lifewish:

    The geometry of the protein molecules is a major factor in really computing probabilities for this kind of thing. The number of ways that chains of aminos can be combined into proteins is far, far smaller than the total number of permutations of aminos of protein-length. That's the point I was trying to make there.

    You're definitely right that I neglected to comment on the retrospective error there: they're looking for specific proteins, and demanding that the random process arrive at those specifics, when in fact, there's generally quite a number of proteins that could fill a specific niche; any of which are equally suitable. So you can't look at the end result of a particular protein, and then demand that the process be able to repeatedly end with exactly that protein; you need to describe the desired end-point of the process in terms of the acceptable outcomes - that is, the properties of proteins that are capable of performing the function of the end-point protein that you're looking at, and from that determining how many of the possible outcomes of amino-chaining have those properties.

    (Just look at the flu virus. It's a bundle of RNA wrapped in a protein sheath. But we've cataloged hundreds of different variants of flu with different proteins in the sheath. Any of those hundreds of surface proteins does the job equally well. You can't say that flu is unlikely to have evolved because the odds of evolving the particular proteins of the H1N2 variant are too small; you need to consider the full suite of hundreds of possible H and N proteins. And in the kind of computations that these guys are using, that's a difference that can produce a change of a couple of dozen orders of magnitude: not exactly small change!)

    By Blogger MarkCC, at 9:39 PM  

  • You're definitely right that I neglected to comment on the retrospective error there: they're looking for specific proteins, and demanding that the random process arrive at those specifics, when in fact, there's generally quite a number of proteins that could fill a specific niche; any of which are equally suitable.

    Ah, right, gotcha. I seem to recall similar situations with ID proponents pointing out how extremely specific one small element of a flagellum was, without considering all the myriad ways that a flagellum could have been designed without that bit.

    By Blogger Lifewish, at 10:03 AM  

    Also, I think you alluded to this point but didn't state it exactly: "essentially zero" is a very far cry from zero when it comes to probability. In an infinite series of independent trials, an event of zero probability almost surely never occurs, while an event of nonzero probability, no matter how small, almost surely occurs infinitely often.

    By Anonymous Anonymous, at 9:31 PM  

    And if you have a long enough time for such combinations to happen, eventually one will actually "stick." The IDists never seem to take the time factor into account when they ramble through their mathematical mumbo-jumbo.

    Learning some basic biochemistry wouldn't hurt, either.

    By Anonymous Anonymous, at 3:20 PM  

  • The author of the article in question _nearly_ quotes Harold Morowitz as saying that new physical laws would be needed to explain abiogenesis, and that he later admitted that "some intelligent creative power was necessary".

    Having met Dr. Morowitz and attended his talks, I believe that these statements are at best misinterpretations, and at worst outright fabrications. I expect in the '70s he might have been referring to nascent advances in Complexity (née Chaos) Theory. His recent book "The Emergence of Everything" might be instructive in this regard...

    MS

    By Anonymous Anonymous, at 3:49 PM  

    Also, I'm no mathematician but I can see this: even assuming that the odds of a successful combination of proteins occurring in a bowl of primordial soup are, say, 1 billion to 1, you might want to check on how many bowls of primordial soup you have. If at any moment you have 1 billion bowls of soup, the odds are better than even that one of them succeeds; if you also have a billion such moments going forward, you have essentially guaranteed a successful combination. Many times over.

    By Anonymous Anonymous, at 4:03 PM  

  • Another thing IDers ignore is the vast number of molecules involved. They try to tell you that there were only 100 amino acid molecules in the whole world.

    But there are around 6 * 10^23 molecules in a mole of a substance, which for amino acids is on the order of 100 grams. (Chemists, correct me if I'm wrong.) And there must have been a huge number of grams of material in which organic molecules could combine.

    By Anonymous Anonymous, at 4:15 PM  

    I don't think that what I am about to say is all that new, but I would like to state it anyway.

    As one of the ways to wow people with low probabilities, creationists often ask the question "What are the chances of the dominant species (human beings) evolving from a primitive life form?" Then they present a fraction with some obscene number of zeros in the denominator and claim it as proof of a miracle.

    This answer confuses two different probability events:
    1) A particular person winning a lottery, and
    2) Someone unspecified winning a lottery.

    The former has a very low probability (I like Mad Magazine's take on it: almost as low as when you don't buy a ticket), while the second is highly likely. Somebody almost always gets a winning ticket. The explanation for such a disparity in probabilities is that millions of tickets are in play, thus substantially raising the chances of someone unspecified having a winning one.

    The same goes for evolution. Even though the odds of a particular evolutionary path being taken are very low, there is an enormous number of such paths out there. So the likelihood of one of them (also unspecified) leading to a winning combination is much, much better than what creationists claim.

    Besides, as MarkCC has pointed out, a winner of a lottery raising a question about the chances of him or her winning it (after the fact) is not at all that profound. A true miracle would be a pet snake suddenly pondering out loud about the probability of one of its descendants owning one of its owner's progeny.

    RC23

    By Anonymous Anonymous, at 5:38 PM  

  • Reasoning from these and other mathematical probability calculations, we can conclude that, without God the Creator, life's probability is zero.

    If they want to talk about low probabilities, what about the extremely low probability of an omnipotent being coming into existence? If their own argument is correct then it would be impossible without another Creator creating their Creator!

    By Anonymous Anonymous, at 1:41 AM  

  • I always "knew" there was something funny about the way the numbers were being shuffled around by the "creation scientists", but my grasp of statistics isn't good enough to explain exactly why. Thanks for the explanation - I'm looking forward to more posts.

    By Blogger Peggy K, at 5:32 AM  

  • Nice post.

    I think you summed it up when you pointed out there wasn't a really comprehensive model of the system involved. Also, I'm truly curious what the value of p_min is. By p_min, I mean the minimum probability of an event, below which the event moves from possible to impossible.

    (I had thought the value was zero, but apparently I've been deluding myself.)

    By Blogger Whispers, at 12:26 PM  

  • This shows pretty near total ignorance of any real biology, everywhere from the trivial (it's not cytochrome a, it's cytochrome c he keeps mentioning) to the serious: the idea that a protein just hops into existence in one fell swoop is a concept that only an IDer could seriously entertain.

    His statistical analysis recalls the story about how to flip coins to get heads to appear thirty times in a row, every time. Everyone in China pairs up, and each pair flips a coin; the one with tails goes back to work and the one with heads pairs off with another first-round winner. After 29 rounds there will be two finalists, each having achieved the miracle of hitting heads 29 times in a row. The final coin toss assures one of them the unbelievable 30 consecutive heads. Now, is the probability of that happening 1 in 2^30 or 1?

    By Anonymous Anonymous, at 8:25 PM  

    OK, first: moderator, please allow some counter-comments to appear on the page too.

    Next, if you have to work so hard to prove that the random creation of a single protein cell is possible, imagine the amount of effort you will need to prove the random creation of the complex, self-sustaining ecosystem on this planet with all its life forms. Now add to it the random creation of the physical laws of nature balancing the whole universe.

    Those of us who believe in the intervention of an undefined force (God) in the creation of the universe are already refuting the possibility of any probabilities in the creation of life. So why should we calculate the probability of the creation or existence of God? God comes out to be the only explanation of the only phenomenon that cannot be solved by applying known laws of physical sciences.

    Probability can neither explain the creation of life nor explain the presence or absence of God.

    By Anonymous Anonymous, at 3:49 AM  

  • Anonymous:

    I've never *blocked* comments from anyone in this thread; in fact, in the entire history of my blog, the only comments that I've ever blocked have been either spam, or content-free personal insults.

    Of course, this post is a year old, and the blog moved from blogger to ScienceBlogs 9 months ago... In the last six months, this archive of the old blog has received *2* legitimate comments, and well over 100 spams, so it's currently set to moderate by default.

    Anyway - to respond to the meat of your comment... In the original post here, I was criticizing a shallow creationist attempt to mis-use probability to support their beliefs.

    Personally, I think that probability arguments in this area are *all* nonsense: we do not have sufficient knowledge to be able to meaningfully assess the basic component probabilities that are being combined in this kind of argument. It's all just phony numbers pulled out of a hat.

    I happen to be a theist - I'm a religious Jew. That doesn't mean that I'm willing to go along with a bullshit argument slapped together with phony numbers, just because it supports what I happen to personally believe.

    Your argument pretty much comes down to two things: the argument from incredulity; and the idea that scientific arguments don't apply. Arguments from incredulity are, to put it mildly, garbage: our imaginations don't define the limits of the universe. And if scientific arguments don't apply, then why bother to try to defend a shoddy pseudo-scientific argument like the one I was criticizing?

    By Blogger MarkCC, at 9:31 AM  

