Tuesday, March 14, 2006

Messing with big numbers: using probability badly

There are a lot of really bad arguments out there written by anti-evolutionists based on incompetent use of probability. A typical example is this one. This article is a great example of the mistakes that commonly get made with probability-based arguments, because it makes so many of them. (In fact, it makes every single category of error that I list below!)

Tearing down probabilistic arguments takes a bit more time than tearing down the information theory arguments. In the IT arguments, there's really one error that they make, and it's fundamental and incontestable: they base the whole argument on a fundamentally erroneous definition. Once you point out that the definition is wrong, the whole argument collapses.

The probabilistic arguments are different. There isn't one mistake that runs through all the arguments. There are many possible mistakes, and each argument typically stacks up multiple errors.

For the sake of clarity, I'm going to try to put together a list of the fundamental types of errors; then I'll go through several different articles, and show which errors they each make.

So first, the Good Math/Bad Math Taxonomy of Probability Errors:

Big Numbers

This is the easiest one. This consists of exploiting our difficulty in really comprehending how huge numbers work to say that beyond a certain probability, things become impossible. You can always identify these arguments by the phrase "the probability is effectively zero."

You typically see people claiming things like "Anything with a probability of less than 1 in 10^60 is effectively impossible". It's often conflated with some other numbers, to try to push the idea of "too improbable to ever happen". For example, they'll often throw in something like "the number of particles in the entire universe is estimated to be 3x10^78, and the probability of blah happening is 1 in 10^100, so blah can't happen".

It's easy to disprove. Take two distinguishable decks of cards. Shuffle them together. The likelihood of the resulting deck of shuffled cards having the particular ordering that you just produced is roughly 1 in 10^166. Yeah, there are more possible unique shuffles of two decks of cards than there are particles in the entire universe.

It sure as heck seems like something that unlikely isn't possible. Our intuition says that any probability with a number that big in its denominator is just impossible. Our intuition is wrong - because we're quite bad at really grasping the meanings of big numbers.
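To make the comparison concrete, here is a minimal Python sketch that counts the orderings of a 104-card double deck and checks that count against the 3x10^78 particle estimate quoted above. The variable names are mine, chosen purely for illustration.

```python
import math

# Two distinguishable 52-card decks shuffled together give 104 distinct cards,
# so the number of possible orderings is 104 factorial.
orderings = math.factorial(104)

# Particle-count estimate quoted in the text (3x10^78).
particles = 3 * 10**78

# Order of magnitude of the number of orderings (digits minus one).
magnitude = len(str(orderings)) - 1

print(f"orderings is on the order of 10^{magnitude}")
print("more orderings than particles:", orderings > particles)
```

Every time you shuffle those two decks, you produce one of those orderings, so an event of that size in the denominator happens on demand.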

Perspective Errors

A perspective error is a relative of the big numbers error. It's part of an argument to try to say that the probability of something happening is just too small to be possible. The perspective error is taking the outcome of a random process - like the shuffling of cards that I mentioned above - and looking at the outcome after the fact, and calculating the likelihood of it happening.

Random processes typically have a huge number of possible outcomes. Anytime you run a random process, you have to wind up with one of the outcomes. They're each incredibly unlikely, but you need to wind up with one of them. The probability of getting an outcome is 100%. The probability of your being able to predict which outcome is terribly small. The error here is taking the outcome of a random process which has already happened, and treating it as if you were predicting it in advance.
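The difference between "some outcome happened" and "the outcome I named in advance happened" can be sketched with a small simulation. This is only an illustration under my own assumptions: a toy 5-card deck (120 orderings) so the numbers stay small, a fixed random seed, and a hypothetical helper named `simulate`.

```python
import math
import random

def simulate(trials, deck_size=5, seed=0):
    """Shuffle a small deck repeatedly and return the fraction of shuffles
    that match an ordering named before any shuffling took place."""
    rng = random.Random(seed)
    prediction = list(range(deck_size))  # the ordering predicted in advance
    hits = 0
    for _ in range(trials):
        deck = list(range(deck_size))
        rng.shuffle(deck)
        # Every shuffle produces *some* ordering; only a match with the
        # advance prediction counts as a hit.
        if deck == prediction:
            hits += 1
    return hits / trials

per_outcome = 1 / math.factorial(5)  # chance of any one specific ordering
observed = simulate(120_000)

print(f"chance of one specific ordering: {per_outcome:.5f}")
print(f"fraction of shuffles matching the advance prediction: {observed:.5f}")
print("fraction of shuffles that produced some ordering: 1.0")
```

Each individual ordering is unlikely, yet every single trial ends with one of them. The small number only applies to the prediction made beforehand.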

Bad Combinatorics

Combining the probabilities of events can be very tricky, and easy to mess up. It's often not what you would expect. You can make things seem a lot less likely than they really are by making an easy-to-miss mistake.

The classic example of this is one that almost every first-semester probability instructor tries in their class. In a class of 20 people, what's the probability of two people having the same birthday? Most of the time, you'll have someone say that the probability of any two people having the same birthday is 1/365^2; so the probability of that happening in a group of 20 is the number of possible pairs over 365^2, or 400/365^2, or about 1/3 of 1 percent.

That's the wrong way to derive it. There's more than one error there, but I've seen three introductory probability classes where that was the first guess. The correct answer is far higher: about 41% for 20 people, and it passes 50% at 23 people.
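For reference, the standard way to get the birthday number is to compute the chance that everyone's birthday is different and subtract that from 1. The sketch below does exactly that, assuming 365 equally likely birthdays, no leap years, and independent people; the function name is my own.

```python
def shared_birthday_probability(n, days=365):
    """Probability that at least two of n people share a birthday,
    assuming every day is equally likely and people are independent."""
    p_all_distinct = 1.0
    for i in range(n):
        # The (i+1)-th person must avoid the i birthdays already taken.
        p_all_distinct *= (days - i) / days
    return 1.0 - p_all_distinct

# The naive classroom guess from the text: 400 / 365^2.
naive_guess = 400 / 365**2

for n in (20, 23):
    print(n, f"{shared_birthday_probability(n):.4f}")
print("naive guess:", f"{naive_guess:.4f}")
```

Under those assumptions a class of 20 comes out at roughly 41%, and 23 people is the point where the probability first exceeds 50%. Either way, it is more than a hundred times larger than the naive guess.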

Fake Numbers

To figure out the probability of some complex event or sequence of events, you need to know some correct numbers for the basic events that you're using as building blocks. If you get those numbers wrong, then no matter how meticulous the rest of the probability calculation is, the result is garbage. If I say that in rolling a fair die, the odds of rolling a 6 are 1/6th the odds of rolling a one, then I'm not going to get any meaningful predictions of probability. This one is incredibly common in evolution arguments: the initial probability numbers are just pulled out of thin air, with no justification.

Misshapen Search Space

When you model a random process, one way of doing it is by modeling it as a random walk over a search space. Just like the fake numbers error, if your model of the search space has a different shape than the thing you're modeling, then you're not going to get correct results. This is another astoundingly common error in anti-evolution arguments. Evolution does have a directive force shaping the search space: you don't traverse all paths in the search space, because survival prunes the space in particular ways. The "search space" of evolution is something like a bumpy surface with some big dents. Roll a marble across that surface, and it's going to move in a particular way with a probability map that's totally different from the probability map of a flat surface.
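A toy sketch can show how much the shape of the search space matters. It is not a model of biology: the 40-bit target, the `fitness` function, and the single-bit mutation rule are all assumptions I made up for illustration. It compares the odds of hitting a target with one blind uniform guess against the number of steps a walk needs when a selection rule prunes the losing moves.

```python
import random

def random_guess_odds(length):
    """Number of equally likely bit strings of this length; one blind
    uniform guess hits a given target 1 time in this many."""
    return 2 ** length

def fitness(bits):
    """Toy fitness: how many positions already match the all-ones target."""
    return sum(bits)

def climb(length, seed=0):
    """Random single-bit mutations, keeping a mutant only if it is at least
    as fit as the current string. Returns the number of steps taken."""
    rng = random.Random(seed)
    current = [0] * length
    steps = 0
    while fitness(current) < length:
        candidate = current.copy()
        candidate[rng.randrange(length)] ^= 1  # flip one randomly chosen bit
        if fitness(candidate) >= fitness(current):  # selection prunes the space
            current = candidate
        steps += 1
    return steps

length = 40
steps = climb(length)
print("flat-space odds: 1 in", random_guess_odds(length))
print("steps with selection:", steps)
```

The mutations are just as random in both cases. What changes is the shape of the surface the marble rolls on, and the flat-surface calculation says nothing about the bumpy one.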

False Independence

If you want to make something appear less likely than it really is, or you're just not being careful, a common statistical mistake is to treat events as independent when they're not. If two independent events have probabilities p_1 and p_2, then the probability of both events occurring is p_1*p_2. But if they're not independent, then multiplying like that is going to give you the wrong answer.

For example, take all of the spades from a deck of cards. Shuffle them, and then lay them out. What are the odds that you laid them out in numeric order? It's 1/13! = 1/6,227,020,800. That's a pretty ugly number. But if you wanted to make it look even worse, you could "forget" the fact that the draws are dependent, in which case the odds would be 1/13^13 - or about 1 in 3x10^14 - about 50,000 times worse.
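The arithmetic for the spades example is easy to check. In this short sketch (variable names are mine), the dependent count treats each draw as removing a card from the pile, while the falsely independent count pretends all 13 cards are available on every draw.

```python
import math

# Dependent draws: 13 choices for the first card, 12 for the second, and so on.
dependent_orderings = math.factorial(13)

# Falsely independent draws: 13 choices pretended for every one of 13 positions.
independent_orderings = 13 ** 13

# How much worse the odds look when the dependence is "forgotten".
inflation = independent_orderings / dependent_orderings

print("1 in", dependent_orderings)
print("1 in", independent_orderings)
print(f"inflated by a factor of about {inflation:,.0f}")
```

The inflation factor comes out a little under 49,000, which is the "about 50,000 times worse" in the text.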

I'm sure that I'm missing some - I expect this is a post that I'll be coming back to update several times. If you can think of any that I missed, please chime in.

• I think you've hit 'em all - great job!

You might want to give a more evocative description for the "misshapen search space" one, though. The one you've got at the moment (marble rolling over bumpy ground) adequately conveys the idea that something dodgy is afoot, but doesn't give the audience (apart from those who have been debunking Dembski for years) any intuitive idea of why this should be a problem.

The example I like involves a cone, with the pointy end downwards. Stick a marble in this cone. It will roll to the bottom. Pick it out and drop it in again. No matter where you drop it, it'll roll to the exact same point (a specified low-probability event, and hence practically unachievable according to Dembski) every single time.

If you want to make the probability even lower, you can increase the size of the cone, or increase the number of dimensions to turn it into a hypercone*. The marble will still roll to the same point every single time.

* An engineer and a mathematician are sitting in a seminar on quantum mechanics, and the speaker is getting really excited about the concept of 11-dimensional space. The engineer turns to the mathematician and whispers: "good grief! How in tarnation do you imagine 11-dimensional space?"

The mathematician whispers back: "it's easy! Just imagine n-dimensional space and then set n to equal 11!"

By  Lifewish, at 8:52 PM

• I don't see how the example article makes all the mistakes. Big Numbers, Perspective errors, Search space - sure. But he does so much hand waving, it's hard to see what he's doing. It's like no sentence has only one mistake. He teaches basic probability, then doesn't make you use it. He just says that he, or someone else did the work, and here's the answer. It's like he's saying, "see - I know how this works, so you don't have to". This works because most people don't really want to learn anything - they just want to know the relevant results. Implicit here is "I know everything that is known - and here is the executive summary".

His argument about viruses capable of multiple encoding is great. Genetic searches are really good at finding these kinds of solutions. Rather than credit genetic searching, he throws up his hands and says God must have done it. I'd be amazed if the original researcher thought that way. So, he's probably misquoting a source as well. Why make just one mistake when two is twice as good?

By  Stephen, at 11:13 AM

• Stephen: yeah, it's one of those "so bad it's not even wrong" articles. Written in Jello, one might say.

By  Lifewish, at 6:44 PM

• Looks good so far, you'll have to update it as you find more. One other point perhaps related to dealing with really big numbers, when dealing with molecules and search spaces, even really big search spaces are feasibly covered in the timescales we look at over earth history. Christian de Duve has an excellent discussion of this, and of the topics you are discussing in his book "Life Evolving".

By  Markk, at 12:22 AM

• This probably relates to the misshapen search space problem. One error I've seen commonly made is to confuse random (which means nondeterministic) with uniform random (meaning all outcomes equally likely). This confusion usually gets buried into an assumption somewhere leading to calculations that look reasonable but are completely wrong.

By  Anonymous, at 1:08 PM

• Thank you, thank you! The "probabilistic" arguments against evolution have bugged me since the first time I read them, but I haven't had the "chops" to refute them well.

It may be a form of the perspective error, but I also see an introduction of a false dependence. That's what I get from "specified complexity" -- the idea that a particular sequence has meaning and therefore the events making up the sequence have some ("intelligently designed?") dependence.

By  ArtK, at 1:46 PM

• Great job! I'm mathematically literate (though surely not an expert) and I get seriously aggravated when I see people abusing math in public. One error that may or may not be covered by "bad combinatorics" is neglect of the Law of Large Numbers, usually aided and abetted by the "effectively zero probability" line, and/or disrespect of "deep time".

Btw, I like to summarize the LLN as "enough tries will shorten a longshot much faster than you'd expect", but that doesn't really give a sense of "just how fast". Can you think of a way to quantify the effects in ordinary language?

By  David Harmon, at 8:37 PM

• Seems to me the example you give for Big Numbers, of shuffling two decks of cards, is really an example of your Perspective Errors. I agree people can't really grasp big numbers--we really aren't built for it. It is really hard to grasp just how long 1 million years really is (much less a billion). I think you really do have two separate error categories here, but I don't think they overlap that much and that the two descriptions are mixed.

By  Anonymous, at 8:02 AM

• I think it is not so much a problem of wrong math. Most of the probabilities stated are actually correct as far as numerics go. It is the interpretation of these numbers that people are confused about. Pop TV's Law & Order, CSI etc often have soundbites that go

"the odds of this DNA belonging to someone other than XXX is 1 in *insert astronomical number here*"

Judges/jurors lean heavily on this information to incarcerate and/or hang people. So ppl tend to associate small odds with the word "certainty". Most ppl who are impressed (after all, if it's good enough for the death penalty, it's good enough for me) by such humongous numbers forget (perhaps conveniently, but I doubt it) that the odds quoted in a DNA test are related to whether two samples of DNA are the same, not how the DNA got to be that sequence (i.e. consequence and not process).

The primary reason why creationists get lambasted is their play on psychology whilst masquerading as torch bearers of cutting edge science.

Btw, wrt birthday problem, the P(two or more ppl have same birthday|20 ppl total) ~ 40%, not 50%

I should qualify that I'm a creationist by choice, because I believe that life is too beautiful to have come about by dumb chance alone. Just ask the monkeys at http://user.tninet.se/~ecf599g/aardasnails/java/Monkey/webpages/

By  vandice, at 8:29 AM

• I repeated in conversation the fact that the number of combinations of two decks of cards surpasses the number of particles in the universe. And I found this link stating the estimated number of total particles:

So I'm sure this will be simple to answer for you, but being a layman I was unsure how to answer myself: What is the precise definition of a particle in this sense? Obviously, we're not talking about individual atoms, so what is meant by particle in this example?

By  BN, at 2:15 AM

• bn:

In the estimates of the number of particles in the universe, they're talking about all of the basic particles of matter that we really know about: electrons, protons, neutrons, neutrinos, photons, and the various other related subatomic particles.

It doesn't include the so-called "dark matter" which seems likely to exist, because we don't have a clue of what kinds of particles make up dark matter, or how many of them there are.

By  MarkCC, at 8:38 AM

• AI researcher Eli Yudkowsky has written some interesting things about probability, so I thought I'd toss in a few links here that may be of interest:

An Intuitive Explanation of Bayesian Reasoning

A Technical Explanation of Technical Explanation

The Simple Truth

By  Anonymous, at 10:18 PM

• I suppose you could view this as a combination of the other mistakes (different mistakes in different instances) but it's so common (and annoying) that I have to post it. Besides, it has an authoritative sounding name: the Law Of Averages. For example, if you flip a coin 5 times and get 5 heads then the law of averages says you have a high probability of getting a tails on the 6th flip.

Another example that is a little more interesting comes from baseball. If a player fails to get a hit for 10 straight games but his batting average for the season is .300 then the law of averages implies he is likely to get a hit on his next at-bat.

By  Anonymous, at 5:30 PM