Good Math/Bad Math: Berlinski's Bad Math

Thursday, March 30, 2006

Berlinski's Bad Math

In my never-ending quest for bad math to mock, I was taking a look at the Discovery Institute's website, where I found an essay, On the Origin of Life, by David Berlinski. Bad math? Oh, yeah. Bad, sloppy, crappy math. Some of it just duplicates things I've criticized before, but there are a few different tricks in this mess.

Before I jump in to look at a bit of it, I'd like to point out a general technique that's used in this article. It's very wordy. It rambles, it wanders off on tangents, it mixes quotes from various people into its argument in superfluous ways. The point of this seems to be to keep you, the reader, somewhat off balance; it's harder to analyze an argument when the argument is so scattered around, and it's easier to miss errors when the steps of the argument are separated by large quantities of cutesy writing. Because of this, the section I'm going to quote is fairly long; it's the shortest I could find that actually contained enough of the argument I want to talk about to be coherent.

The historical task assigned to this era is a double one: forming chains of nucleic acids from nucleotides, and discovering among them those capable of reproducing themselves. Without the first, there is no RNA; and without the second, there is no life.

In living systems, polymerization or chain-formation proceeds by means of the cell’s invaluable enzymes. But in the grim inhospitable pre-biotic, no enzymes were available. And so chemists have assigned their task to various inorganic catalysts. J.P. Ferris and G. Ertem, for instance, have reported that activated nucleotides bond covalently when embedded on the surface of montmorillonite, a kind of clay. This example, combining technical complexity with general inconclusiveness, may stand for many others.

In any event, polymerization having been concluded—by whatever means—the result was (in the words of Gerald Joyce and Leslie Orgel) “a random ensemble of polynucleotide sequences”: long molecules emerging from short ones, like fronds on the surface of a pond. Among these fronds, nature is said to have discovered a self-replicating molecule. But how?

Darwinian evolution is plainly unavailing in this exercise or that era, since Darwinian evolution begins with self-replication, and self-replication is precisely what needs to be explained. But if Darwinian evolution is unavailing, so, too, is chemistry. The fronds comprise “a random ensemble of polynucleotide sequences” (emphasis added); but no principle of organic chemistry suggests that aimless encounters among nucleic acids must lead to a chain capable of self-replication.

If chemistry is unavailing and Darwin indisposed, what is left as a mechanism? The evolutionary biologist’s finest friend: sheer dumb luck.

Was nature lucky? It depends on the payoff and the odds. The payoff is clear: an ancestral form of RNA capable of replication. Without that payoff, there is no life, and obviously, at some point, the payoff paid off. The question is the odds.

For the moment, no one knows how precisely to compute those odds, if only because within the laboratory, no one has conducted an experiment leading to a self-replicating ribozyme. But the minimum length or “sequence” that is needed for a contemporary ribozyme to undertake what the distinguished geochemist Gustaf Arrhenius calls “demonstrated ligase activity” is known. It is roughly 100 nucleotides.

Whereupon, just as one might expect, things blow up very quickly. As Arrhenius notes, there are 4^100 or roughly 10^60 nucleotide sequences that are 100 nucleotides in length. This is an unfathomably large number. It exceeds the number of atoms contained in the universe, as well as the age of the universe in seconds. If the odds in favor of self-replication are 1 in 10^60, no betting man would take them, no matter how attractive the payoff, and neither presumably would nature.

“Solace from the tyranny of nucleotide combinatorials,” Arrhenius remarks in discussing this very point, “is sought in the feeling that strict sequence specificity may not be required through all the domains of a functional oligmer, thus making a large number of library items eligible for participation in the construction of the ultimate functional entity.” Allow me to translate: why assume that self-replicating sequences are apt to be rare just because they are long? They might have been quite common.

They might well have been. And yet all experience is against it. Why should self-replicating RNA molecules have been common 3.6 billion years ago when they are impossible to discern under laboratory conditions today? No one, for that matter, has ever seen a ribozyme capable of any form of catalytic action that is not very specific in its sequence and thus unlike even closely related sequences. No one has ever seen a ribozyme able to undertake chemical action without a suite of enzymes in attendance. No one has ever seen anything like it.

The odds, then, are daunting; and when considered realistically, they are even worse than this already alarming account might suggest. The discovery of a single molecule with the power to initiate replication would hardly be sufficient to establish replication. What template would it replicate against? We need, in other words, at least two, causing the odds of their joint discovery to increase from 1 in 10^60 to 1 in 10^120. Those two sequences would have been needed in roughly the same place. And at the same time. And organized in such a way as to favor base pairing. And somehow held in place. And buffered against competing reactions. And productive enough so that their duplicates would not at once vanish in the soundless sea.

In contemplating the discovery by chance of two RNA sequences a mere 40 nucleotides in length, Joyce and Orgel concluded that the requisite “library” would require 10^48 possible sequences. Given the weight of RNA, they observed gloomily, the relevant sample space would exceed the mass of the earth. And this is the same Leslie Orgel, it will be remembered, who observed that “it was almost certain that there once was an RNA world.”

To the accumulating agenda of assumptions, then, let us add two more: that without enzymes, nucleotides were somehow formed into chains, and that by means we cannot duplicate in the laboratory, a pre-biotic molecule discovered how to reproduce itself.

Ok. Lots of stuff there, huh? Let's boil it down.

The basic argument is the good old "big numbers" argument. Berlinski wants to come up with some really whoppingly big numbers to make things look bad. So he makes his first big-numbers appeal by looking at polymer chains that could have self-replicated. He argues (not terribly well) that the minimum length for a self-replicating polymer is 100 nucleotides. From this, he then argues that the odds of creating a self-replicating chain are 1 in 10^60.
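
As a quick check on the arithmetic, here is a minimal Python sketch. It confirms that 4^100 is roughly 10^60, and it shows that "1 in 10^60" only follows if exactly one of those sequences works. The function names and the counts of working sequences are illustrative assumptions, not figures from Berlinski or Arrhenius.

```python
import math

def sequence_space(length: int, alphabet: int = 4) -> int:
    """Count the distinct chains of `length` units drawn from `alphabet` bases."""
    return alphabet ** length

def odds_exponent(working: int, length: int = 100) -> float:
    """Return x such that a random chain is functional with odds of 1 in 10^x.

    `working` is the (unknown, hypothetical) number of functional sequences.
    """
    return math.log10(sequence_space(length)) - math.log10(working)

if __name__ == "__main__":
    print(f"4^100 is about 10^{math.log10(sequence_space(100)):.1f}")
    # Hypothetical counts of working sequences; nobody knows the real value.
    for working in (1, 10**6, 10**20):
        print(f"{working:.0e} working -> 1 in 10^{odds_exponent(working):.1f}")
```

The only point of the loop is that the exponent drops by one for every factor of ten in the number of working sequences, so the headline figure depends entirely on a count that nobody has measured.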

Wow, that's a big number. He goes to some trouble to stress just what a whopping big number it is. Yes Dave, it's a big number. In fact, it's not just a big number, it's a bloody huge number. The shame of it is, it's wrong; and what's worse, he knows it's wrong. Right after he introduces it, he quotes a geochemist who points out that those odds are bogus, because there's probably more than one replicator in there. In fact, we can be pretty certain that there's more than one: we know lots of ways of modifying RNA/DNA chains that don't affect their ability to replicate. How many of those 10^60 cases self-replicate? We don't know. Berlinski just handwaves. Let's look again at how he works around that:

They might well have been. And yet all experience is against it. Why should self-replicating RNA molecules have been common 3.6 billion years ago when they are impossible to discern under laboratory conditions today? No one, for that matter, has ever seen a ribozyme capable of any form of catalytic action that is not very specific in its sequence and thus unlike even closely related sequences. No one has ever seen a ribozyme able to undertake chemical action without a suite of enzymes in attendance. No one has ever seen anything like it.

So - first, he takes a jump away from the math, so that he can wave his hands around. Then he tries to strengthen the appeal to big numbers by pointing out that we don't see simple self-replicators in nature today.

Remember what I said in my post about Professor Culshaw, the HIV-AIDS denialist? You can't apply a mathematical model designed for one environment in another environment without changing the model to match the change in the environment. The fact that it's damned unlikely that we'll see new simple self-replicators showing up today is irrelevant to discussing the odds of them showing up billions of years ago. Why? Because the environment is different. In the days when a first self-replicator developed, there was no competition for resources. Today, any time you have the set of precursors to a replicator, they're part of a highly active, competitive biological system.

And then, he goes back to try to re-invoke the big-numbers argument by making it look even worse; and he does it by using an absolutely splendid example of bad combinatorics:

The odds, then, are daunting; and when considered realistically, they are even worse than this already alarming account might suggest. The discovery of a single molecule with the power to initiate replication would hardly be sufficient to establish replication. What template would it replicate against? We need, in other words, at least two, causing the odds of their joint discovery to increase from 1 in 10^60 to 1 in 10^120. Those two sequences would have been needed in roughly the same place. And at the same time. And organized in such a way as to favor base pairing. And somehow held in place. And buffered against competing reactions. And productive enough so that their duplicates would not at once vanish in the soundless sea.

The odds of one self-replicating molecule developing out of a soup of pre-biotic chemicals are, according to Berlinski, 1 in 10^60. But then, the replicator can't replicate unless it has a "template" to replicate against; and the odds of that are also, he claims, 1 in 10^60. Therefore, the probability of having both the replicator and the "template" is the product of the two probabilities, or 1 in 10^120.

But that product rule for combining probabilities only works if the two events are completely independent. They aren't. If you've got a soup of nucleotides and polymers, and you get a self-replicating polymer, it's in an environment where the "target template" is quite likely to occur. So the events are not independent, and you can't simply multiply the two probabilities.

Not to mention that he repeats the same error he made before: assuming that there's exactly one "template" molecule that can be used for replication.
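
To make the independence point concrete, here is a small Python sketch. The general rule is P(A and B) = P(A) * P(B given A); multiplying the two standalone probabilities is only valid when P(B given A) equals P(B). All of the probabilities below are made-up toy values chosen for illustration, and the function names are my own.

```python
import math

def joint(p_a: float, p_b_given_a: float) -> float:
    """General rule: P(A and B) = P(A) * P(B | A)."""
    return p_a * p_b_given_a

def marginal_b(p_a: float, p_b_given_a: float, p_b_given_not_a: float) -> float:
    """Overall P(B), averaged over whether A happened or not."""
    return p_a * p_b_given_a + (1 - p_a) * p_b_given_not_a

# Toy values (assumptions): B is far more likely once A has occurred.
p_a, p_b_given_a, p_b_given_not_a = 0.1, 0.8, 0.05

true_joint = joint(p_a, p_b_given_a)                              # 0.08
naive_product = p_a * marginal_b(p_a, p_b_given_a, p_b_given_not_a)  # 0.0125

# Same shape at Berlinski's scale, with a hypothetical conditional probability.
p_replicator = 1e-60                 # his figure, taken at face value
p_template_given_replicator = 1e-3   # pure assumption for illustration
naive_120 = joint(p_replicator, 1e-60)
dependent = joint(p_replicator, p_template_given_replicator)

if __name__ == "__main__":
    print(f"toy: true joint {true_joint:.4f} vs naive product {naive_product:.4f}")
    print(f"naive: 1 in 10^{-math.log10(naive_120):.0f}")
    print(f"dependent: 1 in 10^{-math.log10(dependent):.0f}")
```

In the toy case the naive product understates the joint probability by more than a factor of six. At Berlinski's scale the gap is whatever the conditional probability turns out to be, which is exactly the number he never estimates.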

And even that is just looking at a tiny aspect of the mathematical part: the entire argument about a template is a strawman; no one argues that the earliest self-replicator could only replicate by finding another perfectly matched molecule of exactly the same size that it could reshape into a duplicate of itself.

And finally, he rehashes his invalid-model argument: because we don't see primitive self-replicators in today's environment, that must mean that they were unlikely in a pre-biotic environment.

This is what mathematicians call "slop", also known as "crap". Bad reasoning, fake numbers pulled out of thin air, assertions based on big numbers, deliberately using wrong numbers, invalid combinatorics, and misapplication of models. It's hard to imagine what else he could have gotten wrong.

16 Comments:

"the minimum length for a self-replicating polymer is 100 nucleotides. "

Wasn't the most recent Spiegelman's monster (i.e. a real world, actually existing self-reproducer/evolver) down to around 40 or so nucleotides? Sure, it still requires a protein catalyst in with it to reproduce, but it's still not implausible that some variation on SM could be self-catalyzing.

By Anonymous, at 2:17 PM
Thanks for the info. It's no big surprise that Berlinski's chemistry is as bad as his math :-)

I'm not a chemist, so I didn't want to make any invalid claims about how small a replicator could get; I just focused on the side I know best, which involves the fact that he just cooks up numbers to give the effect that he wants - that is, that there's exactly *one* self-replicator, and it can't be shorter than 100 nucleotides. There's no reason to argue those specific numbers *except* to pull off a big-numbers argument.

By MarkCC, at 2:24 PM
"the replicator can't replicate unless it has a "template" to replicate against"

I'm not quite sure how this "template" fits into things. Aren't single-stranded RNA, DNA strands capable of reproducing themselves? Isn't that how it happens in PCR?

By Anonymous, at 3:11 PM
Did he spell his name right at the top?

Just checking.

DJ

By Anonymous, at 4:48 PM
Why don't you skip the rhetoric and just show the math "errors". If your readers can't examine Berlinski's argument for themselves and need your overblown pap for commentary on Berlinski, then one can only conclude that Intelligently Designed organisms don't necessarily need to produce Intelligently Designed rebuttals. Smear campaigns are more effective. But it looks like you've already done the math; am I right, Godel?

By Anonymous, at 5:22 PM
bc:

I don't really understand his template thing either. I think that the idea is that for a replicator to be successful, it needs to have the material that it can produce replicas from. Berlinski is, I think, trying to argue that instead of being able to replicate from a pool of basic organic chemicals, that a replicator needs a single, specific precursor.

There's absolutely no reason to believe that there's any need for a specific, complex precursor for the replicator to use as raw material for copying itself; I think he introduced it specifically so that he could make the "big numbers" argument stronger - 1 in 10^60 is good, but 1 in 10^120 is better.

By MarkCC, at 6:05 PM
anonymous:

Touchy, aren't we?

Berlinski's article has plenty of snark in it; and it purports to be a scientific article, one of the articles that DI chose to represent the whole ID movement.

This, on the other hand, is a blog with no pretensions. It's what I feel like writing, the way I feel like writing it. And if someone like Berlinski wants to write verbose garbage and try to pass it off as mathematical analysis, well, I'm gonna snark.

If you don't like the way the blog is written, just don't spend your time reading it. If you have any actual content issues to point out, I'll be glad to discuss/debate with you.

By MarkCC, at 6:10 PM
I am currently trying to read Berlinski's A Tour of the Calculus. It's heavy going, full of pretentious writing and pedantry. Ugh! I've seen some of Berlinski's arguments against evolution and I know he's a gasbag, but I thought he should at least be competent at math. (His Ph.D. in math is from Princeton, after all, not a diploma mill.) Instead, however, it seems he prefers to browbeat people with his vast intellect, using mathematical knowledge as a blunt instrument. I wrote a blog post about his performance during the Firing Line debate on creation versus evolution, during which he was quite obnoxious. My article is David Berlinski vs. Goliath (This time Goliath wins).

By Zeno, at 10:27 PM
bc:

In the modern world, neither DNA nor RNA is generally capable of reproducing itself.

DNA is reproduced by proteins--DNA polymerases--both in vivo and in PCR. In PCR, it's specifically carried out by DNA polymerases from a thermophilic halobacteria, which is the key to its success.

RNA is never (maybe only almost never? It's late and I'm tired) reproduced in the form of RNA->RNA, but rather in the form of RNA->DNA->RNA in the limited case of retroviruses. RNA in living creatures doesn't reproduce, it's synthesized by transcription from DNA, which is carried out by proteins--RNA polymerases.

The RNA world is based on ribozymes: catalytic molecules composed mostly to entirely of RNA. Since DNA has only poor catalytic activities, and protein is not good at acting as a hereditary molecule (and performs no such function today), ribozymes are a useful compromise that could perform both functions; albeit poorly. To the best of my knowledge, all ribozymes capable of useful self-replication are synthetic.

That was WAY too much info, wasn't it?

By The Neurophile, at 12:25 AM
I'm enjoying your blog, thanks for writing it. The crusade against bad math is entertaining, but it would be nice if you could broaden your scope a little bit beyond creationists. They seem like too easy pickings for someone of your intellect.

Does anybody intelligent enough to read this blog really believe that ID nonsense?

By p, at 5:38 AM
Quoth anonymous: "Why don't you skip the rhetoric and just show the math 'errors'."

His rhetoric or Berlinski's?

Look, sometimes the math itself isn't the story. I can easily write an essay that boils down to:

2+2=4
Therefore, evolution is true.

The natural response is to question what the premise has to do with the conclusion. The rhetorical strategy is to wrap the whole essay in piles of wordage that makes it seem plausible that the premise somehow *is* connected to the conclusion.

And then, when someone attacks the mounds of steaming wordage, to ask him why he's not dealing with the math, or if he really thinks 2+2<>4.

Mark's essay is focused on pointing out that Berlinski's argument essentially is:

10^60 is a really big number.
Therefore, evolution is false.

A task at which it succeeds admirably ...

By Anonymous, at 11:31 PM
"the replicator can't replicate unless it has a "template" to replicate against"

I'm not quite sure how this "template" fits into things. Aren't single-stranded RNA, DNA strands capable of reproducing themselves? Isn't that how it happens in PCR?

Yes, but in PCR you have a polymerase enzyme that is sticking the free nucleotides to the template, to produce the complementary strand.

I think Berlinski is saying that if we assume the first replicator was a ribozyme (RNA that can act as an enzyme), then you need one 100-bp ribozyme (unfolded) as the template, and a second 100-bp ribozyme to act as the initial ribozyme.

In addition to the problems with Berlinski's probability calculation here pointed out in the opening post, there are others:

1. Not only are there perhaps many 100-bp sequences that would work, you wouldn't have to get the exact same sequence twice, you could have 2 different ribozyme sequences.

2. There might be inorganic catalysts that play a role

3. The original replicator might have been a simpler form of nucleic acid, not RNA

4. It's not clear that the original replicator would have made use of 4 bases

5. Even more importantly, I think recent studies have indicated that very short RNAs can have catalytic activity, and that an assemblage of short RNAs with various functions is much more likely than hitting one huge RNA chain by chance. This also ameliorates error catastrophe problems.

6. Berlinski ignores passive template-replication situations (e.g. wet-dry or freeze-thaw cycles) that might in the right conditions replicate initially completely function*less* (!) short nucleic acid chains. This might be slow -- e.g. one replication per day -- but in a lifeless world, there is nothing better to outcompete this slow replication. If a system like this got going, then selection would favor "functionless" sequences that replicated faster or better for some reason.

E.g., let me know when an IDer or Berlinski-esque person addresses a study like this:

Alexander V. Vlassov, Sergei A. Kazakov, Brian H. Johnston, and Laura F. Landweber (2005). "The RNA World on Ice: A New Scenario for the Emergence of RNA Information." Journal of Molecular Evolution, 61(2), pp. 264-273.

Conclusion

One of the problems with the RNA world theory is how complex RNAs could evolve and survive on the early Earth, given that the RNA is rapidly degraded under conditions in which it normally functions. The recent discoveries of several very small ribozymes and the finding that short RNAs can catalyze ligation of RNA fragments under conditions that greatly inhibit random degradation present a tantalizing solution to this problem. The existence of such ligases provides a much more efficient path to the formation of more complex RNAs than stepwise polymerization (Schmidt 1999). Importantly, under freezing temperatures, the base pairing required between substrate and catalytic sequences is minimal, to the point that a simple fragment of the HPR could ligate any RNA with a 5'-OH to a given fragment with a 2',3'-cyclic phosphate (Vlassov et al. 2004). This system comes close to providing a "universal ligase" that can assemble random fragments of RNA into more complex molecules under highly stabilizing conditions. It is not hard to envisage random polymerization of activated mononucleotides creating small fragments with some catalytic activity, followed by assembly of large rRNAs by ligation and the eventual emergence of sufficient catalytic power that the larger molecules could survive under higher-temperature, less protected conditions.

Nick Matzke

By Anonymous, at 9:55 PM
Incidentally, the following statements are wrong:
No one, for that matter, has ever seen a ribozyme capable of any form of catalytic action that is not very specific in its sequence and thus unlike even closely related sequences. No one has ever seen a ribozyme able to undertake chemical action without a suite of enzymes in attendance.
In several ribozymes, all that's needed for certain structural sequences is proper base pairing of a region. In other words, the sequence doesn't matter so long as it can match with another region of the ribozyme.

There are also clearly ribozymes which catalyze reactions without the need for "a suite of enzymes in attendance." That's what got Tom Cech a Nobel Prize, after all.

So, in addition to the math, he seems to have gotten his basic facts wrong.

By Anonymous, at 10:46 PM
Just a minor point, but Berlinski is wrong to say that 10^60 is more than the number of atoms in the universe. That number is over 10^78. But what's 10^18 between friends?

By Anonymous, at 1:27 AM
By The Neurophile of Your Destruction, at 12:25 AM:

In PCR, it's specifically carried out by DNA polymerases from a thermophilic halobacteria, which is the key to its success.

You forgot the thermal cycling bit, which is another key to its success, and one that could be relevant in a thermal vent or geyser scenario.

RNA is never (maybe only almost never? It's late and I'm tired) reproduced in the form of RNA->RNA

Yes, you must be tired. There are a variety of modes of replication seen in present-day viruses.

To the best of my knowledge, all ribozymes capable of useful self-replication are synthetic.

Which says what about the situation of 4 billion years ago?

By Anonymous, at 12:29 PM
Some comments from a CompSci geezer:

1: To paraphrase Murphy: if self-replicating molecules can exist, they will. Once it happens, a self-perpetuating phase change is immediately precipitated.

2: His number may be large, but he is applying it to a single experiment. In fact, the experiment was occurring in a massively parallel system otherwise known as "Earth". Umpty-trillion parallel tests. So divide his exponent by that and you get a much smaller number.

3: (and this one has been noticed by other commentators) he puts the solution space for successful nucleotides at containing 10^60 members. Then for who knows what reason he takes this number as the probability of any one of them spontaneously appearing. ?wtf?
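
To put rough numbers on point 2: parallel trials don't literally divide the exponent. With N independent tries at per-try probability p, the chance of at least one success is 1 - (1 - p)^N, which is close to N*p as long as N*p is small. A minimal Python sketch follows; the trial counts are made-up placeholders, not estimates of anything on the early Earth, and `p_at_least_one` is just an illustrative name.

```python
import math

def p_at_least_one(p: float, trials: float) -> float:
    """Chance of at least one success in `trials` independent tries.

    Computes 1 - (1 - p)^trials using log1p/expm1, so that tiny values
    of p are not rounded away by floating-point arithmetic.
    """
    return -math.expm1(trials * math.log1p(-p))

if __name__ == "__main__":
    p = 1e-60  # Berlinski's per-try figure, taken at face value
    # Hypothetical numbers of parallel tries.
    for trials in (1.0, 1e20, 1e40, 1e60):
        print(f"{trials:.0e} tries -> P(at least one) = {p_at_least_one(p, trials):.3g}")
```

With as many tries as there are sequences, the chance of at least one hit is already about 63 percent, so the "single experiment" framing is carrying most of the weight.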

By Anonymous, at 9:55 PM
