Good Math/Bad Math

Thursday, April 06, 2006

Berlinski responds

David Berlinski replied to my post about his sloppy math. He claims to have had difficulty posting comments on the blog (anyone else ever had any trouble posting here?), so I'm doing him the favor of posting his comments, followed by my response.

Here's David:



Dear Mark Chu-Carroll --

I have read your floridly indignant comments about my essay, 'On the
Origins of Life,' with great interest. I am responding by e-mail because
it is apparently impossible for me to post to your site. As I have
indicated on any number of occasions, the improbabilities that I cited
are simply those that are cited in the literature. The combinatorial
calculations I made were both elementary and correct. There is no slop,
excess, extrapolation or hand waving. Four nucleotides arranged in a
sequence one hundred nucleotides in length create a sample space of four
raised to the one hundredth power. Given the chemical structure of RNA,
in which nucleotides are bound to a sugar phosphate backbone but not to
one another, independence with respect to template formation is not only
reasonable as an assumption, but inevitable. This is again a point made
with absolute explicitness in the literature itself. It may well be that
as Arrhenius speculates, strict sequence specificity is not in the end
needed to obtain demonstrable ligase activity. There may be many
sequences in the pre-biotic environment capable of carrying out various
chemical activities. This is a point I made explicitly in my essay, but
from the point of view of the probabilities involved, it has nothing
whatsoever to do with independence of the requisite events.

If you check the index to my A TOUR OF THE CALCULUS under 'limits' you
will find page references to a complete and precise definition. The
section in Prague was, of course, intended to be dream-like. It is not
the mathematicians who are diminishing in sophistication, but the narrator.

DB
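
Berlinski's sample-space arithmetic is easy to check. The minimal Python sketch below counts the distinct sequences of length 100 over a four-letter alphabet. The helper name `sample_space_size` is ours and purely illustrative. 4^100 works out to roughly 1.6 x 10^60, which is the "10^60" figure used throughout the discussion that follows.

```python
import math

def sample_space_size(alphabet_size: int, length: int) -> int:
    """Number of distinct sequences of a given length over an alphabet."""
    return alphabet_size ** length

n = sample_space_size(4, 100)   # four nucleotides, chains 100 long
print(f"{n:.3e}")               # 1.607e+60
print(round(math.log10(n), 2))  # 60.21
```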


My response:


Let's drop the nonsense, shall we? We both know perfectly well that you're playing games, and I'm not interested in wasting time on gibberish.

First - no one else has ever had trouble posting comments to my blog. The only filtering done is a trivial spam filter which anyone who is capable of reading can cope with. So let's just say that I'm skeptical; I think you probably would just rather not allow your response to be public. However, since in your note you claim that you wanted to post this, I'll just go ahead and post your note, and my response.

Second - you're continuing to play rhetorical games, and avoid addressing any of the real issues I have with your misuse of mathematics. To remind you, the issues were:

  1. You create an arbitrary and unsupported requirement for replicator length.
  2. You insist, without support, that there is exactly one replicator of that length.
  3. You insist, without support, that the first replicator required a specific replication template - and that there was exactly one suitable template.
  4. You insist, without support, that in an environment in which the precursors for a replicator can be found, the odds of creating similar chains are independent.

Your response to number one is "I didn't do it, I just copied it from someone else". That's just pathetic. If you want to make the argument, you should be ready to defend its validity, not just pass it off on someone else.

You don't respond to number two - and that's a big issue. In your space of 10^60 alleged possibilities, there may be 1 replicator; there may be 10^40. By not addressing that, you make your probability calculation utterly worthless.
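
To see how much rides on the number of replicators, here is a minimal sketch. Like the essay's calculation, it assumes every length-100 sequence is equally likely to be drawn. The names `hit_probability` and `N` are illustrative. Under that assumption, the chance that one random draw is a replicator is k/N. The unknown count k therefore shifts the answer by as many orders of magnitude as k itself spans.

```python
import math

N = 4 ** 100  # sequences of length 100 over {A, C, G, U}

def hit_probability(num_replicators: int, space_size: int = N) -> float:
    """P(one uniformly drawn sequence is a replicator) = k / N."""
    return num_replicators / space_size

for k in (1, 10**20, 10**40):
    p = hit_probability(k)
    print(f"k = 10^{round(math.log10(k))}: log10(p) = {math.log10(p):.1f}")
```

It prints log10(p) of about -60.2, -40.2 and -20.2 for the three values of k.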

Your response to number three is handwaving - again, ignoring the math. What are the properties that your "template" needs? How can we determine how many "templates" can be used by a particular replicator? Again, if you can't determine how many templates there are among your alleged 10^60 possibilities, then your probability calculations are worthless.

Your response to number four is yet more handwaving: if there is an environment in which nucleotide chains are being developed, why is it impossible (or even improbable) that similar chains are being formed? You insist upon a uniform distribution of the various chains of your desired length - but you provide no reason to support that uniformity. You insist that the probabilities of two related chemicals developing in proximity to each other in the same environment are entirely unrelated. That's a hell of an assumption, which you do not support. If you cannot demonstrate independence, then you cannot assume it in your mathematics.
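
The joint probability of a replicator R and its template T follows the chain rule, P(R and T) = P(R) P(T | R). Independence is the special case P(T | R) = P(T). The minimal sketch below assumes uniform draws from the length-100 space for P(R) and P(T). The conditional value 1e-10 is purely hypothetical. It is chosen only to show how strongly the result depends on that one term.

```python
import math

def log10_joint(p_r: float, p_t_given_r: float) -> float:
    """log10 P(R and T) via the chain rule P(R and T) = P(R) * P(T | R)."""
    return math.log10(p_r) + math.log10(p_t_given_r)

p_r = 4.0 ** -100   # P(a specific replicator), uniform assumption
p_t = 4.0 ** -100   # P(its specific template), uniform assumption

# Assuming independence: P(T | R) = P(T).
print(round(log10_joint(p_r, p_t), 1))    # -120.4
# Hypothetical dependence: a pool already holding R makes T far likelier.
print(round(log10_joint(p_r, 1e-10), 1))  # -70.2
```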

With respect to your comments about your calculus book - that's from a comment written by another commenter on my blog; I do not have a copy of your book, nor do I want one. I have not commented and will not comment on anything relating to your calculus book, since I have not read it. I do not modify the comments on my blog; if you have an issue with them, you can take it up with the author of the comment. You're welcome to start a discussion with him in the comments, or you can go directly to his website to discuss it with him.

19 Comments:

  • Oh no, a blog war.

    By Blogger Thomas Winwood, at 10:54 AM  

  • thomas:

    I'll make sure it doesn't turn into a huge blogwar; any further dialog between Berlinski and me will take place in the comments on this post, so it shouldn't take over the blog, and aside from this one front-page post, it shouldn't be visible unless you want to look in the comments.

    By Blogger MarkCC, at 11:21 AM  

  • Paris
    6 April, 2006

    I have corrected a few trivial spelling errors in your original posting, and I have taken the liberty of numbering comments:


    (1) You create an arbitrary and unsupported requirement for replicator length.

    (2) You insist, without support, that there is exactly one replicator of that length.

    (3) You insist, without support, that the first replicator required a specific replication template - and that there was exactly one suitable template.

    (4) You insist, without support, that in an environment in which the precursors for a replicator can be found, that the odds of creating similar chains are independent.

    5) Your response to 1) is "I didn't do it, I just copied it from someone else". That's just pathetic. If you want to make the argument, you should be ready to defend its validity, not just pass it off on someone else.

    6) You don't respond to number two - and that's a “big” issue. In your space of 10^60 alleged possibilities, there may be 1 replicator; there may be 10^40. By not addressing that, you make your probability calculation utterly worthless.

    7) Your response to number three is hand waving - again, ignoring the math. What are the properties that your "template" needs? How can we determine how many "templates" can be used by a particular replicator? Again, if you can't determine how many templates there are among your alleged 10^60 possibilities, then your probability calculations are worthless.

    8) Your response to number four is yet more handwaving: if there is an environment in which nucleotide chains are being developed, why is it impossible (or even improbable) that similar chains are being formed?

    9) You insist upon a uniform distribution of the various chains of your desired length - but you provide no reason to support that uniformity.

    10) You insist that the probabilities of two related chemicals developing in proximity to each other in the same environment are entirely unrelated. That's a hell of an assumption, which you do not support.

    11) If you cannot demonstrate independence, then you cannot assume it in your mathematics.


    I discuss these points seriatim:

    1a) The requirement for replicator length is neither arbitrary nor unsupported. Turning to the second matter first:

    It is not unsupported because I have supported it, and this by citing Gustave Arrhenius, Leslie Orgel & Gerald Joyce (Arrhenius, G., ‘Life out of Chaos,’ in Fundamentals of Life, G. Palyi, Ed., Paris, 2002, 203-210 and Joyce, G.F., & Orgel, L.E., ‘Prospects for Understanding the Origin of the RNA World,’ in The RNA World, 2nd edition, Eds. R. Gesteland, T. Cech, J. Atkins, Cold Spring Harbor Laboratory Press, 1999, 48-77). You may certainly argue that my sources are in error; and I encourage you to do so, but your remarks – they are right in front of your nose – convey the unhappy impression that I simply created numbers out of thin air. Now you know better.

    But neither are my calculations arbitrary:

    It is perfectly true that no one knows the minimum ribozyme length for demonstrable replicator activity. I said as much in my essay. But the figure of the 100 base pairs required for what Arrhenius calls “demonstrable ligase activity,” is known; it is current in the literature; and it is, all evidence suggests, an under-estimate. In this regard, see Arrhenius’ own source: Ekland, E.H., Szostak, J.W., Bartel, D.P. ‘Structurally complex and highly active RNA ligases derived from random RNA sequences,’ Science 269, 364-370, 1995. No one has successfully demonstrated a markedly lower length for independent ligase activity under laboratory conditions. In this regard, I encourage you to consider Natasha Paul & Gerald Joyce’s ‘A self-replicating ligase ribozyme’ (Paul, N., Joyce, G., PNAS, Volume 99, No. 20, 2002). The demonstration reported, as Paul & Joyce make clear, was enzymatically driven; and so not properly speaking relevant to pre-biotic chemistry. More to the point, Paul & Joyce comment on the possibility of an in vitro demonstration of a “nucleic acid (or protein) enzyme that catalyses the replication of many different nucleic acid (or protein) molecules, including copies of itself.” They report “substantial progress … along these lines,” – true enough, as their references indicate – but at once add with respect to the laboratory ribozyme in question that “it contains 200 nt, so it is not nearly capable of catalyzing the synthesis of additional copies of itself.” My estimate of one hundred nucleotides as the minimum needed for demonstrable ligase activity was a gross under-estimate of the length required of a true self-replicating ribozyme. It might well serve as a lower bound; this would, of course, strengthen and not weaken my argument. Were one to consider the requirement that replication must be autocatalytic in order to protect emerging replicators from dilution effects, the odds in question would diminish further.

    Regiospecificity imposes yet another probabilistic obstacle. Small wonder, then, that Arrhenius, citing Joyce & Orgel in agreement, remarks that “impossibly large improbabilities of this kind loom as a universal threat in the chaotic era of chemical evolution,” (Arrhenius, op. cit. p. 4). The specific solutions that Arrhenius considers, it is worth noting – “special reaction conditions, favorable catalytic substrate effects, or mineral surfaces,” -- have in their turn been criticized by Orgel as unrealistic, indeed, as virtually impossible (Orgel, L., ‘Self-Organizing biochemical cycles,’ PNAS, November 7, 2000, Vol. 97, no. 23, 12503-12507).

    2a) On the contrary. Following Arrhenius, I entertain the possibility that sequence specificity may not, after all, be a necessary condition for demonstrable ligase activity -- or any other biological function, for that matter. I observed -- correctly, of course -- that all our evidence is against it. All evidence – meaning laboratory evidence; all evidence – meaning our common experience with sequence specificity in linguistics or in any other field in which an alphabet of words gives rise to a very large sample space in which meaningful sequences are strongly non-generic – the space of all proteins, for example.

    3a) I cited Joyce & Orgel's calculations; they observed, correctly, that given our present understanding, an initial replicator must have required a replication template, and they carried out the elementary calculation needed to assess the probability of those events occurring independently. We have no evidence whatsoever that an ancestral ribozyme could have carried out replication without a template. Non-template directed replication would by its nature have been too slovenly and imprecise to protect against various error catastrophes. If replication proceeded by Watson-Crick base pairing, then obviously there was only one suitable template. The assumption that replication did proceed by Watson-Crick base pairing is, as I have observed, one of the crucial assumptions of the RNA-world hypothesis, and as Joyce & Orgel note explicitly, accepted by virtually everyone in the field.

    4a) The phrase “the precursors of a replicator” is incoherent if you are talking of the first replicator, and irrelevant if not. The first replicators, by definition, had no precursors. For Watson-Crick base pairing to proceed in the pre-biotic era, replicators must be more than similar: They must be identical up to Watson-Crick base-pairing.

    There remains the issue of independence. Independence is, of course, the de facto hypothesis in probability calculations; and in the case of pre-biotic chemistry, strongly supported by the chemical facts. You are not apt to dismiss, I suppose, the hypothesis that if two coins are flipped the odds in favor of seeing two heads is one in four on the grounds that, who knows?, the coins might be biased. Who knows? They might be. But the burden of demonstrating this falls on you.
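
    The one-in-four figure for two heads rests on two separate assumptions: each coin is fair, and the flips are independent. The minimal sketch below uses exact fractions. The coupling parameter `copy_prob` is a hypothetical device, not anything from the discussion. With probability `copy_prob`, the second coin simply repeats the first. Each coin on its own still lands heads half the time, yet the chance of two heads changes.

```python
from fractions import Fraction

def p_two_heads(p_heads: Fraction, copy_prob: Fraction) -> Fraction:
    """P(both coins show heads) for two coins with the same bias.

    With probability copy_prob the second coin simply repeats the first
    (a hypothetical coupling); otherwise it is flipped independently.
    """
    # First coin lands heads; the second is heads if it copies the first,
    # or if it is flipped fresh and lands heads.
    return p_heads * (copy_prob + (1 - copy_prob) * p_heads)

half = Fraction(1, 2)
print(p_two_heads(half, Fraction(0)))     # 1/4  (fair, independent)
print(p_two_heads(half, Fraction(1, 2)))  # 3/8  (fair, but coupled)
```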

    5a) There are two issues here: The first is the provenance of my argument; the second, my endorsement of its validity. You have carelessly assumed that arguments I drew from the literature were my own invention. This is untrue. I expect you to correct this misunderstanding as a matter of scholarly probity.

    As for the second point, it goes without saying that I endorsed the arguments that I cited. Why on earth would I have cited them otherwise?

    6a) An affirmation does not become more credible by being repeated emphatically, and if what I have said is worthless, then presumably it is utterly worthless as well. It is perfectly true that there may be 10^40 replicators in the space of all possible sequences forty monomers in length and drawn from a four-letter alphabet. It is also perfectly possible that every possible sequence of amino acids has a biological role to play. In pre-biotic chemistry, however, we are dealing with what is the case, and not with what may be the case. There is no evidence remotely suggesting that these possibilities are true; and, of course, all evidence with respect to comparable issues of sequence specificity is against it. Following your lead, please allow me to be even more emphatic. No chemist on the face of the earth has argued that in the space of 10 to the 60th power, there could possibly be anything like 10 to the 40th replicators. Quite the contrary. Everything we know of the space of such sequences suggests strongly that demonstrable ligase activity, let alone replication, is remarkably rare, and that it is biological uselessness -- the molecule does nothing -- that is in fact generic. Bio-molecules are immensely specific everywhere we look: Why should the case of a replicating ribozyme be different? Your argument resembles nothing so much as the claim that no account is needed for the fact that your e-mail consists of meaningful English sentences because, after all, it's possible that most sequences drawn on an alphabet of, say, twenty-five English words are apt to be meaningful. Possible? Of course. We are not discussing a matter of logic. But in plain fact, not so. The set of grammatical English sentences is a tiny subset of the set of possible word-like sequences; so too the set of functioning ribozymes in the space of all possible RNA sequences of the same length.

    7a) Of these questions, the first is unanswerable: No one knows. The second is trivial: A replicator proceeding by Watson-Crick base pairing requires one and only one template. The statement that follows is confused. The possibilities I specified are not alleged: They are deduced from the elementary calculation that four nucleotides arranged in sequences one hundred nucleotides in length will create a sample space of precisely four to the hundredth power. What in this so troubles you that you scruple at the plain mathematical facts by writing ‘alleged’? The claim that these considerations are worthless would seem to depend on your assumption that it is impossible to answer the question of how many templates there might have been in the pre-biotic era. Not at all. I’ve just answered that question in my response to your second question. There must have been at least one, and the odds in favor of its formation are the same as the odds favoring its template. We have no reason to think otherwise.

    8a) It is not impossible, but it is improbable. The improbability follows from the basic facts of polymerization chemistry with respect to the polynucleotides, to wit: Polymerization is sequence independent, and thus incapable of distinguishing between chains on the basis of nitrogenous bases. Bonding takes place on the sugar-phosphate backbone of a nucleotide, and nowhere else. This is an observation that in the second of their two great papers, ‘Genetical Implications of the Structure of Deoxyribose Nucleic Acid,’ (Nature 171, 964-967, 1953) Watson and Crick made explicitly. “The phosphate-sugar backbone of our model is completely regular,” they wrote, “but any sequence of the pairs of bases may fit into the structure. It follows that in a long molecule many different permutations are possible … (emphasis added).” More than fifty years of biochemical analysis has more than confirmed this conjecture. And more. All permutations, we now know, are possible, at least from the point of view of nucleic acid chain construction and elementary chemistry. Other factors may be involved; this is always true, and always uninteresting.

    9a) If by a uniform distribution you mean a uniform probability distribution, then I made no remarks about such distributions in my paper, but I would certainly argue what is in any case obvious, namely that longer polynucleotide chains of whatever degree of specificity are less likely to appear in the pre-biotic era than shorter chains, simply because polymerization is an uphill reaction.
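
    The claim that longer chains are rarer can be made concrete with a standard idealization, assumed here rather than taken from the essay. In Flory's most-probable distribution for linear step-growth polymerization, the number fraction of chains exactly n units long is (1 - p) p^(n - 1), where p is the extent of reaction. The value p = 0.9 is illustrative, not a measured figure.

```python
def flory_number_fraction(n: int, p: float) -> float:
    """Number fraction of chains exactly n units long (Flory distribution).

    p is the extent of reaction; the distribution is an idealization.
    """
    return (1 - p) * p ** (n - 1)

p = 0.9  # illustrative extent of reaction, not a measured value
for n in (1, 10, 100):
    print(n, f"{flory_number_fraction(n, p):.2e}")
```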

    10a) Two related chemicals? Did you mean two related molecules? And if so, did you mean two related polynucleotides? I shall assume so. But not knowing how those polynucleotides are supposed to be related, your question as posed is unanswerable. All polynucleotides are, after all, related to some degree: They are polynucleotides.

    Independence is not only obvious but overwhelmingly so; and as I have already remarked, follows simply from the sequence independence of polymerization itself. Could other factors be at work that have nothing to do with polymerization? Sure, who knows? But it is hardly my responsibility to consider factors that neither you nor anyone else can cite.

    11a) It is almost never possible in an experimental setting to demonstrate independence; the standard is simply that independence is the assumption made in the absence of evidence to the contrary.

    I know you will be exhilarated by my comments.

    DB

    By Anonymous David Berlinski, at 3:50 PM  

  • My, that was long, but I'll admit, it's easier to write a long response to criticism than a short one.

    As I've said before, I think that there are a few kinds of fundamental errors that you make repeatedly; and I don't think your comments really address them in a meaningful way. I'm going to keep this as short as I can; I don't like wasting time rehashing the same points over and over again.

    With regard to the basic numbers that you use in your probability calculations: no probability calculation is any better than the quality of the numbers that get put into it. As you admit, no one knows the correct length of a minimum replicator. And you admit that no one has any idea how many replicators of minimum or close to minimum length there are - you make a non-mathematical argument that there can't be many. But there's no particular reason to believe that the actual number is anywhere close to one. A small number of the possible patterns of minimum length? No problem. *One*? No way, sorry. You need to make a better argument to support eliminating 10^60 - 1 values. (Pulling out my old favorite, recursive function theory: the set of valid Turing machine programs is a space very similar to the set of valid RNA sequences; there are numerous equally valid and correct universal Turing machine programs at or close to the minimum length. The majority of randomly generated programs - the *vast* majority of randomly generated programs - are invalid. But the number of valid ones is still quite large.)
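
    The Turing machine comparison can be illustrated with a much simpler toy: strings of "(" and ")" that are balanced. This toy is swapped in purely for illustration and is not a model of RNA chemistry. Almost all random strings are invalid, yet the valid ones are still enormously numerous. The short dynamic-programming sketch below counts them.

```python
def count_balanced(n: int) -> int:
    """Count strings of n parentheses that are balanced."""
    # ways[d] = number of prefixes at nesting depth d that never dipped below 0
    ways = [1] + [0] * n
    for _ in range(n):
        nxt = [0] * (n + 1)
        for depth, w in enumerate(ways):
            if w == 0:
                continue
            if depth < n:
                nxt[depth + 1] += w  # append "("
            if depth > 0:
                nxt[depth - 1] += w  # append ")"
        ways = nxt
    return ways[0]

n = 100
valid = count_balanced(n)
print(f"valid: {valid:.2e}, fraction: {valid / 2**n:.1e}")
```

    Roughly 1.6 in every thousand random 100-character strings is balanced, so the vast majority are invalid. That still leaves about 2 x 10^27 valid ones.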

    Your template argument is, to be blunt, silly. No, independence is not the de facto hypothesis, at least not in the sense that you're claiming. You do not get to go into a probability calculation and say "I don't know the details of how this works, and therefore I can assume these events are independent." You need to eliminate dependence. In the case of some kind of "pool" of pre-biotic polymers and fragments (which is what I meant by precursors), the chemical reactions occurring are not occurring in isolation. There are numerous kinds of interactions going on in a chemically active environment. You don't get to just assume that those chemical interactions have no effect. It's entirely reasonable to believe that there is a relationship between the chains that form in such an environment; if there's a chance of dependence, you cannot just assume independence. But again - you just cook the numbers and use the assumptions that suit the argument you want to make.

    By Blogger MarkCC, at 5:37 PM  

  • Contrast DB:
    The first replicators, by definition, had no precursors.

    To MCC:
    In the case of some kind of "pool" of pre-biotic polymers and fragments (which is what I meant by precursors)

    DB is displaying miracle-based thinking - the first X appears out of nowhere. MCC is displaying physical thinking: the first molecules capable of replicating themselves arise from a pool of molecules which are chemically active but not self-replicating. E.g. a specific ligase has a precursor which is a non-specific ligase.

    DB constructs a math model assuming that the first replicator comes from nowhere, which is fantastically unlikely, and his model says that this is fantastically unlikely. Big whoop. The math model is useless unless it reflects the physics.

    There's a famous math model which shows that kangaroos are energetically impossible; it neglects tendon elasticity, so it counts the energy cost for kangaroo hopping as the cost to lift and drop a kangaroo. Thud :)

    By Anonymous Stephen A Wells, at 6:19 PM  

  • stephen:

    Thanks; that's what I was trying to say; you managed to say it more clearly than I did. The point is that we're talking about some kind of pool of active chemicals reacting with one another and forming chains; the set of very long chains that form is probably not uniform; and the idea that there's one perfect "template" that's a perfect match for a replicator is just plain silly.

    But the idea of large numbers of similar chains forming, and the possibility of their being able to copy themselves using the chain fragments in the pool is quite a bit more likely; exactly how likely, we really can't compute, because we simply don't know enough.

    By Blogger MarkCC, at 6:46 PM  

  • It's worth noting that many protein sequences are functionally equivalent: they fold into the same shapes. It's only certain key locations and patterns on each protein that determine its shape, and hence most of its functional power. This vastly increases the number of viable versions of the same protein.
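
    Here is a rough way to quantify this. Assume, hypothetically, that function depends only on a fixed set of key positions while every other position is free. The number of sequences that share those key residues is then alphabet_size^(length - key_positions). The figures below (a 100-residue protein over the 20 standard amino acids, with 20 constrained positions) are illustrative only. Real constraints are messier than this all-or-nothing split.

```python
def equivalent_sequences(alphabet_size: int, length: int, key_positions: int) -> int:
    """Sequences agreeing on the constrained positions, all others free."""
    return alphabet_size ** (length - key_positions)

total = 20 ** 100                              # all 100-residue sequences
same_key = equivalent_sequences(20, 100, 20)   # hypothetical 20 key positions
print(f"{same_key:.2e} sequences share the key positions")
print(f"fraction of the space: {same_key / total:.1e}")
```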

    In fact, this is one of the reasons that the cytochrome c sequence in all living things is such interesting evidence. There's no particular functional reason why cytochrome c sequences should be any which way: there are lots and lots of functionally equivalent forms. And, indeed, slightly different versions are scattered throughout the tree of life. But instead of these differences being truly randomly distributed, they are grouped in a very particular pattern... which just happens to be the exact same pattern that morphological comparisons and the fossil/geological record both independently confirm...

    By Anonymous Anonymous, at 9:04 PM  

  • This comment is not related to the math contained in Berlinski's work. Please skip this if that is what you are looking for. It is a rant and I make no apologies for it.

    I am no great writer by any means, but I cannot overstate how bad I find Berlinski's writing. I find it appalling that an individual with a PhD in mathematics can convolute it to such a great degree.

    Once upon a time, I read "A Tour of the Calculus" and was so disgusted with the quality of writing and poor explanation of the notions found in calculus that I dropped a computer science class that I was taking based on the fact that the professor was a fan of his writing.

    The comment:
    Paris
    6 April, 2006
    "I have corrected a few trivial spelling errors in your original posting..."
    is such a catty way of starting a response that it is worthy of a high school clique fight. I care that Dr. Berlinski is in Paris. I really do.

    For those of you that read this comment in spite of my warning, I'd like to point out that I do realize the irony in criticizing someone for making a personal attack in a critique. But I don't claim to be the bigger man. I am the smallest man. And I still think his books belong in that Ray Bradbury novel.

    By Anonymous Anonymous, at 1:57 AM  

    No matter how many times I offer clear and well-supported answers to certain criticisms of my essays, those very same criticisms tend to reappear in this discussion, strong and vigorous as an octopus.

    1 No one knows the minimum ribozyme length for demonstrable replicator activity. The figure of the 100 base pairs required for what Arrhenius calls “demonstrable ligase activity,” is known. No conceivable purpose is gained from blurring this distinction.

    Does it follow, given a sample space containing 10^60 polynucleotides of 100 NT’s in length, that the odds in favor of finding any specific polynucleotide are one in 10^60?
    Of course it does. It follows as simple mathematical fact, just as it follows as simple mathematical fact that the odds in favor of pulling any particular card from a deck of cards are one in fifty-two.
    Is it possible that within a random ensemble of pre-biotic polynucleotides there may be more than one replicator?
    Of course it is possible. Whoever suggested the contrary?
    Is it possible that within a random ensemble of pre-biotic polynucleotides there may have been a significant subset capable of replication?
    It is possible, but not likely. If you will return to my essay – the one I wrote and not the one you imagine I wrote – I discussed just this point:

    “Solace from the tyranny of nucleotide combinatorials,” Arrhenius has remarked in discussing this very point, “is sought in the feeling that strict sequence specificity may not be required through all the domains of a functional oligomer, thus making a large number of library items eligible for participation in the construction of the ultimate functional entity.” Perhaps I will be allowed a translation into English. Why assume that the self-replicating sequences are apt to be rare just because they are apt to be long? They might have been quite common.
    They might well have been. Who am I to say? And yet all experience is against it. Why should self-replicating RNA molecules have been so very common 3.6 billion years ago when they are presently impossible to discern under laboratory conditions? No one, for that matter, has ever seen a ribozyme capable of any form of catalytic action that was not remarkably specific in its sequence, and so unlike even closely related sequences. No one has ever seen a ribozyme capable of undertaking chemical action without a suite of enzymes in attendance. No one has ever seen anything like it.”

    One is never so well served as by oneself.

    2 The thesis that before probabilities can be applied, dependence must be eliminated resembles the thesis that before a man may be engaged he must be married. It has things backwards. Independence is the crucial assumption of probability theory: it is what distinguishes probability from measure theory. And probabilistic models always begin with the assumption of independence. Such models are used precisely because we lack a full and complete picture of the facts. We must thus make some assumption about events, and the standard assumption, absent evidence to the contrary, is that they are independent. Imagine rejecting statistical mechanics on the grounds that, who knows?, some forces might be acting mysteriously to segregate randomly interacting molecules.

    3 The observation that “there are numerous kinds of interactions going on in a chemically active environment,” is, of course, trivially true, and for that reason uninteresting. Who could doubt it? Tell me which interactions you have in mind – tell me in straightforward chemical terms – and we shall have something to discuss.

    By Anonymous Anonymous, at 6:36 AM  

  • Dave:

    You're just repeating yourself. If that's all you've got to say, stop wasting your time.

    Yes, I know that you found one particular citation that claims a minimum length of 100. But that's not the only number out there for a proposed minimum length replicator - it's just the one you like best.

    And further, it's *not* a uniform space of 10^60. Not every nucleotide chain is equally probable; in fact, not every nucleotide chain is even *possible*.
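
    Here is a minimal sketch of why the uniform-space assumption matters even before any dependence between positions is considered. If bases are drawn independently but with unequal frequencies, the probability of a given sequence is the product of its base frequencies. The skewed composition below is hypothetical.

```python
import math

def log10_sequence_prob(seq: str, base_probs: dict[str, float]) -> float:
    """log10 P(seq) if bases are drawn independently with given frequencies."""
    return sum(math.log10(base_probs[b]) for b in seq)

uniform = {b: 0.25 for b in "ACGU"}
skewed = {"A": 0.4, "C": 0.2, "G": 0.2, "U": 0.2}  # hypothetical composition

seq = "A" * 100
print(round(log10_sequence_prob(seq, uniform), 1))  # -60.2
print(round(log10_sequence_prob(seq, skewed), 1))   # -39.8
```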

    But the biggest mistake is the claim that you can simply reduce the number of possible replicators to exactly one. Nonsense. You can wave your hands as much as you like, but the fact is, if you're working with numbers that are potentially tens of orders of magnitude off, your math is bullshit. You can ignore the issue as much as you like - but there is every reason to believe that, while replicators are far from common, within a space as large as 10^60 they're a hell of a lot more common than *one*.

    WRT independence: I'm *not* saying that in general, you can't make assumptions of independence. What I'm saying is what *any* decent mathematician would say. To paraphrase my first-semester probability book: "independence between two events is a valid assumption *if and only if* there is no known interaction between the events." That is the *definition* of independence. Chemical reactions co-occurring within an environment do *not* fit the assumption of independence unless you can demonstrate that they cannot affect each other. You can pretend that your proposed replicator/template pair are created completely independently - but the fact is, even if you demand this template nonsense, they're being generated as part of a chemical process in which large numbers of prebiotic precursors are being linked together to form chains. That is *not* an independent process.
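
    Formally, two events A and B are independent exactly when P(A and B) = P(A) P(B). A shared environment can break that equality even when neither event affects the other directly. The minimal sketch below uses hypothetical numbers. A tenth of the sites in a pool are favorable, and a chain forms at a favorable site with probability 1/2 and elsewhere with probability 1/100. Two chains form independently *given* the site conditions.

```python
from fractions import Fraction

# Hypothetical pool: some fraction of sites have favorable conditions.
# Neither chain affects the other directly; both simply form more
# readily wherever conditions are favorable (a common cause).
P_FAVORABLE = Fraction(1, 10)
P_FORM = {True: Fraction(1, 2), False: Fraction(1, 100)}

def p_a() -> Fraction:
    """Marginal probability that a given chain forms at a site."""
    return P_FAVORABLE * P_FORM[True] + (1 - P_FAVORABLE) * P_FORM[False]

def p_a_and_b() -> Fraction:
    """Two chains form at one site; independent only *given* the conditions."""
    return P_FAVORABLE * P_FORM[True] ** 2 + (1 - P_FAVORABLE) * P_FORM[False] ** 2

print(float(p_a_and_b()), float(p_a() ** 2))
```

    The joint probability comes to 0.02509, about seven times the product of the marginals, 0.003481, so the two formation events are not independent.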

    By Blogger MarkCC, at 8:17 AM  

  • Berlinski says: The figure of the 100 base pairs required for what Arrhenius calls “demonstrable ligase activity,” is known.

    That figure was disproved the same year (2002) Arrhenius wrote the above. Paul & Joyce designed a ligase ribozyme of only 61 bases ("A self-replicating ligase ribozyme," PNAS 99(20) 12733-12740). (Not that Arrhenius should have seen that coming.)

    Independence is the crucial assumption of probability theory: it is what distinguishes probability from measure theory. And probabilistic models always begin with the assumption of independence.

    That's absurd. All sorts of probabilistic models involve conditional probabilities. Independence is nice, but hardly a required assumption in probability.
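    To make the point about conditional probabilities concrete: the chain rule P(AB) = P(A)P(B|A) holds whether or not A and B are independent, and independence is just the special case P(B|A) = P(B). The Python sketch below uses made-up probabilities (the names `p_a`, `p_b` and `p_b_given_a` are hypothetical, not taken from any paper) to show how far apart the two answers can be.

```python
from fractions import Fraction

# Hypothetical toy numbers, chosen only for illustration.
p_a = Fraction(1, 1000)        # P(A)
p_b = Fraction(1, 1000)        # P(B)
p_b_given_a = Fraction(1, 10)  # P(B | A): B is much likelier once A has happened

def joint_chain_rule(p_a, p_b_given_a):
    """P(AB) via the chain rule; valid with or without independence."""
    return p_a * p_b_given_a

def joint_if_independent(p_a, p_b):
    """P(AB) under the extra assumption that A and B are independent."""
    return p_a * p_b

print(joint_chain_rule(p_a, p_b_given_a))  # 1/10000
print(joint_if_independent(p_a, p_b))      # 1/1000000
```

    Assuming independence here understates the joint probability by a factor of 100.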

    By Anonymous Anton Mates, at 11:07 PM  

  • Slightly Off Topic:

    You know, I've got a copy of A Tour of the Calculus and I've started it a couple of times and never really got into it, despite being very interested in the subject.

    I'm glad to see that I'm not the only one who has had that experience.

    Can anyone recommend a good book that tackles the same subject?

    By Blogger dAVE, at 3:54 PM  

  • Someone pointed out on Panda's Thumb that in my post above, criticizing Berlinski's borrowed minimum length estimate for a ligase, I failed to address his claim:

    "In this regard, I encourage you to consider Natasha Paul & Gerald Joyce’s ‘A self-replicating ligase ribozyme (Paul, N., Joyce, G., PNAS, Volume 99, No. 20, 2002). The demonstration reported, as Paul & Joyce make clear, was enzymatically driven; and so not properly speaking relevant to pre-biotic chemistry."

    (emphasis mine)

    So let me just take the time to say that's flatly false. The reaction was not enzymatically-driven, except by the very ribozyme they were demonstrating! The reaction occurred as follows:

    "Ligation assays were carried out in the presence of 25 mM MgCl2 and 50 mM EPPS (pH 8.5) at 23°C. Before initiating the reaction, B plus or minus T were preincubated in reaction buffer at 23°C for 10 min. The reactions were initiated by the addition of 5'-32P-labeled A, which had been preincubated at four times the final concentration in reaction buffer at 23°C for 10 min. "

    (A and B were the substrates that were combined into the ribozyme T.) No enzymes. So, no, Berlinski's estimate remains demonstrably wrong, as it has been since 2002. Lying about the Paul & Joyce experiment doesn't change that.

    By Anonymous Anton Mates, at 5:58 PM  

  • David Berlinski
    Paris

    The point is that we're talking about some kind of pool of active chemicals reacting with one another and forming chains ….

    What you are talking about is difficult to say. What molecular biologists are talking about is a) a random pool of beta D-nucleotides; and b) a random ensemble of polynucleotides. The polynucleotides form a random ensemble because chain polymerization is not sequence-specific.

    The set of very long chains that form is probably not uniform ….

    Sets are neither uniform nor non-uniform. It is probability distributions that are uniform. Given a) and b) above, one has a classical sampling with replacement model in the theory of probability, and thus a uniform and discrete probability measure.

    The idea that there's one perfect "template" that's a perfect match for a replicator is just plain silly ….

    Far from being silly, the idea is unavoidable, given the chemistry at hand. Partial replication? Certainly possible, but with well-known consequences for degradation by dilution and other cross-reactions, all endlessly discussed in the literature.

    But the idea of large numbers of similar chains forming, and the possibility of their being able to copy themselves using the chain fragments in the pool is quite a bit more likely; exactly how likely, we really can't compute, because we simply don't know enough ….

    If we do not know enough to carry out any reasonable computation, whence your confidence that replication is apt to be ‘quite a bit more likely?’

    In fact, this fact is one of the reasons that the cytochrome c sequence in all living things is such interesting evidence. There's no particular functional reason why cytochrome c sequences should be any which way: there are lots and lots of functionally equivalent forms. And, indeed, there are slightly different forms scattered throughout the tree of life. But instead of these differences being truly randomly distributed, they are grouped in a very particular pattern... which just happens to be the exact same pattern that morphological comparisons and the fossil/geological record both independently confirm...

    Cytochrome C is a functioning polypeptide, with a distinct history, and a known number of functional equivalents at specified amino acid sites. H. Yockey estimates the odds against discovering a single member of the family at 2 x 10 to the 44th power. It may well be that there are functionally equivalent families of RNA replicators, but no one knows whether this is so, a point I made explicitly in my essay.

    Yes, I know that you found one particular citation that claims a minimum length of 100. But that's not the only number out there for a proposed minimum length replicator - it's just the one you like best ….

    I have no interest in any particular number; if you wish to urge another, do so. You have given me no reason to suppose my assessment is incorrect; nor have you cited alternative references in the literature. Your comments are thus irrelevant.

    And further, it's *not* a uniform space of 10^60. Not every nucleotide chain is equally probable; in fact, not every nucleotide chain is even *possible* ….

    Probability measures are uniform or not. Spaces are otherwise. Your assertion that something is so because you believe that it is so is of no conceivable interest – to me or to anyone else.

    But the biggest mistake is the claim that you can simply reduce the number of possible replicators to exactly one. Nonsense ….

    I have never suggested the contrary. The issue is a figment of your imagination.

    You can wave your hands as much as you like, but the fact is, if you're working with numbers that are potentially tens of orders of magnitude off, your math is bullshit. You can ignore the issue as much as you like - but there is every reason to believe that while replicators are far from common, within a space as large as 10^60, they're a hell of a lot more common than *one*….

    They are, those replicators, far from common. I said as much in my essay, and I also gave reasons for supposing that this was so. With respect to your claim that “there is every reason to believe that such replicators are a hell of a lot more common than one --”

    What reasons might those be?
    And how would they change the conclusions that I reach in my essay?

    WRT independence: I'm *not* saying that in general, you can't make assumptions of independence. What I'm saying is what *any* decent mathematician would say: to paraphrase my first semester probability book: "independence between two events is a valid assumption *if and only if* there is no known interaction between the events." That is the *definition* of independence ….

    If you are disposed to offer advice about mathematics, use the language, and employ the discipline, common to mathematics itself. What you have offered is an informal remark, and not a definition. The correct definition is as follows: Two events A and B are independent if P(AB) = P(A)P(B). As a methodological stricture, the remark you have offered is, moreover, absurd inasmuch as some interaction between events can never be ruled out a priori, at least in the physical sciences.

    Chemical reactions co-occurring within an environment do *not* fit the assumption of independence unless you can demonstrate that they cannot affect each other ….

    An impossible demand, one that neither I nor anyone else could meet; it is thus irrelevant.

    You can pretend that your proposed replicator/template pair is created completely independently …

    Not my pretense but an assumption of the field in virtue of the sequence-independent character of polynucleotide polymerization.

    - but the fact is, even if you demand this template nonsense, they're being generated as a part of a chemical process taking place between large numbers of prebiotic precursors being linked together to form chains. That is *not* an independent process …

    No, it is not. Nor is it a process relevant to anything that I have claimed. I am talking of sequence specificity and the RNA world scenario. You are talking about chemical processes in whose existence you believe but whose identity you are unable to specify. This is as unhelpful as it is irrelevant. If you can describe, even in outline, a true autocatalytic sequence leading to self-replicating polynucleotides, you will at once advance to the forefront of current research. For the moment, the only true autocatalytic reaction in pre-biotic chemistry remains the formose reaction.

    That figure was disproved the same year (2002) Arrhenius wrote the above. Paul & Joyce designed a ligase ribozyme of only 61 bases ("A self-replicating ligase ribozyme," PNAS 99(20) 12733-12740). (Not that Arrhenius should have seen that coming.)

    So let me just take the time to say that's flatly false. The reaction was not enzymatically-driven, except by the very ribozyme they were demonstrating!

    a) A ribozyme by definition has catalytic powers: hence its name: a ribonucleic ENZYME.
    b) The R3C ligase ribozyme that Paul & Joyce re-designed is roughly 200 nucleotides (or 68,600 daltons).
    c) The experiment reported in Paul & Joyce was not only enzyme-driven but RNA-template-dependent.
    d) Limitations of this approach are discussed candidly in Paul & Joyce.
    e) There is a threshold of technical incompetence below which discussion is not profitable.

    A general remark: I include my address in these posts not for your convenience but for my own.

    By Anonymous David Berlinski, at 12:41 PM  

  • (By idon'treallycare)
    I might expand upon David Berlinski's very pertinent remarks...

    1. The ligase that Anton Mates mentioned consisted of a template and two complementary RNA substrates. A large part of the current discussion has centered on the idea that complementary strands of RNA are not necessary. Berlinski correctly identified in his Commentary paper:

    "The discovery of a single molecule with the power to initiate replication would hardly be sufficient to establish replication. What template would it replicate against? We need, in other words, at least two, causing the odds of their joint discovery to increase from 1 in 10^60 to 1 in 10^120"

    This is reiterated in Gerald Joyce's paper, in which he expresses similar concerns about the necessity of complementary strands in his experiment:

    "Several nonenzymatic template-dependent ligation systems have been devised to study the role of a template in binding and positioning complementary substrates for covalent bond formation (1-5). These have included simple self-replicating systems of the form A + B → T, where A and B are substrates that bind to a complementary template, T, and become joined to form a product molecule that is identical to the template (6-9)....The nucleic acid-based systems are the most straightforward and rely on simple Watson-Crick pairing interactions between a short oligonucleotide template and two complementary oligonucleotide substrates"

    2. I would like to point out to Anton Mates that the general point of Berlinski's analysis is that "[i]f the odds in favor of SELF REPLICATION (not ligation) are 1 in 10^60, no betting man would take them, no matter how attractive the payoff, and neither presumably would nature."

    In considering this, the relevance of Joyce's experiment lapses in light of his own remarks:

    "However, all of the laboratory evolution systems that have been described do not involve self-replication; replication is instead carried out by polymerase proteins that are not part of the evolving system."

    3. Joyce and Paul's own protocol reveals the limitations of their model. This Berlinski has addressed.

    4. Even if one were to trivially use this prebiotically irrelevant example as a model for minimal ligation length, Berlinski is justified in using 100 nt because it is a much better model of what a self-replicating RNA would require, as Joyce and Paul reveal:

    "A second approach to a self-sustained evolving system might involve the in vitro evolution of a nucleic acid (or protein) enzyme that catalyzes the replication of many different nucleic acid (or protein) molecules, including copies of itself. The evolution process that would be used to obtain such an enzyme would not be self-sustained, but the product might be able to evolve in a self-sustained manner. Substantial progress has been made along these lines, although the goal of a self-replicating, evolving enzyme has not been achieved. Starting from a ribozyme that catalyzes the template-directed joining of two RNA molecules (40), an RNA polymerase ribozyme was evolved that catalyzes the polymerization of up to 14 NTPs on an external RNA template, operating with high fidelity and generality for almost any template sequence (22). The ribozyme itself contains approximately 200 nt, so it is not nearly capable of catalyzing the synthesis of additional copies of itself. Perhaps a smaller and/or more active form of the RNA polymerase ribozyme might be developed."

    The significant progress toward a prebiotically relevant RNA polymerase ribozyme, which was evolved from a ligating ribozyme, has produced a molecule that is actually about 200 nt in length and cannot copy itself completely.

    Just some points to consider.

    By Anonymous Anonymous, at 1:33 PM  

  • This is rapidly degenerating into an "is not", "is too" shouting match; unless something new comes up, this will be my last post on the topic.


    The point is that we're talking about some kind of pool of active chemicals reacting with one another and forming chains ….

    What you are talking about is difficult to say. What molecular biologists are talking about is a) a random pool of beta D-nucleotides; and b) a random ensemble of polynucleotides. The polynucleotides form a random ensemble because chain polymerization is not sequence-specific.

    The set of very long chains that form is probably not uniform ….

    Sets are neither uniform nor non-uniform. It is probability distributions that are uniform. Given a) and b) above, one has a classical sampling with replacement model in the theory of probability, and thus a uniform and discrete probability measure.


    When, in the course of a debate, a participant resorts to grammatical criticism, that's generally an indication that they have nothing to contribute to the substance of the argument.

    It is absolutely not the case that all chains are equally likely to be formed. The geometry of the elements of the chains makes that impossible. I suspect you know this perfectly well, which is why you're playing games picking on grammar rather than addressing actual content.
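    A minimal illustration of the distinction being argued over: drawing bases one at a time with replacement gives every chain the same probability only if every base is equally likely to be drawn. The base frequencies in this Python sketch (`BASE_FREQ`) are hypothetical, and the model ignores the geometric effects mentioned above; it only shows that "sampling with replacement" does not by itself imply a uniform measure.

```python
from fractions import Fraction
from itertools import product

# Hypothetical base frequencies in the pool (illustrative, not measured values).
BASE_FREQ = {"A": Fraction(4, 10), "C": Fraction(1, 10),
             "G": Fraction(3, 10), "U": Fraction(2, 10)}

def sequence_probability(seq):
    """Probability of drawing seq, one base at a time, with replacement from the pool."""
    p = Fraction(1)
    for base in seq:
        p *= BASE_FREQ[base]
    return p

def distribution(length):
    """Map every chain of the given length to its probability."""
    return {"".join(s): sequence_probability(s)
            for s in product(BASE_FREQ, repeat=length)}

dist = distribution(3)
print(len(dist))                 # 64
print(sum(dist.values()))        # 1
print(dist["AAA"], dist["CCC"])  # 8/125 1/1000
```

    Under a uniform measure each of the 64 chains would have probability 1/64; here the likeliest and least likely chains differ by a factor of 64.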


    The idea that there's one perfect "template" that's a perfect match for a replicator is just plain silly ….

    Far from being silly, the idea is unavoidable, given the chemistry at hand. Partial replication? Certainly possible, but with well-known consequences for degradation by dilution and other cross-reactions, all endlessly discussed in the literature.


    No, not unavoidable. How do chains form? How does a replicator work? Look at the research cited by other posters here; a primitive replicator does not work by having perfectly matched templates to stick to; it works by assembling bits into a matching chain.


    But the idea of large numbers of similar chains forming, and the possibility of their being able to copy themselves using the chain fragments in the pool is quite a bit more likely; exactly how likely, we really can't compute, because we simply don't know enough ….

    If we do not know enough to carry out any reasonable computation, whence your confidence that replication is apt to be ‘quite a bit more likely?’


    Again, deliberately avoiding the real point. The point that I've been making all along is that the math of your probability calculation is sloppy gibberish based on invalid assumptions.

    Why do I assume that replication is probably quite a bit more likely? Because I believe that saying that there is exactly one replicator is foolish. Among other reasons: we know that there are tons of ways that you can permute sequences in living things without destroying their functions, a fact you desperately try to spin in your next comment.


    In fact, this fact is one of the reasons that the cytochrome c sequence in all living things is such interesting evidence. There's no particular functional reason why cytochrome c sequences should be any which way: there are lots and lots of functionally equivalent forms. And, indeed, there are slightly different forms scattered throughout the tree of life. But instead of these differences being truly randomly distributed, they are grouped in a very particular pattern... which just happens to be the exact same pattern that morphological comparisons and the fossil/geological record both independently confirm...

    Cytochrome C is a functioning polypeptide, with a distinct history, and a known number of functional equivalents at specified amino acid sites. H. Yockey estimates the odds against discovering a single member of the family at 2 x 10 to the 44th power. It may well be that there are functionally equivalent families of RNA replicators, but no one knows whether this is so, a point I made explicitly in my essay.


    Yes, here's an example of how a particular chain can vary in numerous ways. You claim the odds of a single particular chain are on the order of 1 in 10^60; here's an example where the odds are 1 in 2x10^44. That leaves a factor of roughly 10^15 (5x10^15, to be exact)!
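    A quick check of the arithmetic, taking both figures at face value (1 in 10^60 and Yockey's 1 in 2 x 10^44, as quoted in this thread) and using exact rational arithmetic so nothing is lost to floating-point rounding. The constant names are illustrative.

```python
from fractions import Fraction

BERLINSKI_ODDS = 10**60   # "1 in 10^60", the figure disputed in this thread
YOCKEY_ODDS = 2 * 10**44  # "1 in 2 x 10^44", Yockey's cytochrome c figure as quoted

# How many times larger the first figure is than the second, computed exactly.
ratio = Fraction(BERLINSKI_ODDS, YOCKEY_ODDS)
print(ratio)  # 5000000000000000, i.e. 5 x 10^15
```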


    Yes, I know that you found one particular citation that claims a minimum length of 100. But that's not the only number out there for a proposed minimum length replicator - it's just the one you like best ….

    I have no interest in any particular number; if you wish to urge another, do so. You have given me no reason to suppose my assessment is incorrect; nor have you cited alternative references in the literature. Your comments are thus irrelevant.


    Dave, you *do* have an interest in a particular number: when someone provided a citation for 61 bases, you went to rather a lot of trouble to try to argue that a 61 base replicator was invalid.


    And further, it's *not* a uniform space of 10^60. Not every nucleotide chain is equally probable; in fact, not every nucleotide chain is even *possible* ….

    Probability measures are uniform or not. Spaces are otherwise. Your assertion that something is so because you believe that it is so is of no conceivable interest – to me or to anyone else.


    Again, you go for the grammar rather than the facts. Not every chain is equally probable; in fact, not every chain is even possible. To be correct, your probability numbers have to accurately describe the number of possibilities. If you don't eliminate the impossible chains from your list of possibilities, then you're artificially inflating the odds against a replicator forming - and that's otherwise known as "faking the data".
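    To illustrate why removing impossible chains matters, here is a toy Python sketch. The rule that no chain may contain "GG" (`is_possible`) is entirely hypothetical, a stand-in for whatever real chemical constraints exist; the point is only that, under a uniform measure over the remaining chains, shrinking the set of possible chains raises the probability of any particular allowed chain.

```python
from fractions import Fraction
from itertools import product

BASES = "ACGU"

def is_possible(seq):
    """Hypothetical constraint, purely for illustration: no chain may contain 'GG'."""
    return "GG" not in seq

def target_probability(target, length, allowed=lambda s: True):
    """Uniform probability of target among all chains of this length that pass `allowed`."""
    chains = ["".join(p) for p in product(BASES, repeat=length)]
    possible = [c for c in chains if allowed(c)]
    if target not in possible:
        return Fraction(0)
    return Fraction(1, len(possible))

print(target_probability("ACU", 3))               # 1/64: impossible chains counted
print(target_probability("ACU", 3, is_possible))  # 1/57: impossible chains removed
```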



    But the biggest mistake is the claim that you can simply reduce the number of possible replicators to exactly one. Nonsense ….

    I have never suggested the contrary. The issue is a figment of your imagination.


    Ah. So that would be why you continue to insist that your probability number must be *1* in 10^60?


    You can wave your hands as much as you like, but the fact is, if you're working with numbers that are potentially tens of orders of magnitude off, your math is bullshit. You can ignore the issue as much as you like - but there is every reason to believe that while replicators are far from common, within a space as large as 10^60, they're a hell of a lot more common than *one*….

    They are, those replicators, far from common. I said as much in my essay, and I also gave reasons for supposing that this was so. With respect to your claim that “there is every reason to believe that such replicators are a hell of a lot more common than one --”


    Your argument is only that they are rare. Within a space of 10^60, 10^20 different replicators would be considered quite rare by any reasonable person. But that's a rather different number than 1, don't you agree?
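    The same point in numbers: if one chain is drawn uniformly at random from a space of N sequences and k of them work, the chance of a hit is k/N. A sketch using the thread's figures; the 10^20 count is Mark's illustrative number, not a measured one, and `chance_of_hit` is a hypothetical helper name.

```python
from fractions import Fraction

def chance_of_hit(num_replicators, space_size):
    """Chance that one uniformly random chain is one of num_replicators working chains."""
    return Fraction(num_replicators, space_size)

SPACE = 10**60                         # size of the sequence space used in the thread
p_one = chance_of_hit(1, SPACE)        # exactly one replicator: 1 in 10^60
p_many = chance_of_hit(10**20, SPACE)  # 10^20 replicators: 1 in 10^40

print(p_many == Fraction(1, 10**40))   # True
print(p_many / p_one)                  # 100000000000000000000, i.e. 10^20 times likelier
```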



    What reasons might those be?
    And how would they change the conclusions that I reach in my essay?

    WRT independence: I'm *not* saying that in general, you can't make assumptions of independence. What I'm saying is what *any* decent mathematician would say: to paraphrase my first semester probability book: "independence between two events is a valid assumption *if and only if* there is no known interaction between the events." That is the *definition* of independence ….

    If you are disposed to offer advice about mathematics, use the language, and employ the discipline, common to mathematics itself. What you have offered is an informal remark, and not a definition. The correct definition is as follows: Two events A and B are independent if P(AB) = P(A)P(B). As a methodological stricture, the remark you have offered is, moreover, absurd inasmuch as some interaction between events can never be ruled out a priori, at least in the physical sciences.


    And again, you go for grammar rather than content. I think I'm noticing a pattern here.

    The game that you're playing right here is typical: let's try to use over-complicated language and irrelevant definitions in order to obscure the point.

    Let's take an actual mathematical look at what you said above: we can assume that two events A and B are independent, and compute their joint probability as P(AB) = P(A)*P(B), if and only if P(AB) = P(A)*P(B). Not a very useful definition, eh?

    The fact remains: when computing probability you don't get to just assume independence by default. It's something that must be demonstrated. You can wiggle all you want, but it's not valid probability to assume independence between chemical reactions occurring in a common environment unless you have some argument for why they can't influence each other.
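    A toy model, with hypothetical numbers, of how a shared environment can break independence even when two reactions never interact directly: each reaction's chance depends on a common environmental condition, and the two are independent once that condition is fixed. Checking the result against the formal definition quoted above, P(AB) = P(A)P(B), shows they are not independent overall. The environment labels and probabilities are invented for illustration.

```python
from fractions import Fraction

# Hypothetical shared condition that both reactions depend on; no direct A-B interaction.
P_ENV = {"favorable": Fraction(1, 2), "unfavorable": Fraction(1, 2)}
P_A_GIVEN_ENV = {"favorable": Fraction(9, 10), "unfavorable": Fraction(1, 10)}
P_B_GIVEN_ENV = {"favorable": Fraction(9, 10), "unfavorable": Fraction(1, 10)}

def marginal(p_given_env):
    """P(X) = sum over environments of P(env) * P(X | env)."""
    return sum(P_ENV[e] * p_given_env[e] for e in P_ENV)

def joint():
    """P(AB), with A and B conditionally independent given the environment."""
    return sum(P_ENV[e] * P_A_GIVEN_ENV[e] * P_B_GIVEN_ENV[e] for e in P_ENV)

p_a, p_b, p_ab = marginal(P_A_GIVEN_ENV), marginal(P_B_GIVEN_ENV), joint()
print(p_a, p_b)           # 1/2 1/2
print(p_ab, p_a * p_b)    # 41/100 1/4
print(p_ab == p_a * p_b)  # False: P(AB) = P(A)P(B) fails
```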


    Chemical reactions co-occurring within an environment do *not* fit the assumption of independence unless you can demonstrate that they cannot affect each other ….

    An impossible demand, one that neither I nor anyone else could meet; it is thus irrelevant.


    No, not irrelevant, Dave. As any real mathematician would be glad to tell you, you can't do meaningful computations on the basis of meaningless numbers. If you can't show why the events are independent, and you can't predict what, if any, impact the events could have on each other, then you cannot produce a meaningful probability estimate. Period.


    You can pretend that your proposed replicator/template pair is created completely independently …

    Not my pretense but an assumption of the field in virtue of the sequence-independent character of polynucleotide polymerization.


    Except that it's not sequence-independent. Because of the geometry of the basic elements and the ways they connect, some sequences are impossible; some sequences are unlikely; and some are relatively likely.
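    A sketch of what sequence dependence does to the probabilities, assuming (hypothetically) that each base's chance depends only on the base before it, i.e. a first-order Markov chain. The neighbor rule in `next_base_prob` is invented for illustration, not taken from the chemistry; it shows that two chains with the same base composition can end up with different probabilities, whereas a uniform model would give every length-4 chain 1/256.

```python
from fractions import Fraction

BASES = "ACGU"
START = {b: Fraction(1, 4) for b in BASES}  # first base drawn uniformly

def next_base_prob(prev, nxt):
    """Hypothetical neighbor effect: repeating the previous base is less likely."""
    return Fraction(1, 10) if nxt == prev else Fraction(3, 10)

def chain_probability(seq):
    """P(seq) when each base's odds depend on the base before it."""
    p = START[seq[0]]
    for prev, nxt in zip(seq, seq[1:]):
        p *= next_base_prob(prev, nxt)
    return p

print(chain_probability("AAUU"))  # 3/4000
print(chain_probability("AUAU"))  # 27/4000
```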


    - but the fact is, even if you demand this template nonsense, they're being generated as a part of a chemical process taking place between large numbers of prebiotic precursors being linked together to form chains. That is *not* an independent process …

    No, it is not. Nor is it a process relevant to anything that I have claimed. I am talking of sequence specificity and the RNA world scenario. You are talking about chemical processes in whose existence you believe but whose identity you are unable to specify. This is as unhelpful as it is irrelevant. If you can describe, even in outline, a true autocatalytic sequence leading to self-replicating polynucleotides, you will at once advance to the forefront of current research. For the moment, the only true autocatalytic reaction in pre-biotic chemistry remains the formose reaction.

    That figure was disproved the same year (2002) Arrhenius wrote the above. Paul & Joyce designed a ligase ribozyme of only 61 bases ("A self-replicating ligase ribozyme," PNAS 99(20) 12733-12740). (Not that Arrhenius should have seen that coming.)

    So let me just take the time to say that's flatly false. The reaction was not enzymatically-driven, except by the very ribozyme they were demonstrating!

    a) A ribozyme by definition has catalytic powers: hence its name: a ribonucleic ENZYME.
    b) The R3C ligase ribozyme that Paul & Joyce re-designed is roughly 200 nucleotides (or 68,600 daltons).
    c) The experiment reported in Paul & Joyce was not only enzyme-driven but RNA-template-dependent.
    d) Limitations of this approach are discussed candidly in Paul & Joyce.
    e) There is a threshold of technical incompetence below which discussion is not profitable.

    A general remark: I include my address in these posts not for your convenience but for my own.


    And you close by insulting the person who pointed out that your "100 base" assumption is no good, along with yet another deliberately obfuscative argument. You demand that the replicator be RNA-based; then you criticize a piece of work because the RNA in it acts as an enzyme.

    By Blogger MarkCC, at 2:23 PM  

  • Mark, please let me know if this sub-conversation runs beyond your interest. Your blog's on bad math, not bad molecular biology. You shouldn't have to suffer simply because Dr. Berlinski is talented at both.

    With that said, Dr. Berlinski continues to persist in his erroneous claim that "the figure of the 100 base pairs required for what Arrhenius calls “demonstrable ligase activity,” is known; it is current in the literature; and it is, all evidence suggests, an under-estimate." He does so even though it's been pointed out to him that a ligase ribozyme of 61 base pairs (actually, 61 single bases, as the template strand itself performs the ligase function) was engineered and reported on in 2002. Let me mention that the paper in question, "A self-replicating ligase ribozyme" (Paul, N., Joyce, G., PNAS, Volume 99, No. 20, 2002), is available for free public download.


    Berlinski claimed that this ribozyme didn't count as disproof of his claim because "The demonstration reported, as Paul & Joyce make clear, was enzymatically driven; and so not properly speaking relevant to pre-biotic chemistry." I replied that the only enzyme involved was the ribozyme they were demonstrating. Now Berlinski replies:

    a) A ribozyme by definition has catalytic powers: hence its name: a ribonucleic ENZYME.

    Now this is interesting. As anyone here who's taken intro genetics is aware, a ligase also is by definition a type of enzyme. Dr. Berlinski appears to be saying that RNA ligases can't be less than 100 bp long...and enzymes, such as RNA ligases, don't count as counter-examples. What more can be said?

    b) The R3C ligase ribozyme that Paul & Joyce re-designed is roughly 200 nucleotides (or 68, 600 daltons).

    That is, the original ligase, prior to their redesign, was that long. The ribozyme which resulted from their redesign, on the other hand, is 61 nucleotides. Unfortunately for Dr. Berlinski, it's the latter which was the subject of their paper.

    c) The experiment reported in Paul & Joyce was not only enzyme-driven but RNA-template-dependent.

    Only in the sense that the ligase ribozyme was the template (or rather, one strand of it was). Perhaps Berlinski means to say "substrate-dependent," which is certainly true but irrelevant to his original claim.

    d) Limitations of this approach are discussed candidly in Paul & Joyce.

    This, at least, is true. Again, irrelevant to his claim, but true.

    e) There is a threshold of technical incompetence below which discussion is not profitable.


    If only there were a similar threshold for the profitability of publishing essays...

    By Anonymous Anton Mates, at 2:24 PM  

  • Paris
    David Berlinski

    I am quite sure that I have outstayed my welcome. I'm more than happy to let you have the last words. Thank you for allowing me to post my own comments.

    DB

    By Anonymous David Berlinski, at 4:13 PM  

  • For: Dave

    I'd recommend "Calculus Made Easy". I thought that was a very nice intro to calc. It's got more actual math in it than Berlinski's book, but if you're interested in the topic then that's a good thing.

    Also, you'll be saved the long-winded "explanation" of continuity.

    By Anonymous Anonymous, at 1:39 AM  
