Good Math/Bad Math

Friday, April 14, 2006

Q&A roundup: No Free Lunch

I've been receiving a bunch of email over the last couple of weeks with similar questions about my critique of Dembski's "no free lunch". Rather than answering the emails individually, I've been gathering them up, and I'll try to answer the questions here.


  1. What does "averaged over all landscapes" mean?

    In an NFL setting, we're talking about doing something like a search over a landscape, where "fitness functions" provide the search algorithm with its directions: the search moves in the direction that the fitness function tells it to.

    One of the properties of the fitness functions is that they don't get to see the real landscape: their job is to try to approximate the landscape in a good way, so the search will wind up at a good destination.

    When we average over all landscapes, what we mean is that instead of considering how a particular fitness function matches a particular landscape, we say: "Let's look at every possible landscape, run the fitness function on each one, and see what kind of results it gets."

    NFL says that if we do that, no matter how carefully we design the fitness function, it can't possibly do better than random across all landscapes. What this means is that if we don't know which landscape we're going to be confronted with, we can't pick a fitness function that's guaranteed to do well. This is interesting, for example, when applied to things like financial derivatives markets: if we don't have information about what the markets are going to do, then on average, we can't beat the market.
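
    To make the averaging concrete, here's a tiny Python sketch (entirely my own construction, with made-up numbers - nothing from the NFL papers). The domain has three points and fitness values are 0 or 1, so there are 2^3 = 8 possible landscapes; a deterministic, non-repeating search strategy is just a fixed order in which to visit the points.

      from itertools import product, permutations

      POINTS = (0, 1, 2)

      def best_after_k(order, landscape, k=2):
          """Best fitness value seen after sampling the first k points in 'order'."""
          return max(landscape[p] for p in order[:k])

      # Every deterministic, non-repeating strategy is a fixed visiting order.
      for order in permutations(POINTS):
          # Average this strategy's performance over ALL 8 possible landscapes.
          avg = sum(best_after_k(order, dict(zip(POINTS, values)))
                    for values in product((0, 1), repeat=3)) / 8
          print(order, avg)

    Every order prints the same average (0.75): once you average over every possible landscape, no strategy beats any other. That's the NFL result in miniature.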

  2. NFL works for operations research tasks; why doesn't it work for evolution?

    Receiving multiple emails with this question surprised the heck out of me. It's an odd question, one which is both deep and shallow at the same time. I suspect that someone somewhere put folks up to asking this one as a response to my argument, but I have no proof of that. Anyway - on to the answer:

    One of the key properties of NFL is that it uses blind fitness functions. That is, the fitness function doesn't get to change itself depending on what landscape you're running it on. The fitness function in NFL is also deterministic: from any point in the landscape, looking at where it can go next, it can only choose one path as the best.

    Evolution is an adaptational process: modelled as a function, it's more like the learning functions in Case's computational learning theory than like the fitness functions in NFL. An evolutionary process doesn't have a fixed path built into it; it doesn't even have a real fitness function built into it. In effect, an evolutionary process is modifying its fitness function as it goes. The landscape that it traverses gets built into the function, so that the longer it runs, the more adapted to the landscape it gets.

    Evolution is also not deterministic: it tries multiple paths. Remember that evolution is working on a species, not on individuals. Within a species, multiple adaptations can occur in different sub-populations. That is effectively trying multiple paths. In fact, that's exactly how speciation occurs: different subpopulations adapt to the environment in different ways.
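
    To see that difference in code, here's a minimal sketch (my own illustration, with a made-up landscape - not anything from Dembski or the NFL literature) contrasting a blind, fixed search rule with an adaptive one that folds the landscape's feedback back into its own behavior:

      def landscape(x):
          return -abs(x - 42)    # hidden peak at x = 42; neither rule gets to see this

      # NFL-style blind rule: the move is fixed before any feedback arrives.
      x = -50
      for _ in range(200):
          x += 1                 # "always step right", no matter what comes back
      print("blind rule ends at:", x)     # 150 -- it marched right past the peak

      # Evolution-style adaptive rule: keep the step that worked, reverse it
      # when it stops working. The landscape gets built into the rule as it runs.
      x, step, last = -50, 1, landscape(-50)
      for _ in range(200):
          x += step
          now = landscape(x)
          if now <= last:
              step = -step       # this direction stopped paying off; try the other
          last = now
      print("adaptive rule ends at:", x)  # hovers within a step of the peak at 42

    The blind rule's behavior was fixed before it ever saw the landscape, so it walks right past the peak; the adaptive rule ends up circling the peak because the landscape's responses have, in effect, been built into it.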

  3. NFL is a probability argument; why do you keep babbling about landscapes?

    There are a lot of different ways that you can describe NFL-type problems; landscapes are the one I think is intuitively easiest. The idea of things like NFL is to look at a state-space with transitions, and see how algorithms can perform in different state spaces. You can describe that as a probability space, a landscape, a weighted graph, a state table, or any number of other things. The "probability" piece comes from comparing the fitness function (or fitness landscape, or whatever abstraction you choose) with the landscape it's supposed to be trying to fit. By doing that comparison, you can find the probability of a given type of fitness function performing well on a given type of landscape.
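
    To make the equivalence concrete, here's the same tiny search problem (states and weights are arbitrary, made-up values) written both as a landscape and as a weighted graph / state table:

      # As a "landscape": each state mapped to its height (its fitness).
      landscape = {"A": 1, "B": 3, "C": 2, "D": 5}

      # As a state table / weighted graph: which states are reachable from
      # which, with the fitness gain of each transition as the edge weight.
      transitions = {
          "A": {"B": +2, "C": +1},
          "B": {"D": +2},
          "C": {"D": +3},
          "D": {},
      }

      # Either view supports the same question: how well does a given rule
      # ("always take the biggest immediate gain", say) perform?
      def greedy(state):
          while transitions[state]:
              state = max(transitions[state], key=transitions[state].get)
          return state

      end = greedy("A")
      print(end, landscape[end])    # D 5 -- greedy reaches the best state here

    Whichever representation you pick, the analysis is the same: compare the search rule against the structure it's searching, and ask how likely it is to do well.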

  4. NFL doesn't say evolution can't do better than randomness; it says that evolution must have some knowledge of the landscape to do better than randomness. Where does evolution get that knowledge?

    As I said above, NFL relies on deterministic, fixed fitness functions for searching landscapes. Evolution is effectively non-deterministic, because it tries multiple paths; the successful paths persist and keep going; the unsuccessful ones die out.

    Non-determinism is a big deal. The ability to try multiple paths and keep the ones that work is an incredibly powerful notion - and that's exactly where evolution gets its ability to adapt and perform well in any fitness landscape. Mathematically, you can't model evolution as a fixed deterministic function; it's an adaptive non-deterministic one. And an adaptive non-deterministic algorithm will not fall into the NFL trap. (See the Case citation above for some great work about what non-deterministic adaptive functions can do.)
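
    Here's a toy demonstration of that power (again, my own construction, with a made-up landscape): a deterministic hill-climber commits to a single path and gets trapped on a local peak, while a population that tries many paths and keeps the survivors crosses the valley to the global peak.

      import random

      def fitness(x):
          # Deceptive 1-D landscape: local peak at x = 5, global peak at x = 30.
          return 10 - abs(x - 5) if x < 15 else 25 - abs(x - 30)

      def hill_climb(x):
          # One fixed path: always move to the single best neighbor.
          while True:
              best = max((x - 1, x, x + 1), key=fitness)
              if best == x:
                  return x
              x = best

      print("hill climber ends at:", hill_climb(0))   # 5 -- stuck on the local peak

      random.seed(0)
      population = [0] * 30
      for _ in range(200):
          # Each individual tries its own path (a mutation); keep the 30 best,
          # breaking ties randomly so the population never freezes in place.
          mutants = [x + random.randint(-10, 10) for x in population]
          population = sorted(population + mutants,
                              key=lambda x: (fitness(x), random.random()),
                              reverse=True)[:30]
      print("population's best:", max(population, key=fitness))  # typically 30

    No individual path is guaranteed to succeed, but because the unsuccessful paths die out and the successful ones persist, the population as a whole crosses terrain that a single fixed path never could.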

7 Comments:

  • The argument that the "No Free Lunch" theorems rule out evolution is invalid, of course, but not for the reason you give. There are NFL theorems that apply to stochastic learners as well. ("Nondeterministic" has a specific meaning in Computer Science, which does not apply here.)

    To see the real fallacy, you need to realize that a typical member of the set of all possible landscapes is completely random. It contains no patterns at all.

    It is true that evolution is impossible in such a landscape, but so is any kind of learning or thought. Such a landscape would not support life even if it were created.

    The universe we inhabit appears to be rather predictable. This is true only of a vanishingly small fraction of the set of all landscapes.

    Not only that, but all but a vanishingly small fraction of the "possible" landscapes are not really possible at all, in that no mechanism fitting inside our universe can produce such a landscape. (Most landscapes have Kolmogorov complexity greater than the number of atoms in the universe).

    The NFL theorems don't really say anything relevant to evolution at all.

    By Anonymous Anonymous, at 4:17 PM  

  • Ralph:

    I'm using non-determinism in the comp sci sense; from any given point in the parameter space, the "evolution function" can return more than one value, unpredictably. It's nondeterministic in the same sense as the "N" in "NP".

    WRT the rest of what you said: I've not seen the work describing the K-C complexity of landscapes, so I can't comment. I'd appreciate a cite, just because I'm interested.

    By Blogger MarkCC, at 5:10 PM  

  • I'm using non-determinism in the comp sci sense; from any given point in the parameter space, the "evolution function" can return more than one value, unpredictably.

    Sigh!

    That isn't what non-determinism in the comp sci sense is. NP means that if someone gives you a solution, you can verify it in polynomial time. It has nothing to do with anything acting unpredictably.

    There is a definition that involves a function returning all possible values, but evolution cannot, does not, and need not work that way. The landscapes involved are far too large (typically they have more points than there are atoms in the universe).

    Tgibbs is correct: evolution really does need the landscape to have structure of a particular kind. Fortunately, many experiments show that it does have such structure.

    For example, it is possible to breed new varieties of crops without knowing a thing about genetics. Or, perhaps more to the point, bacteria can and do evolve antibiotic resistance, with no designer in sight.

    Which brings us to the true Achilles heel of the whole ID thing. A designer of the human body would hardly be deserving of praise. I'd think product liability lawsuits would be more appropriate. My bad back alone ought to be worth millions!

    By Anonymous Anonymous, at 10:01 PM  

  • That isn't what non-determinism in the comp sci sense is. NP means that if someone gives you a solution, you can verify it in polynomial time. It has nothing to do with anything acting unpredictably.

    Ahh ... that turns out not to be the case. Markcc's definition is exactly what the N in NP stands for. A problem can be solved (i.e., the correct answer can be found) in polynomial time when many computational paths can be taken at once. If you could "predict" the computational path that needed to be taken with that same complexity, then you would have a deterministic algorithm.

    By Anonymous Anonymous, at 12:55 AM  

  • markk jumped in and said exactly what I was going to say about the "N" in "NP". NP is nondeterministic polynomial time: if you have a machine that can simultaneously explore every answer, then you can solve the problem in P time.


    I still think that one of the best ways to describe a search function that simultaneously explores multiple paths is non-determinism. In a functional framework - that is, a framework where the search direction at each step is determined by a function - non-determinism is the natural way to capture that behavior.

    By Blogger MarkCC, at 10:09 AM  

  • This is interesting, for example, when applied to things like financial derivatives markets: if we don't have information about what the markets are going to do, then on average, we can't beat the market.

    I don't think this is necessarily correct. If the derivatives market has enough rules or regularity, then this market may not be random, and hence a general fitness function might work. That is to say, for the NFL theorems to apply, we would have to show hardness of the derivatives market *as it is* played today.

    To use a better example, suppose we have some NP-complete problem on a graph, so we quickly state that the NFL theorem applies. However, unbeknownst to us, it turns out that our hypothetical problem is not really over arbitrary graphs but over a well-defined subset which can be solved, say, by a greedy algorithm. We have not yet identified this fact, but an evolutionary algorithm might actually easily find the answer in every case using almost any fitness function.
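
    A concrete, well-known instance of that scenario (my example, not from the original post): maximum independent set is NP-complete on arbitrary graphs, but if the instances secretly all happen to be interval graphs, the classic earliest-finish-first greedy solves them exactly. A Python sketch, with made-up interval data:

      def max_independent_intervals(intervals):
          """Largest set of pairwise non-overlapping (start, end) intervals."""
          chosen, last_end = [], float("-inf")
          # Greedy: always take the compatible interval that finishes earliest.
          for start, end in sorted(intervals, key=lambda iv: iv[1]):
              if start >= last_end:     # half-open intervals: touching is fine
                  chosen.append((start, end))
                  last_end = end
          return chosen

      jobs = [(1, 4), (3, 5), (0, 6), (5, 7), (3, 9), (5, 9),
              (6, 10), (8, 11), (8, 12), (2, 14), (12, 16)]
      print(max_independent_intervals(jobs))
      # [(1, 4), (5, 7), (8, 11), (12, 16)] -- optimal, with no search at all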

    By Anonymous Anonymous, at 7:06 PM  

  • Years ago, I took an Operating Systems course from Peter Denning. You may recall that Prof. Denning proved that you couldn't do better than Least Recently Used as an algorithm for page replacement in a demand paging system. There are two problems with the proof. The first is that it assumes that computer programs behave unpredictably. That is, you can't guess what page a program might need next. This is patently false. Many programs have sequential memory patterns, which are easily detectable. So, despite the proof that you can't do better, there are operating systems that routinely do better - and quite a bit better.
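
    A tiny simulation of that point (my own construction, with made-up page numbers): on a perfectly predictable sequential loop, LRU is nearly the worst possible choice, while a policy that exploits the pattern - here, evicting the most-recently-used page, since the loop will come back around to the old pages first - does far better.

      from collections import OrderedDict

      def misses(trace, size, evict_lru=True):
          cache, count = OrderedDict(), 0
          for page in trace:
              if page in cache:
                  cache.move_to_end(page)    # mark as most recently used
              else:
                  count += 1
                  if len(cache) >= size:
                      # LRU pops the oldest entry; MRU pops the newest.
                      cache.popitem(last=not evict_lru)
                  cache[page] = None
          return count

      trace = [0, 1, 2, 3, 4] * 20           # sequential loop over 5 pages
      print("LRU misses:", misses(trace, 4))                     # 100 of 100
      print("MRU misses:", misses(trace, 4, evict_lru=False))    # far fewer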

    The second false assumption is that demand paging is a good idea. Despite prodding from Seymour Cray, the industry hasn't caught on. I sometimes run Linux without a swap device. This doesn't turn off paging completely, though, since it knows that the original disk file for a program has a copy of the unchanged pages. Linux can use that to continuously "swap in" things. Still, if you have enough RAM, it won't kick out a program to make room for disk cache, I think.

    By Blogger Stephen, at 6:43 PM  
