## Thursday, May 18, 2006

### Why oh why Y?

So in the last few posts, I've been building up the bits and pieces that turn lambda calculus into a useful system. We've got numbers, booleans, and choice operators. The only thing we're lacking is some kind of repetition.

This is, alas, going to be a bit difficult. Lambda calculus does repetition using recursion (an explanation of recursion can be found here). But since functions in lambda calculus don't have names, that means that we resort to something tricky. It's called the Y combinator, aka the lambda fixed point operator.

Let's start by looking at a simple recursive function outside of the lambda calculus. The factorial function, n!, is the standard example:
` factorial(n) = 1 if n = 0 factorial(n) = n*factorial(n-1) if n > 0`

If we want to start trying to write that in lambda calculus, we'd need a couple of tools... We need a test for equality to zero, and we need a way of multiplying numbers; and we need a way of subtracting one.

For testing equality to zero, we'll use a function named `IsZero`, which takes three parameters: a number, and two values. If the number is 0, it returns the first value; if it's not 0, then it returns the second value.

For multiplication - we can't write multiplication until we work out recursion. But we'll just handwave that for now, and have a function `Mult x y`.

And finally, for subtracting one, we'll use `Pred x` for the predecessor of x - that is, x - 1.

So - a first stab at factorial, written with the recursive call left with a blank in it, would be:
` lambda n . IsZero n 1 (Mult n (something (Pred n)))`
Now, the question is, how do we get the "something" in there in a way that causes it to recurse?

The answer is something called a combinator. A combinator is a special kind of higher order function which can be defined without reference to anything but function applications. (A higher order function is a function which takes functions as parameters and returns functions as results). The Y combinator is the special, almost magical function that makes recursion possible. Here's what it looks like:
`let Y = lambda y . (lambda x . y (x x)) (lambda x . y (x x))`
If you look at it, the reason for calling it Y is because it's "shaped" like a Y. To show you that more clearly, sometimes we write lambda calculus using trees. Here's the tree for the Y combinator:

What makes the Y combinator special is that applying Y to itself creates itself - that it, `(Y Y) = Y (Y Y)`. Let's take (Y Y) and work it through:
1. `Y Y`
2. Expand the first Y:
`(lambda y . (lambda x . y (x x)) (lambda x . y (x x))) Y`
3. Now, beta:
`(lambda x . Y (x x)) (lambda x. Y (x x))`
4. Alpha[x/z] the second lambda:
`(lambda x . Y (x x)) (lambda z. Y (z z))`
5. Beta:
`Y ((lambda z. Y (z z)) (lambda z. Y (z z)))`
6. Expand that Y in front and alpha[y/a][x/b]:
`(lambda a . (lambda b . a (b b)) (lambda b . a (b b))) ((lambda z. Y (z z)) (lambda z. Y (z z)))`
7. Beta:
`(lambda b . ((lambda z. Y (z z)) (lambda z. Y (z z))) (b b)) (lambda b . ((lambda z. Y (z z)) (lambda z. Y (z z))) (b b))`
Now, look carefully at that expression. That's `(Y (Y Y))`. [Remember from the above that (Y Y) = (lambda x . Y (x x)) (lambda x . Y (x x))] So `Y Y = Y (Y Y)` That's the magic of Y: it reproduces itself. `(Y Y) = Y (Y Y) = Y (Y (Y Y))`, forever.

So how do we use this crazy thing?

Well, let's take our first attempt up there, and try to play with it a bit. Let's give it a name, and try writing it with that name:
`let fact = lambda n . IsZero n 1 (Mult n (fact (Pred n)))`
Now - the trick is, "fact" is not an identifier defined inside of "fact". How do we let "fact" reference "fact"? Well, we can do a lambda abstraction to let us pass the "fact" function as a parameter; then if we can find a way to write "fact" so that we can pass it to itself as a parameter, we'll be ready to go. Let's call that metafact.
`let metafact = lambda fact . (lambda n . IsZero n 1 (Mult n (fact (Pred n))))`
Now, if we could apply metafact to itself, we'd have our factorial function. That is, `fact n = (metafact metafact) n`.

That's where Y comes in. It lets us create a bizzare structure where that function, above, will be copied every time we need to recurse. `metafact (Y metafact)` will do what we want. Expanded out, that's:
`(lambda fact . (lambda n . IsZero n 1 (Mult n (fact (Pred n))))) (Y (lambda fact . (lambda n . IsZero n 1 (Mult n (fact (Pred n))))))`
`(Y metafact)` is the parameter value of fact in the lambda; when we do beta on the function, if n is zero, then it just returns 1. If it's not zero, then we get the call to `fact (Pred n)`. Fact betas to `Y metafact`. Which does that insane magic copying thing, giving us `metafact (Y metafact) (Pred n)`.

Voila, recursion. Deeply twisted recursion.

I learned about the Y combinator back in my undergrad days, which would place it around 1989 - and I still find it rather mystifying. I do understand it now, but I can't imagine how on earth anyone ever figured it out!

If you're interested in this, then I highly recommend getting a copy of the book The Little Schemer. It's a wonderful little book - set up like a childrens book, where the front of each page is a question; and the back of each page is the answer. It's written in a delightfully playful style, it's very fun and engaging, and it will not only teach you to program in Scheme.

As an important side-note there are actually a couple of different versions of the Y combinator. There are different ways of evaluating lambda calculus: given an expression like:
`(lambda x y . x * y) 3 ((lambda z. z * z) 4)`
we can do it in two different orders: we can first do the beta on "`(lambda x y . x * y)`,which would give us:`3 * ((lambda z . z * z) 4)`
Or, we could beta `((lambda z . z * z) 4)` first:`(lambda x y . x * y) 3 (4 * 4)`
In this case, the two orders end up with the same result; but that's not always the case.
The first order is what we call lazy evaluation: we don't evaluate the parameters to a function until we need to use them. The second is called eager evaluation : we always evaluate parameters before the functions that they're passed to. (In real programming languages, Lisp, Scheme, and ML are lambda-calculus based languages that use eager evaluation; Haskell and Miranda are lambda calculus based languages that use lazy evaluation.) The Y combinator I described above is the Y for lazy evaluation. If we used eager evaluation, then Y combinator above wouldn't work - in fact, it would copy Ys forever.

• The big deal about Y is not that (Y Y) = Y, but that for any M, (Y M) = (M (Y M)).

So if we define fact = (Y metfact), then we know that

fact
= (Y metafact)
= (metafact (Y metafact))
= (metafact fact)

There's lots more to be said about Y (of course!). Among other things, there's a simple derivation that takes a lot of mystery away. I can post that if folks are interested.

By  Mitch Wand, at 10:41 AM