My Favorite Calculus: Lambda (part 1)
(In the original version of this post, I tried using a javascript tool for generating MathML. It appears to not work very well; several browsers failed to render it correctly, and it doesn't work in an RSS feed. I've gone back and re-written everything in simple text format.)
In computer science, especially in the field of programming languages, we tend to use one particular calculus a lot: the Lambda calculus. Lambda calculus is also extensively used by logicians studying the nature of computation and the structure of discrete mathematics. Lambda calculus is great for a lot of reasons, among them:
The lambda calculus is based on the concept of functions. In the pure lambda calculus, everything is a function; there are no values at all except for functions. But we can build up anything we need using functions. Remember back in the early days of this blog, I talked a bit about how to build mathematics? We can build the entire structure of mathematics from nothing but lambda calculus.
So, enough lead-in. Let's dive in a look at LC. Remember that for a calculus, you need to define two things: the syntax, which describes how valid expressions can be written in the calculus; and a set of rules that allow you to symbolically manipulate the expressions.
It turns out to be no problem, because of the fact that functions are values. So instead of writing a two parameter function, you can write a one parameter function that returns a one parameter function - in the end, it's effectively the same thing. It's called currying, after the great logician Haskell Curry.
For example, suppose we wanted to write a function to add x and y. We'd like to write something like:
Now that we know that adding multiple parameter functions doesn't really add anything but a bit of simplified syntax, we'll go ahead and use them when it's convenient.
A lambda calculus expression is completely valid only when all of its variables are bound. But when we look at smaller subexpressions of a complex expression, taken out of context, they can have free variables - and making sure that the variables that are free in subexpressions are treated right is very important.
So - for instance, if we had an expression like:
We can do alpha changing X to Y (written "alpha[x/y]" and get):
Doing alpha does not change the meaning of the expression in any way. But as we'll see later, it's important because it gives us a way of doing things like recursion.
Beta basically says that if you have a function application, you can replace it with a copy of the body of the function with references to the parameter identifiers replaced by references to the parameter value in the application. That sounds confusing, but it's actually pretty easy when you see it in action.
Suppose we have the application expression: "
A slightly more complicated example is the expression:
It's an interesting expression, because it's a lambda expression that when applied, results in another lambda expression: that is, it's a function that creates functions. When we do beta reduction in this, we're replacing all references to the parameter "y" with the identifier "q"; so, the result is "
One more example, just for the sake of being annoying:
"
Written formally, beta says:
That condition on the end, "if free(e) subset free(B[x := e]" is why we need alpha: we can only do beta reduction if doing it doesn't create any collisions between bound identifiers and free identifiers: if the identifier "z" is free in "e", then we need to be sure that the beta-reduction doesn't make "z" become bound. If there is a name collision between a variable that is bound in "B" and a variable that is free in "e", then we need to use alpha to change the identifier names so that they're different.
As usual, an example will make that clearer: Suppose we have a expression defining a function, "
In the parameter "(x + 2)", x is free. Now, suppose we break the rule and go ahead and do beta. We'd get:
The variable that was free in "x + 2" is now bound. Now suppose we apply that function:
By beta, we'd get "3 + 3 + 2".
What if we did alpha the way we were supposed to?
by alpha[x/y]:
by beta:
by beta again:
"3+x+2" and "3+3+2" are very different results!
And that's pretty much it. There's another rule you can optionally add called Eta-reduction, but we'll skip that for now. What I've described here is Turing complete - a full effective computation system. To make it useful, and see how this can be used to do real stuff, we need to define a bunch of basic functions that allow us to do math, condition tests, recursion, etc. I'll talk about those in my next post.
We also haven't defined a model for lambda-calculus yet. (I discussed models here and here.) That's actually quite an important thing! LC was played with by logicians for several years before they were able to come up with a complete model for it, and it was a matter of great concern that although LC looked correct, the early attempts to define a model for it were failures. After all, remember that if there isn't a valid model, that means that the results of the system are meaningless!
In computer science, especially in the field of programming languages, we tend to use one particular calculus a lot: the Lambda calculus. Lambda calculus is also extensively used by logicians studying the nature of computation and the structure of discrete mathematics. Lambda calculus is great for a lot of reasons, among them:
- It's very simple.
- It's Turing complete.
- It's easy to read and write.
- It's semantics are strong enough that we can do reasoning from it.
- It's got a good solid model.
- It's easy to create variants to explore the properties of various alternative ways of structuring computations or semantics.
The lambda calculus is based on the concept of functions. In the pure lambda calculus, everything is a function; there are no values at all except for functions. But we can build up anything we need using functions. Remember back in the early days of this blog, I talked a bit about how to build mathematics? We can build the entire structure of mathematics from nothing but lambda calculus.
So, enough lead-in. Let's dive in a look at LC. Remember that for a calculus, you need to define two things: the syntax, which describes how valid expressions can be written in the calculus; and a set of rules that allow you to symbolically manipulate the expressions.
Lambda Calculus Syntax
The lambda calculus has exactly three kinds of expressions:- (1) Function definition: a function in lambda calculus is an expression, written: which means "a function with one parameter named X, which returns the result of evaluating the body". We say that the lambda expression binds the parameter.
lambda x . body
- (2) Identifier reference: an identifier reference is a name which matches the name of a parameter defined in a function expression enclosing the reference.
- (3) Function application: applying a function is written by putting the function value in front of its parameter, as in "(lambda x . plus x x) y".
Currying
There's a trick that we play in lambda calculus: if you look at the definition above, you'll notice that a function (lambda expression) only takes one parameter. That seems like a very big constraint - how can you even implement addition with only one parameter?It turns out to be no problem, because of the fact that functions are values. So instead of writing a two parameter function, you can write a one parameter function that returns a one parameter function - in the end, it's effectively the same thing. It's called currying, after the great logician Haskell Curry.
For example, suppose we wanted to write a function to add x and y. We'd like to write something like:
lambda x y . plus x y
. The way we do that with one-parameter functions is: we write one function with one parameter, which returns another function in one parameter. Adding x plus y becomes writing a one-parameter function with parameter x, which returns another one parameter function which adds x to its parameter: lambda x.(lambda y. plus x y)
Now that we know that adding multiple parameter functions doesn't really add anything but a bit of simplified syntax, we'll go ahead and use them when it's convenient.
Free vs Bound Identifiers
One important syntactic issue that I haven't mentioned yet is closure or complete binding. For a lambda calculus expression to be evaluated, it cannot reference any identifiers that are not bound. An identifier is bound if it a parameter in an enclosing lambda expression; if an identifier is not bound in any enclosing context, then it is called a free variable.-
lambda x . plus x y
: in this expression, "y" and "plus" are free, because they're not the parameter of any enclosing lambda expression; x is bound because it's a parameter of the function definition enclosing the expression "plus x y" where it's referenced. -
lambda x y.y x
: in this expression both x and y are bound, because they are parameters of the function definition. -
lambda y . (lambda x . plus x y)
: In the inner lambda, "lambda x . plus x y
", y and plus are free and x is bound. In the full expression, both x and y are bound: x is bound by the inner lambda, and y is bound by the other lambda. "plus" is still free.
A lambda calculus expression is completely valid only when all of its variables are bound. But when we look at smaller subexpressions of a complex expression, taken out of context, they can have free variables - and making sure that the variables that are free in subexpressions are treated right is very important.
Lambda Calculus Evaluation Rules
There are only two real rules in lambda calculus; they're called alpha and beta. Alpha is also called "conversion", and beta is also called "reduction".Alpha Conversion
alpha is a renaming operation; basically it says that the names of variables are unimportant: given any expression in lambda calculus, we can change the name of the parameter to a function as long as we change all free references to it inside the body.So - for instance, if we had an expression like:
lambda x . if (= x 0) then 1 else x^2
We can do alpha changing X to Y (written "alpha[x/y]" and get):
lambda y . if (= y 0) then 1 else y^2
Beta Reduction
Beta reduction is where things get interesting: this single rule is all that's needed to make the lambda calculus capable of performing any computation that can be done by a machine.Beta basically says that if you have a function application, you can replace it with a copy of the body of the function with references to the parameter identifiers replaced by references to the parameter value in the application. That sounds confusing, but it's actually pretty easy when you see it in action.
Suppose we have the application expression: "
(lambda x . x + 1) 3
". What beta says is that we can replace the application by taking the body of the function (which is "x + 1"); and replacing references to the parameter "x" by the value "3"; so the result of the beta reduction is "3 + 1".A slightly more complicated example is the expression:
lambda y . (lambda x . x + y)) q
It's an interesting expression, because it's a lambda expression that when applied, results in another lambda expression: that is, it's a function that creates functions. When we do beta reduction in this, we're replacing all references to the parameter "y" with the identifier "q"; so, the result is "
lambda x. x+q
".One more example, just for the sake of being annoying:
"
(lambda x y. x y) (lambda z . z * z) 3
". That's a function that takes two parameters, and applies the first one to the second one. When we evaluate that, we replace the parameter "x" in the body of the first function with "lambda z . z * z
"; and we replace the parameter "y" with "3", getting: "(lambda z . z * z) 3
". And we can perform beta on that, getting "3 * 3".Written formally, beta says:
lambda x . B e = B[x := e] if free(e) subset free(B[x := e]
That condition on the end, "if free(e) subset free(B[x := e]" is why we need alpha: we can only do beta reduction if doing it doesn't create any collisions between bound identifiers and free identifiers: if the identifier "z" is free in "e", then we need to be sure that the beta-reduction doesn't make "z" become bound. If there is a name collision between a variable that is bound in "B" and a variable that is free in "e", then we need to use alpha to change the identifier names so that they're different.
As usual, an example will make that clearer: Suppose we have a expression defining a function, "
lambda z . lambda x . x+z)
". Now, suppose we want to apply it:(lambda z . (lambda x . x + z)) (x + 2)
In the parameter "(x + 2)", x is free. Now, suppose we break the rule and go ahead and do beta. We'd get:
lambda x . x + x + 2
The variable that was free in "x + 2" is now bound. Now suppose we apply that function:
lambda x . x + x + 2) 3
By beta, we'd get "3 + 3 + 2".
What if we did alpha the way we were supposed to?
by alpha[x/y]:
lambda z . (lambda y . y+z)) (x + 2)
by beta:
lambda y . y + x + 2) 3
by beta again:
3 + x + 2
."3+x+2" and "3+3+2" are very different results!
And that's pretty much it. There's another rule you can optionally add called Eta-reduction, but we'll skip that for now. What I've described here is Turing complete - a full effective computation system. To make it useful, and see how this can be used to do real stuff, we need to define a bunch of basic functions that allow us to do math, condition tests, recursion, etc. I'll talk about those in my next post.
We also haven't defined a model for lambda-calculus yet. (I discussed models here and here.) That's actually quite an important thing! LC was played with by logicians for several years before they were able to come up with a complete model for it, and it was a matter of great concern that although LC looked correct, the early attempts to define a model for it were failures. After all, remember that if there isn't a valid model, that means that the results of the system are meaningless!
11 Comments:
Lambda calculus is also a crucial tool for compositional semantics. I heard a story of one smeanticist who declared, "lambda calculus saved my life."
It was definitely my favorite part of semantics classes in grad school.
By Ed Keer, at 11:06 AM
I think lambda calculus is great and all, but the notation you've used there just looks like a random assortment of names and backslashes... is that what it's suppoesed to look like, or is it meant to encode for the lambda symbol and so on?
Running Firefox 1.5 (on Gentoo Linux), if that's relevant or useful to know.
By Anonymous, at 11:59 AM
ithika:
I'm experimenting with something called ASCIIMathML; it's a javascript that you reference in a web-page, and then anything enclosed in backwards quotes is parsed and translated into MathML. You must have Javascript turned on in firefox for it to work.
By MarkCC, at 12:14 PM
The MathML works fine for me on the page itself, but is broken in the RSS feed.
By Anonymous, at 1:24 PM
Ahh, that brings back days of programming in Scheme. Good times, good times.
By Frank, at 1:34 PM
" ...Beta reduction is where things get interesting: this single rule is that's needed to make the lambda calculus capable of performing any computation that can be done by a machine. ..."
There's something missing here. Should it be "...this single rule is ALL that's needed..."?
By Anonymous, at 4:44 PM
anonymous:
Yep, you nailed it. Missing word. I'll fix it.
By MarkCC, at 4:49 PM
Nothing here requires MathML. Just set the encoding to UTF-8 and use Unicode characters for symbols such as lambda.
(Please).
By David McCabe, at 7:11 PM
david:
While nothing here requires MathML, it seemed like a reasonable place to experiment. Some stuff really needs good math notation;this doesn't, so it was easy for me to back out when it didn't work. But I've had a lot of requests for stuff like topology, which I just can't imagine doing without proper math.
The first draft of this lambda-calc post used jpegs of equations; that was just unreasonably hard for me to use to put everything together. I really don't want to constantly invent my own notations because I need to do everything in text; and I don't want to go through the pain of using images for everything - so what alternatives do I really have?
By MarkCC, at 8:08 PM
Sorry, I don't understand, "Adding x plus y becomes writing a one-parameter function with parameter x, which returns another one parameter function which adds x to its parameter:
lambda x.(lambda y. plus x y)"
Also, I think some of your examples are missing left parentheses.
By Anonymous, at 12:09 AM
Jane:
It's a bit of a confusing concept at first. Let me try to explain it again.
Remember that functions *are* values. So a function can return a function. That's a key thing.
Now, suppose we want to write a program to add 3 to its parameter. That would be something like:
lambda y . 3 + y
Now, what if we wanted to let someone give us a value x, and give them back a function that adds x, like the last example adds 3? That's a function that returns a function:
lambda x . (lambda y . x + y)
So if we wanted a function that adds 13, we could just apply that to 13:
(lambda x . (lambda y . x + y)) 13
Now, we beta:
lambda y . 13 + y
So - that function *is* the general add function. We can treat it like it's a two parameter function:
(lambda x . (lambda y . x + y)) 3 4
beta: (lambda y . 3 + y) 4
beta: 3 + 4
Is that clearer?
By MarkCC, at 8:26 AM
Post a Comment
<< Home