Good Math/Bad Math

Monday, May 08, 2006

Context Sensitive Languages

Last week, I started talking about computation and languages. I've already talked about level 3, the regular languages, and level 2, the context free languages. So, here we are at Chomsky level 1, the context sensitive languages.

Context sensitive languages are interesting because they are the closest formal approximation to what we can do with real, practical computers. They're also just fun, because they're capable enough to do all sorts of fascinating things, but simple enough the be reasonably easy to play with.

Representing Context Sensitive Languages

Context free languages can be represented using phrase-structure grammars, just like the other grammars we've discussed. The new expressive abilities of context sensitivity can be described in two very nearly equivalent ways. I prefer one of them for explaining why we call these things context sensitive; and I actually prefer the other for writing interesting grammars.

The first one is the classic context sensitive formulation: in productions, the left-hand side of the production can be any sequence of symbols, both terminals and non-terminals; on the right-hand side of the production, only one non-terminal can be replaced a sequence of symbols. So one non-terminal is being replaced in the production; the other symbols on the left-hand-side establish a fixed, unchangeable context in which the expansion is permitted to occur.

So, for example, if lowercase letters are non-terminals, and uppercase symbols are terminals, in a CSG I can write rules like:

x ::= ABxC
ABxC ::= ABCyC
AxYbC ::= ABYbC

But, I can't have rules like:

AxYbC ::= ABYQC (replaced two-nonterminals)
AxYbC ::= BxYbC (replaced a terminal)
AxYbC ::= AxYxx (changed the context by altering the terminal after "b")

Context sensitive rules are also sometimes written in a slightly different format, which can be much clearer. This format leaves the basic production part looking the same as in the CFGs, but adds an "in" clause to specify the context in which the production can be used. So in this format, the productions above would be written:

x ::= ABxC (Don't need to show any context)
x ::= Cy in ABxC
x ::= B in AxYbC

One of the advantages of this alternate way of writing context sensitive rules is that it prevents you from accidentally writing invalid rules: try to write one of the invalid examples above in this format - it doesn't work.

The other formulation of context sensitive grammars is simpler, but I find that it makes the "context sensitivity" aspect a little more difficult to see. The second formulation is nearly equivalent to the first; the only difference is that the CSG formulation above can be used to specify languages that include the empty string; this one can't. Aside from that, anything that can be written in CSG form above can be written in this form, and vice versa.

The second formulation is called non-contracting grammars (NCG). In a non-contracting grammar, the only restriction is that the left-hand-side of every grammar rule cannot be longer than the right hand side. So the side of the string being generated by production rules is monotonically increasing.

Here's a nice example of a NCG doing something that a CFG can't, and which illustrates why NCGs are easier to write than CSGs.

a ::= XYZ (1)
a ::= XabZ (2)
Zb ::= bZ (3)
Yb ::= YY (4)

Rules 1 and 2 basically set up the pattern to generate the matching numbers of Xs, Ys, and Zs; but it uses the non-terminal "b" instead of "Y" for all but the first "Y". Rule 3 does the re-ordering to move all the "Z"s to the end (note that we could not have this rule in a CSG). And finally, rule 4 converts the non-terminal "b"s to "Y"s. To see how it works, it helps to follow it through a few steps. So let's generate "XXXXYYYYZZZZ". We start with "a".
  • Rule 2(replace "a" by "XabZ") : a -> XabZ
  • Rule 2 again: XabZ -> XXabZbZ
  • Rule 3 (replace "Zb by bZ"): XXabZbZ -> XXabbZZ
  • Rule 2 again: XXabbZZ -> XXXabZbbZZ
  • Rule 3: XXXabZbbZZ -> XXXabbZbZZ
  • Rule 3: XXXabbZbZZ -> XXXabbbZZZ
  • Rule 1(replace "a" by "XYZ"): XXXabbbZZZ -> XXXXYZbbbZZZ
  • Rule 3: XXXXYZbbbZZZ -> XXXXYbZbbZZZ
  • Rule 3: XXXXYbZbbZZZ -> XXXXYbbZbZZZ
  • Rule 3: XXXXYbbZbZZZ -> XXXXYbbbZZZZ
  • Rule 4 (replace "Yb" by YY): XXXXYbbbZZZZ -> XXXXYYbbZZZZ
  • Rule 4 again: XXXXYYbbZZZZ -> XXXXYYYbZZZZ
  • Rule 4 one last time: XXXXYYYbZZZZ -> XXXXYYYbZZZZ
Doing this in a CSG rather than a NCG is possible, but not so simple. (For example, I found on variant on the net which has 13 rules, and is using a less restrictive definition of CSGs than the one above!) This little 4 rule NCG is a lot easier to understand, and a lot easier to write.

Level One Computing Devices

For level 1, the computing device is a bounded tape turing machine. As a reminder, I described turing machines earlier. A normal turing machine has as much tape as it wants to work with; the tape is however long it has to be, even if that means infinite. A bounded tape turing machine has a finite amount of tape; if it exceeds the amount of tape it's given and tries to move the head past the end, it halts and rejects the input string. The machine doesn't have to have a fixed input tape length for all inputs: for a language to be computable on a bounded-tape turing machine, you need to be able to select the length of tape that you will need to perform an acceptance calculation given only the length of the input.

If the machine is non-deterministic, then the amount of tape needed for a given input is a simple fixed multiple of the size of the input (the multiplier varies by machine/program). We call that machine a linear-bounded automaton (LBA). (IN my experience, LBA always means non-deterministic; to specify a linear bounded deterministic machine, it's LBDA.) If you're working with a deterministic machine, then the amount of tape needed for a given input can be calculated using some polynomial in the length of the input; the specific polynomial is dependent on the details of the machine/program.

Just for the heck of it, here's a very common example of a function that you cannot compute using a bounded automaton of any kind. It's called Ackermann's function, and it's an interestingly evil little bugger. It takes two integers as input. We'll call it A(m,n):
  • A(m,n) = n + 1 if m = 0
  • A(m,n) = A(m-1,1) if m > 0 and n = 0
  • A(m,n) = A(m-1,A(m,n-1)) if m > 0 and n > 0
Looks trivial, doesn't it?

But if you actually try to work it out, A(4,2)= 2^(2^(2^2)) - 3, which is 2^65536 - 3, which equals rough 2x10^19729. A(4,3) = 2^(2^(2^(2^2))) - 3, which is a number with more digits than there are atoms in the universe.

You can't bound it. You can't hope to bound it. Just computing the size of it isn't boundable. It's just evil.


  • I'm confused:


    = 2 ^ (2 ^ (4))

    = 2 ^ ((16))


    It does NOT equal "2^65536" unless you missed an exponent somewhere.

    Hmm, let me see if I can figure it out.

    A(4,2) = A(3,(A(4,1))
    A(4,1) = A(3, (A(4,0))
    A(4,0) = A(3,1)
    A(3,1) = A(2, A(3,0)
    A(3,0) = A(2,1)
    A(2,1) = A(2, A(1,0)
    A(1,0) = A(0,1)
    A(0,1) = 2

    Hmm. I can't easily identify which numer you dropped.

    Big numbers. heh.

    By Blogger sailorman, at 1:08 PM  

  • A(4,2) = A(3,(A(4,1))
    A(4,1) = A(3, (A(4,0))
    A(4,0) = A(3,1)
    A(3,1) = A(2, A(3,0)
    A(3,0) = A(2,1)
    A(2,1) = A(2, A(1,0)
    A(1,0) = A(0,1)
    A(0,1) = 2

    OK, what am I missing?

    I see from the above progression that A(4,2) turns into A(3,2).

    What I don't see is where you're getting the exponents, as the function appears to constantly decrease and there's no multiplier I can see...

    By Blogger sailorman, at 1:11 PM  

  • sailorman:

    Yeah, I made a typo and missed a level of exponents.

    For the progression, it's pretty nightmarish - because the only thing that moves the numbers is the "+1" up in the top clause.

    Here's a bit of the progression:

    A(4,2) = A(3,A(4,1))
    A(4,1) = A(3,A(4,0))
    A(4,0) = ... = 13
    A(4,1) = A(3,13)
    A(3,13) = A(2,A(3,12))
    A(3,12) = A(2,A(3,11))
    A(3,11) = A(2,A(3,10))
    A(3,3) = A(2,A(3,2))
    A(3,2) = A(2,A(3,1))
    A(3,1) = A(2,A(3,0))
    A(3,0) = A(2,1)
    A(2,1) = 5

    So A(3,1) = A(2,5) = 13
    A(3,2) = A(2,13) = 29
    A(3,3) = A(2,29) = 61
    A(3,4) = A(2,61) = 125
    A(3,5) = A(2,125) = 253
    A(3,6) = A(2,253) = 509
    A(3,7) = A(2,509) = 1021
    A(3,8) = A(2, 1021) = 2045
    A(3,11) = 16381

    See the exponential growth going on? Each iteration as n reduces does additions to generate a further power of 2; each iteration of m raises 2 to the power of the iterations of n. The growth is incredible, and it's all by nothing but massive additions of 1.

    By Blogger MarkCC, at 1:54 PM  

Post a Comment

<< Home