### Computer Science, Math, and Languages

Computer science is, I think, more of a branch of math than of science. The math behind computer science ultimately comes down to two things:


- (1) What can we do? / What can we *not* do?
- (2) If we can do it, how long will it take?

Anyway: one of the ways that the two kinds of fundamental CS come together is in the idea of *computable languages* from automata theory. Computable languages are a way of specifying *what* a given type of computing device can do; and from the type of computing device, we can generate rough complexity bounds on how long a given task needs to take. What's particularly cool/fun about talking about computable languages is that it's both incredibly deep and fascinating; and yet, it's also very concrete. The concreteness comes about because we can "design" different kinds of machines - very concrete, specific machines - which are capable of doing the computations expressed by the languages.

The idea of computable languages is that any problem that we wish to solve can ultimately be described as a *language*. The kind of language that I'm talking about here is purely syntactic: it is a set of strings in some alphabet selected by a predicate. It's important to understand that the predicate is *external*: the string itself has no meaning; some *program* or some machine examining the string determines whether it is a member of the language: the meaning of the string is imposed *by the machine*, not by the string.

For example: there's a language which consists of all strings of any number of "a"s followed by any number of "b"s. We'd usually write that "a*b*". There's a language consisting of strings of nested parentheses, like "(((())()())(()))". There's a language made from strings of digits that consists of the set of all prime numbers. And so on.

We can ultimately describe any computational problem in terms of a language; this reformulation of a problem as a language can change the complexity, but not the fundamental nature of what we can and can't do. To give an example of expressing a problem as a language, we can describe addition as a language of strings of the form "x+y=z", where a string is part of the language if the addition statement it expresses is true.
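To make that concrete, here's a minimal Python sketch of a membership predicate for the addition language; the function name and the restriction to unsigned decimal numerals are my own assumptions for illustration, not part of the post:

```python
import re

def in_addition_language(s: str) -> bool:
    """Membership test for the addition language: strings of the form
    "x+y=z", where x, y, z are unsigned decimal numerals and the
    statement x + y = z is true."""
    m = re.fullmatch(r"(\d+)\+(\d+)=(\d+)", s)
    if m is None:
        return False  # not even syntactically well-formed
    x, y, z = (int(g) for g in m.groups())
    return x + y == z

print(in_addition_language("2+3=5"))  # True
print(in_addition_language("2+3=6"))  # False
```

Note that the predicate is entirely in the machine: the string "2+3=5" is just seven characters until this function imposes a meaning on it.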

All formal languages can be described using *grammars*. A grammar is a set of rules describing how a string in the language can be produced. The kind of grammar that I'm using here is called just "formal grammar", "phrase structure grammar", or "Chomsky grammar".

A formal grammar consists of:

- An *alphabet* of characters, also called *terminals* or *terminal symbols* (TSs).
- A set of symbols distinguishable from the terminals, called *non-terminal symbols* (NTSs).
- One symbol from the set of non-terminal symbols, designated the *start symbol*.
- A set of *replacement rules* which specify how to generate strings in the language by replacing one substring containing a non-terminal symbol with another substring.

In *most* grammars that you typically see, the left-hand side is always just a single non-terminal symbol. So, for example, a grammar with non-terminals "a" and "b", and terminals "X", "Y", and "Z" could contain a rule like:

a ::= X Y b Z

What this statement means is that when you have a string containing the NTS "a", you can replace it with the string "XYbZ" - that is, the terminals "XY", followed by the non-terminal "b", followed by the terminal "Z".

Here's an example of a complete grammar specifying the set of strings with equal numbers of Xs and Ys, with the Ys after the Xs. The terminals are "X" and "Y"; the only non-terminal is "a", so "a" is obviously the start symbol.

a ::= X a Y *(rule 1)*

a ::= X Y *(rule 2)*

This says that if you see an "a", you can replace it with either "XaY", or just "XY". Let's look at how we can use this to create a string of four Xs followed by four Ys:

- Start with the start symbol, "a":
`a`

- Apply rule 1, replacing "a" with "XaY":
`XaY`

- Apply rule 1 again:
`XXaYY`

- Apply rule 1 again:
`XXXaYYY`

- Apply rule 2, replacing "a" with "XY":
`XXXXYYYY`

- Since there are no non-terminals left, we're done.
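The derivation above can be carried out mechanically. Here's a small Python sketch of it; the function name and the choice to always rewrite the leftmost "a" are my own, not anything from the post:

```python
def derive(n: int) -> str:
    """Derive the string X^n Y^n (n >= 1) using the grammar
         a ::= X a Y   (rule 1)
         a ::= X Y     (rule 2)
    by applying rule 1 (n - 1) times, then rule 2 once."""
    s = "a"  # start with the start symbol
    for _ in range(n - 1):
        s = s.replace("a", "XaY", 1)  # rule 1: a -> XaY
    s = s.replace("a", "XY", 1)       # rule 2: a -> XY
    return s  # no non-terminals remain, so the derivation is done

print(derive(4))  # XXXXYYYY
```

Each pass through the loop corresponds to one of the rule-1 steps listed above, and the final `replace` is the rule-2 step that eliminates the last non-terminal.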

By putting restrictions on the form of the replacement rules, grammars (and the languages they generate) can be divided into four groups. The four groups are arranged in levels, and the organization is called the *Chomsky hierarchy*. (And yes, it's the same Chomsky who currently spends his time on wacky politics. Back in the '50s, he did great work as a linguist. It's a damn shame he didn't stay in linguistics.)

To briefly describe them (I'll describe them in more detail later, one post each), the four levels are:

- Level 3: the *regular* languages. These are very simple languages, which can't do anything particularly interesting: no counting, nothing beyond sequencing and repetition. Regular languages can be recognized by a simple kind of machine called a *finite state automaton*.
- Level 2: the *context free* languages (CFLs). These are widely used in practical computer science. They're reasonably expressive; you can do a limited kind of counting with them, but nothing too deep. CFLs can be recognized by a machine called a *push-down automaton*.
- Level 1: the *context sensitive* languages (CSLs). Context sensitive languages are the things that can be written with a more advanced kind of grammar. You can express a *lot* of quite deep computation using CSLs. CSLs can be recognized by a very interesting machine: a linear-bounded-tape *non-deterministic* Turing machine. Level 1 includes most things that we can really compute: the linear-bounded non-deterministic machine is a pretty good match for the capability of a real computer; level 0 conceptually requires infinite memory.
- Level 0: the *recursively enumerable* languages. Anything computable. Level 0 languages are languages which can be *accepted* by a regular, unbounded Turing machine.
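As a tiny illustration of level 3, here's a hand-coded finite state automaton for the "a*b*" language mentioned earlier, written as a Python sketch (the two-state encoding is my own):

```python
def matches_astar_bstar(s: str) -> bool:
    """Finite state automaton for the regular language a*b*:
    any number of 'a's followed by any number of 'b's.
    Two states: "A" (still reading a's) and "B" (reading b's);
    both are accepting states."""
    state = "A"
    for ch in s:
        if ch == "a" and state == "A":
            pass             # stay in A while reading leading a's
        elif ch == "b":
            state = "B"      # move to (or stay in) B
        else:
            return False     # an 'a' after a 'b', or a bad character
    return True

print(matches_astar_bstar("aabbb"))  # True
print(matches_astar_bstar("aba"))    # False
```

Note what the machine *can't* do: with only finitely many states it has no way to count, which is exactly why the equal-Xs-and-Ys language above needs a level 2 grammar instead.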

A machine processing an input string can do one of two things: it can *accept* the input, or it can *decide* about the input. Note that in level 0 above, I said *accepted*.

Accepting the input means that if the input is a member of the language, the machine will eventually stop and say "Yes, this is in my language". Deciding about the input means that the machine will eventually stop and say either "Yes, this is in my language", or "No, this is not in my language". The difference is that an accepting machine *does not* necessarily stop if the string is not in the language. It *might* be able to stop and say no sometimes, but there is no guarantee.

There is a set of languages *more complex* than the context sensitive languages which is *decidable* by a Turing machine. These are called the *recursive languages*.
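To illustrate the accept/decide distinction, here's a hedged Python sketch using a toy language of my own choosing (decimal numerals of perfect squares - not an example from the post). The acceptor is guaranteed to halt only on members; the decider always halts:

```python
from itertools import count

def squares():
    """Enumerate the toy language: decimal numerals of perfect squares,
    in increasing order: "0", "1", "4", "9", "16", ..."""
    for n in count():
        yield str(n * n)

def accept(s: str) -> bool:
    """An *accepting* machine: halts with True if s is in the language,
    but loops forever if s is not a member."""
    for member in squares():
        if member == s:
            return True

def decide(s: str) -> bool:
    """A *deciding* machine for the same language: always halts.
    It can safely say "no" because the enumeration is in increasing
    order - once members get longer than s, s can never appear."""
    for member in squares():
        if member == s:
            return True
        if len(member) > len(s):
            return False

print(decide("16"))  # True
print(decide("15"))  # False
```

Calling `accept("15")` would run forever - which is exactly the guarantee an acceptor does *not* make, and why the recursive languages (deciders) are a strictly nicer class to be in than the recursively enumerable ones (acceptors).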
## 5 Comments:

Cool post.

"Theory of Computability" was one of my favorite classes as an undergrad and formal languages are a bit of a hobby.

Are you going to delve into "abacus computable" any time soon? That was the one of the three types of computability that gave me the most trouble.

By ArtK, at 7:32 PM

"(And yes, it's the same Chomsky who currently spends his time on wacky politics. Back in the '50s, he did great work as a linguist. It's a damn shame he didn't stay in linguistics.)"

Don't mean to be persnickety, but this statement would be true if Chomsky weren't still doing linguistics. But he is - and he's been doing wacky politics for just as long as he's been doing linguistics. It's a damn shame more people don't know about his more recent work (*The Minimalist Program*, for instance).

By jayinbmore, at 8:16 PM

Oh no. Mr Good Math, Bad Math is yet another one of those people who think Math is not a science. See the shtetl-optimized blog for a discussion on this subject.

By Anonymous, at 12:04 PM

Math isn't a science. Computer science is, however, because computer science is experimental math.

Anyway, anyone who finds this stuff interesting might like "Gödel, Escher, Bach" or "The Lady or the Tiger?", by Hofstadter and Smullyan, respectively. GEB is about computability, and the latter is about Gödel's theorem through logical puzzles. They are both quite beautiful.

By Seth, at 12:42 AM

Could you kindly give some examples of Allender's thoughts from the ... perspective of automata theory (Allender ... who taught from the perspective of automata theory)?

By gbertellini@gmail.com, at 9:41 AM
