Practical Applications of Good Math: Type Checking in Programming Languages
    
      (Since I see that there are still links pointing at this post, I'll point out here that this blog has moved to scienceblogs. This discussion was continued in a post there.)
I thought today I'd go a bit off topic, and rant a little bit about stuff that's been bugging me lately in my real work. It does, in its bizarre way, relate to the lambda calculus series I've been writing lately.
What I do professionally is research in a field called collaborative software development: how to build tools that help groups of people work together to build large systems. It's an interesting thing to do, with a huge range of things that can be done, ranging from better programming languages to better configuration management systems to better programming environments to entirely new kinds of collaborative tools. It involves practical programming, math, psychology and human factors, and a few other interesting things.
My approach to this kind of work is to view the key problem as one of communication. Most problems in large systems are ultimately not algorithmic errors - they're errors where some key piece of information was not correctly communicated between developers. For example, one of the most common bugs in large systems occurs when the implementor of a piece of code makes assumptions about how their code is going to be called, and the developer who uses that code makes different assumptions. To be concrete: I experienced this recently on my project at work. I'm working on a bit of code, and one function I call takes a lot of parameters, each of which is an array of objects. Most of the time, you only supply a real value for one or two out of a list of 8 parameters. I assumed that for unused parameters, it made sense to pass null - that's how I would have written that method. The person who wrote it expected callers to never pass null, but to use empty arrays for unused parameters. The program compiles, deploys, and boom: exceptions out the wazoo. (There was a dynamic check for null, but because of some proxy stuff we do, the failure was very hard to track down.)
This isn't a result of incompetence on anyone's part. This project has some of the finest developers that I've ever met working on it. The person who wrote that method with the awful parameter list is actually a pretty great programmer - one of the top people from one of the top groups of developers that I know of. I don't think I'm incompetent either :-). So why did the error happen?
Because a key piece of information didn't get passed between us. Between the size of our system, the number of people involved, the way the code is partitioned into components, and where the documentation lives, the information about what to do about empty parameters got lost.
What does any of this have to do with lambda calculus? Types.
That problem wouldn't have happened if we were programming in my favorite programming language, OCaml. In OCaml, to be able to pass a "null" value, I actually need to specify that in the parameter declaration (as an Option type). There's no way I can pass a null to a function that wasn't expecting one.
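To make that concrete, here is a minimal sketch of how the two conventions look when they are written into the parameter types. The names `item`, `count_all`, and `count_with_optional` are hypothetical stand-ins, not anything from the real project; the only point is that "never pass null, use an empty array" and "this one may be absent" become different declarations.

```ocaml
(* Hypothetical stand-in for the objects passed around in the post. *)
type item = { name : string }

(* Both parameters are plain arrays. There is no null value of type
   item array, so a caller with nothing to pass must use [||]. *)
let count_all (required : item array) (extras : item array) : int =
  Array.length required + Array.length extras

(* The second parameter is declared as an option, so "absent" is a legal
   argument and the body is forced to handle it explicitly. *)
let count_with_optional (required : item array) (extras : item array option) : int =
  match extras with
  | None -> Array.length required
  | Some xs -> Array.length required + Array.length xs

let () =
  let items = [| { name = "a" }; { name = "b" } |] in
  Printf.printf "%d\n" (count_all items [||]);
  Printf.printf "%d\n" (count_with_optional items None);
  Printf.printf "%d\n" (count_with_optional items (Some items))
  (* Writing [count_all items None] would be rejected by the type checker:
     None has type 'a option, not item array. *)
```

The mismatch that bit us at runtime would show up here as a compile error at the call site, before anything is deployed.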
It's a common assertion among a lot of the elite hackers out there that "real programmers don't use strongly typed languages", or the Paul Graham line about strongly typed languages. For example, here's Paul on Java:
"It's designed for large organizations. Large organizations have different aims from hackers. They want languages that are (believed to be) suitable for use by large teams of mediocre programmers-- languages with features that, like the speed limiters in U-Haul trucks, prevent fools from doing too much damage."
This attitude really bugs me. The thing is, types are meta-information about a program - and useful meta-information. And the more people you have involved in a system, the more important it becomes to have strong, formal information about parts of the system that allow automatic checks. Three things about the attitude bother me:
-  The elitism issue. "Yeah, I'm a super-duper programmer who doesn't make mistakes; people who don't work the way I do are just inferior." Yeah, tell that to the guys who wrote OCaml.
-  The cluelessness issue: "I write programs in Perl/Python/CommonLisp/... with three other guys who are all Lisp wizards, and if we don't have a problem, then no one should, so that typing stuff is just nonsense to cover up for incompetence." There are problems that emerge as the size of a system and the number of people working on it increase. Three people hacking on something is very different from 30 people hacking on it, which is very different from 300 people. As the number of people involved increases, you need to do more to ensure that information gets to where it's needed.
-  The information loss issue. This is the one that actually bugs me the most. Programmers know what types they're using. They know what types they're passing. They know information about the parameters they're taking, and the parameters that they're passing. A good type system gives them a way to write that down - and not just write it, but make it a part of the code, so that it can be tested. Writing it down as a part of the code means that the information is recorded; that it's maintained; and that it's checked. Not declaring types saves you a bit of keyboard time - but it means that there's important information in your head which you're not writing down. (I have exactly the same rant about parallelizing compilers, which is that you have to structure your code very carefully and precisely to allow the compiler to figure out the information that you knew before you wrote the code; but instead of letting you write it down, you have to trick the compiler into figuring it out on its own.)
Sure, strong typing in a language is no replacement for testing. But why on earth should I write tests to verify correct typing, when I can just put the type information in line in the code and have it tested automatically? Typing and type checking is one part of the process; having a typed language doesn't mean you shouldn't write tests; writing tests doesn't mean that compile-time type checking is useless. Pulling the typing information out of line into a test is, frankly, just dumb: the more you spread the information around between multiple places, the less likely it is to be maintained, or even found.
I'm a huge unit testing fan; but one of the major limitations of testing is information scattering. You're putting information about the invariants and requirements of your code in the logic of the tests - and because of the multidimensional nature of software systems, that inevitably means that some of the information is here, some there. Trying to assemble the information you need from a list of test failures can be incredibly frustrating; and those tests often don't get maintained, so things change, but the tests aren't changed to thoroughly respect the new semantics. Compile time checking can't get out of synch with the current state of the code.



27 Comments:
Amen!
I was guilty of the "real programmer" fallacy ("no true Scotsman?") for a brief time when I started out. Because, of course, real programmers write in assembly language (S/360 BAL) and not this high-level crud.
I learned quickly how wrong that was.
My contention is that it doesn't even depend on the number of developers or the size of the project. Time plays a factor, too. I've actually cursed about some idiot who developed a routine and did something stupid, only to realize that it was me, in earlier years.
On testing vs typing: There's a matter of ROI, too. I can spend 30 seconds specifying a type properly, or 30 minutes writing a test case to verify it. That's 30 minutes lost that could have been spent on something that isn't verifiable any other way.
By Anonymous, at 3:11 PM
	   
artk:
WRT the ROI thing: exactly right. If you're in the testing school, you can look at type declarations as just a very cheap mechanism for putting a bunch of tests inline.
I also agree that strong typing is good even for single-developer stuff. Certainly in my own programming, when I want a good prototype, I often pull out OCaml.
The bulk of the implementation of my PhD thesis was in Scheme, and I've definitely been in the situation of cursing myself for that: I came back to it four years after I graduated, and wanted to port it to a different platform... I ended up giving up; too many errors from type changes! (The implementation needed to be switched from the socket implementation on STk for Solaris to winsock sockets on a windows scheme.)
By MarkCC, at 3:36 PM
	   
You're mischaracterizing the strengths of dynamic typing. From your description, of course it's a pure win: if you have some meta-information, and the computer could automatically do a type check with it, then why not provide it?
Nobody, even Paul Graham, disagrees with that.
But the trick is whether the type information is optional or required. "Strong typing" languages generally require type information everywhere (or at least require it to be inferrable). If they are unable to prove that a program is type-safe, they refuse to compile it. It turns out that there is a large class of programs that are not provably type-safe, but at the same time are useful to run (especially during rapid prototyping).
Strong typing languages insist that type consistency be resolved before ever running a piece of code. That may not be the most important factor in a current programming task, and it is infuriating to have the compiler force you to allocate your mental effort in a place that (at some given time) may have little benefit.
By contrast, Common Lisp (a dynamic typing language) doesn't require any type information, but it DOES optionally permit the recording of type information, if that is of benefit for a given programmer in a given piece of code at a given time.
Finally, let me conclude by saying that your use case (300 programmers on a single massive enterprise-level project) is dramatically different from the use case that Paul Graham explores (a handful of super-programmers working on problems that have never been solved before). You ought to acknowledge that different features of the programming system might have different levels of importance for the different use cases.
By Anonymous, at 4:10 PM
	   
Don -
Both Mark and I raised the point that, although problems become very apparent with large scale projects, strong typing can address issues that arise with any project, down to an individual coder.
As a metaphor, an interface is a contract. Enforcing a contract is always more expensive the later you wait. Where you and I would disagree, I think, is in the idea that early enforcement isn't always an immediately important task. I believe that it is.
I don't subscribe to any notion of a "super-programmer." If I did, I'd be inclined to say that if they are being "infuriated" by the compiler enforcing strong typing, then they aren't a super-programmer, because a super-programmer would get the typing right the first time.
Sarcasm aside, I disagree with the notion that getting the typing right requires any exceptional mental effort. In fact, getting it right is what I expect from a programmer fresh out of the box.
Bob--
I'm at 35 years and see the same things.
By Anonymous, at 5:40 PM
	   
In the case of Python (version 2.4 and later), you can do argument type validation simply via decorators, which can (with minimal work) also modify docstrings (and thus the documentation) to specify the expected argument and return types. That gives you run-time type checking along with sufficient documentation to solve this particular problem.
By Anonymous, at 11:04 PM
	   
As a general comment, I want to stress that my main personal interest in this is in my role as a researcher; and that's focused on how people interact and communicate with one another when they're working collaboratively on a large system. And in that setting, my own issue isn't necessarily even whether strong typing results in fewer bugs - it's whether or not strong typing has an impact on intra-group communication. My contention is that it *does* facilitate the flow of important information between members of a group; and that leaving out the type information is a process of leaving valuable information from the mind of the developer out of the code.
I do agree with bob and artk that strong typing is a useful thing in other contexts as well. I particularly agree that any "super-programmer" should be able to get the types right straight away. My own experience with very strongly typed languages (SML, OCaml, Haskell, and (yes) Ada) is that when you first start using them, it can seem awkward and frustrating to have to state that information up front; but in a very short time it becomes natural. When I first started using OCaml, it took me about two weeks to get comfortable with it; since then, I often write OCaml type specifications while I'm writing Java code, because it helps me to make sure that I'm really clear on everything I'm doing.
By MarkCC, at 9:10 AM
	   
From the perspective of research on large group collaboration, I have no objection. That's a tough nut, and many of the problems involve communication issues (as you've said), so tools that assist with communication and documentation are probably good.
But that is not the only programming situation. I think some of you have misinterpreted my "super-programmer" as a slight on the quality of the 300 programmers in the mentioned team. The real question is, how well-understood is the problem (and the solution), and how much investigation and design does the programmer do (as opposed to "just" coding).
Take a lone computer science professor, working on some AI or graphics or encryption problem. Something they don't know how to solve. In this case, the computer program is not "merely" an expression of the correct algorithm. Instead it is a collaborative tool for rapid prototyping, where the programmer implements a large number of one-off pieces of code, in order to investigate the problem domain and learn more about it.
The coding pattern you folks seem to be talking about is one where the design in essence occurs offline, and then the code is basically the formal specification of agreements (like interfaces). You've completely left out the part of programming where the solution is not yet well understood, and where coding of partial, incomplete programs is part of the process of understanding the problem domain.
In this latter situation, experience seems to have shown that strong/static typed programming languages force the programmer to worry about the wrong things at the wrong times. It enforces a certain methodology to programming, which is not ALWAYS (although may sometimes be) the best approach early. Moreover, the added overhead (at times) doesn't buy much benefit (in terms of reduced bugs, etc.).
All I'm trying to suggest is that there are two sides to this complex issue, and you're not doing it justice to assert that static typed languages are universally superior, such that only ignorant uneducated programmers would prefer dynamic typing. The truth is far more complex, and (like most things in programming language design) this is a matter of tradeoffs. There are strong arguments on both sides, not just on the static typing side.
By Anonymous, at 12:09 PM
	   
Don:
I have not said or intended to suggest that anyone who prefers dynamic languages is stupid or ignorant. In fact, I'm trying to say exactly the opposite: that many of the non-typed advocates, like Paul Graham, ignore the benefits of typed languages and characterize typed language proponents as being stupid, ignorant, and/or incompetent.
I use Scheme fairly regularly - the only code that I've actually used for this blog has been written in Scheme. I *do* get the benefits of dynamic typing. I just think that there's another side to the story: that strong typing has its benefits too, and that those benefits extend well beyond the spackling-over-mediocrity caricature that is often invoked against them.
And actually, I even disagree with you about whether strong typing can be a benefit at the prototyping stage.
My previous project at work was an SCM system called Stellation, which we eventually took open-source via eclipse.org.
Stellation started out as a one person project - me, in my office, trying to figure out what to do about the kinds of problems I was witnessing in development teams.
The very first prototyping work that I did was in OCaml. And the type system was *very* helpful, even at that incredibly early stage, where I had no clue of what I was doing. I spent about 6 months doing throwaways in OCaml, before I switched to Java to build the real system. (I had to use Java for political reasons.)
What I got out of using OCaml at that stage was a clear idea of where my ideas were unclear. That is, as I was figuring things out, I often thought that I knew what to do with certain parts of the system; but when I tried to write them down in OCaml, I realized that I didn't understand them enough to be able to actually specify their full types. At that stage, had I been using Scheme, I could have written some sketchy code in Scheme without realizing that there was a lack of understanding that I was skipping over; with Caml, I always knew what I really understood, and what I didn't understand yet.
By MarkCC, at 12:38 PM
	   
Perhaps I misunderstood your comments vis-a-vis Paul Graham, but it seems like you're trying to conflate dynamic typing with weak typing. I thought the key qualification for a language being strongly typed was that all type errors are caught as such. No restrictions are imposed on when the type errors are detected (runtime vs. compile time, for example).
By Anonymous, at 12:43 PM
	   
Anonymous: You're right that "strong typing" and "static typing" could be different concepts. But the benefits alluded to in the original post require compile-time checking of type consistency. It doesn't do any good to catch the empty-array-vs-null error that was mentioned by throwing a runtime error. Once the software has been deployed, there's really not much difference between that and just dumping core.
So the debate is really compile-time type inference/checking vs. run-time dynamic types.
By Anonymous, at 1:16 PM
	   
Mark:
You write: "What I got out of using OCaml at that stage was a clear idea of where my ideas were unclear."
I've never tried to argue that compile-time type consistency has no value.
The question is whether it is a good design (of a programming language) to force the programmer to always have a (provably!) type consistent program before running it.
The alternative isn't the absence of type checking, which would prevent the benefits you've observed. The alternative is OPTIONAL type checking. If you find it useful, at some point in time, to fully specify the type pattern of a set of code, by all means please do it.
But the static typing folks go farther. They basically deny that it is EVER useful to postpone this effort. And that is where they err. There are plenty of programming situations where getting (provable!) type consistency is not the most important next step.
Not the least of which, because our type inference systems are not omniscient, and a given piece of code might be type-safe but the compiler might be unable to figure that out on its own (without additional work by the programmer).
By Anonymous, at 1:23 PM
	   
Where type information does *not* belong is the name of the variable!
I am looking at you Microsoft!
There, I feel better now.
By Anonymous, at 1:59 PM
	   
Bob:
"If [static] typing reduces the probability of errors and makes them easier to find, I want it. It saves me time."
Only if you make sufficient errors of this kind, which are caught automatically and which you would not otherwise catch, to make up for needing to make EVERY piece of code type-safe.
It's somewhat like the 80/20 rule of premature optimization. It's good to optimize the parts of the code where the CPU is spending its time. It's silly to optimize all of your code before you know what the bottlenecks are.
"Being a programmer, I'm also intrinsically lazy. If type specification is optional, I might well neglect to do it, because I don't always do what's best for me. Shame on me."
So you're telling me that you're an experienced programmer, decades of experience, millions of lines of code, you know the "right" way to program (static typing), and if you were using a programming language that permitted this (but did not require it) that you would be too lazy to do it? That you prefer the language FORCE EVERYBODY to do static typing ALL THE TIME, because you can't be bothered to do it when it's appropriate?
If true, then this attitude is exactly the difference between "regular" programmer and "super" programmers. For the super programmer, the language is a tool, and the programmer is the master. Your paternalistic attitude towards programming languages is stifling for a more creative / confident programmer.
(I know I wrote that as deliberately argumentative. I don't actually mean to accuse you of anything, but I exaggerated it to make a point.)
"To my mind, you're suggesting that someone can write a literary essay quickly by leaving out the adjectives, "get it right," and then go back and put them in later."
On the contrary, static typing is NOT necessary for perfectly fine running programs (unlike adjectives in sentences). It is merely one tool among many to help with the communication between human and machine that we call programming. It has some pros and cons, and you may find it valuable, but it's hardly a required part of every good programming practice.
By Anonymous, at 5:44 PM
	   
Bob wrote: "I'm not going to respond to that kind of argument."
Then ignore the paragraph that you don't like, and respond to the rest of the post.
To reiterate, the points were:
1. You said static typing might reduce the probability of errors, and thus would save you time. I made the point that, even if it does reduce errors (debatable), it would only save you time if it took less time to fully specify types than you gain by finding a few more bugs. This is not at all obvious.
2. You said that static typing is best, but if it was optional you might not do it (even though you know it is best), so you want the language to require it. I found this an odd design suggestion, especially since there are plenty of programmers who wish for more flexible type systems. I wonder how you defend your choice to a programmer forced to use your language, but who doesn't want (forced) static typing, in a language where it could have been optional. You tell him that there was no reason not to make it optional, except you want to use it and you don't have the willpower to use it if you aren't forced to? That's a very odd defense.
3. You said, programming without typing is like writing essays without adjectives. I think the analogy doesn't hold at all, because it doesn't make much sense to have a "perfect essay, only missing the adjectives", but it makes perfect sense to have a "perfect program, yet without (complete static) type information".
Perhaps you can respond to these points.
By Anonymous, at 9:00 PM
	   
"It doesn't do any good to catch the empty-array-vs-null error that was mentioned by throwing a runtime error. Once the software has been deployed, there's really not much difference between that and just dumping core."
Doesn't anybody do alpha testing anymore? In alpha testing, or even beta testing, there would be a HUGE difference between an intelligible runtime error and a core dump (and considerable difference from dereferencing through the null pointer, too).
Sure, if the software has already been deployed on a space shuttle that is already in orbit, then it doesn't matter how easy it is to trace the causes of the bug. But if it's caught before that point, then the process of finding the bug, patching it and verifying the correctness of the patch is *greatly* expedited by more information about the precise nature of the bug.
I do think, though, that in many cases even if the compiler *warns* that something is possibly not type-safe, it should allow the user to run the program anyway if he/she so chooses. That seems to me to have the best of both worlds - as long as the programmer doesn't get into the habit of ignoring the warnings even at the finished-product stage.
P.S. Isn't this exact issue why references were invented? References, by definition, cannot be null - they *must* have a referent which is valid in the context of the caller. This makes them immune to certain Stupid Pointer Tricks. If the library programmer had used a reference instead of a pointer, this would have been caught at compile time.
By Anonymous, at 11:42 PM
	   
You asked why the error happened. It had nothing to do with type theory. The error happened because it's not a good idea to pass null in production code. It's an invitation to error. Use the null object pattern.
By Anonymous, at 12:12 AM
	   
I note that junior programmers typically solve compile-time type errors by converting to the expected type, which typically manages only to hide the actual bug. Bug hiding is not a desirable aspect of any programming language, but it can become an unintended feature.
By Anonymous, at 2:11 PM
	   
anonymous:
The so-called "null object" pattern is no solution to the actual problem in this case. In fact, patterns like the null-object pattern are arguably used to patch over areas where the programming language (or the type system) lacks adequate facilities.
In a language like C++, where the language doesn't catch things like null pointers, so that they trigger a segfault (or whatever the Windows equivalent of segfault is), using a "real" value as a designated null allows you to "pull up" from the primitive error to an exception.
If you're using a language like Java (which we were in the problem I described), where dereferencing null is checked at runtime and throws an exception (a NullPointerException) rather than crashing the process, then a designated null value is no different from a primitive null. In either case, you've got a designated value for saying "I don't want to pass a real value"; and in either case, if someone passes the designated null to a place that doesn't want a null, you're going to produce an exception.
And no matter what we chose as a designated null for the object type, the problem that we had was actually different: what do you pass for a *collection* of objects when you want to say "empty collection"? Basically, you need to designate a null value for the collection type. If you've got a language which does null checking, then it's reasonable to use "null" as the designated null for collections; it's also reasonable to say "empty array" for the designated null.
What I argue is a problem is that there's no way, in Java, for the declaration itself to say what it expects. I get a method "void meth(Foo[] fs)". There's no way to tell what that method expects for a null collection. By contrast, in OCaml, my favorite typed language, there are two ways you could write the type:
meth : foo array -> unit
meth : foo array option -> unit
The first says "You must pass a non-null array". The second says "You can pass an array or a null."
I like being able to distinguish those.
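For comparison, here is a minimal sketch of how a later Java can approximate that distinction, assuming a version that has java.util.Optional (Java 8 or newer, which postdates this discussion). The names Foo, countRequired and countOptional are hypothetical, and the methods return a count rather than nothing only so that the behavior is visible. Unlike the OCaml types, this is a convention rather than a guarantee: the compiler will still accept null for either parameter.

```java
import java.util.Objects;
import java.util.Optional;

// Hypothetical illustration: Foo, countRequired and countOptional are invented names.
final class Foo {}

final class NullableSignatures {
    // Rough analogue of the first OCaml type: the caller must pass a real array.
    // Java cannot say this in the type, so it is enforced at run time instead.
    static int countRequired(Foo[] fs) {
        Objects.requireNonNull(fs, "fs must not be null; pass an empty array instead");
        return fs.length;
    }

    // Rough analogue of the second OCaml type: "no array" is part of the signature.
    static int countOptional(Optional<Foo[]> fs) {
        return fs.map(a -> a.length).orElse(0);
    }

    public static void main(String[] args) {
        System.out.println(countRequired(new Foo[0]));
        System.out.println(countOptional(Optional.empty()));
        System.out.println(countOptional(Optional.of(new Foo[] { new Foo() })));
    }
}
```

The second signature at least documents itself: a reader of the declaration can see that absence is allowed without reading the body or the comments.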
By MarkCC, at 8:35 PM
	   
chris:
References are a rather ugly C++-specific hack. They do allow something like a declaration that says "no null", but they also have other effects. I find them terribly ugly myself. (I worked on a C++ compiler once. Working out things like what the hell a reference to an instantiation of a template type actually meant, and how we generated code for it, was astonishingly difficult. The hours I spent poring over the C++ standard with my coworkers, figuring out the precise semantics we needed to implement, were a damned nightmare.)
By MarkCC, at 8:38 PM
	   
don:
You're doing *exactly* what I was complaining about in the original post: creating caricatures of people who like statically typed languages that represent anyone who likes static typing as somewhere between mediocre and incompetent.
Preferring a different approach to how to detect errors (which is what typing comes down to) *is not* unreasonable; it does not imply that the advocate of typed languages "stifles the creativity" of coworkers; it does not imply that the person who prefers typing is incompetent.
In particular, WRT the "laziness" issue; it's a well-documented fact that in virtually any medium-to-large software system, in-code documentation and comments are not updated every time the code is changed. Even things written by the very best developers inevitably slip out of synch. Just look at the SE literature some time; there are dozens of studies about it. (I don't have a citation handy, but for two examples, I'm pretty sure that Gail Murphy did a study of it; and back when Scott Fahlman was doing Gwydion at CMU, they published something about it.) One of the advantages of typing in this context is that you can't get away with not updating the types; your program doesn't work if you don't.
To be perfectly clear, I'm not advocating that everyone use statically typed languages all the time; I'm not claiming that people who prefer statically typed languages are smarter, or more professional, or better programmers than people who prefer dynamically typed languages. All I'm saying is that I'm sick and tired of the endless caricatures of static-typing advocates. If you prefer dynamic typing, and you're more productive with it: bravo. Go ahead and use it. And since I think that *I* am more productive with a typed language, I'll go ahead and use that. Unless we're working on the same project, there's no reason to argue about it.
By MarkCC, at 8:50 PM
	   
I'm sorry I missed this, ah, discussion a few days ago. :-) There was a recent thread on Lambda the Ultimate that ran a very similar course; it can be found here.
Notably, most of the discussion hinged on the very blurry lines created by the terms "dynamic" and "static" typing. It was suggested that maybe the former term should be deprecated in favour of "tags" or something, since it has very little to do with Type Theory.
Always a hot topic, but I'm with you on this one Mark. When I'm coding for myself (which is the only kind I do at the moment) I find it inordinately useful to have the types
foo :: [a] -> [b]
foo :: Maybe [a] -> b
depending on the circumstances. It saves me having to read the functions themselves, or the comments. The former may not be completely obvious and the latter may be wrong or missing - but the types *have* to agree with the function implementation.
By Anonymous, at 2:41 PM
	   
Mark:
You wrote: "You're [...] creating caricatures of people who like statically typed languages that represent anyone who likes static typing as somewhere between mediocre and incompetent."
Hmm. I may have characterized Bob that way, but I don't think I said anything like that about static typing folks in general. I was mostly addressing Bob's personal arguments, or so I thought.
The only other paragraph of mine that I found was this one: "But the static typing folks go farther. They basically deny that it is EVER useful to postpone this effort. And that is where they err. There are plenty of programming situations where getting (provable!) type consistency is not the most important next step." Perhaps that's what you're referring to.
But then let me ask you straight out: do you agree that there are significant non-trivial times in programming when running code that cannot be automatically proven type-safe could be of value? Perhaps for every programmer some of the time, or else for some programmers most of the time (but not often/ever for others, such as yourself)?
If so, then why not agree on my proposed "compromise" of optional type information with optional type-checking? So the type-checking is "merely" one tool in the programming system toolbox among many, along with profilers, etc.
It is the elevation of type-checking to an all-the-time requirement that I object to most. I think that does stifle certain aspects of programming for certain programmers.
You wrote: "in-code documentation and comments are not updated every time the code is changed."
Yes, I agree with you completely. Perhaps even more strongly than you mean yourself! My background is AI and inference and structured data, and I'm completely on board with knowledge engineering in general, and getting as much information in machine-understandable form as possible (and then doing inference on it). I agree with you that static typing information in general, and function argument signatures in specific, is an excellent example of such information.
The only disagreement is whether a productive programming language should require a fully (provable!) type-safe program before running it. Whether a compile-time type error is a warning, or else a compiler error that prevents running the specified code.
What is your justification for requiring it for all programmers, in all programming projects?
You wrote: "I'm not advocating that everyone use statically typed languages all the [time] [...] If you prefer dynamic typing, and you're more productive with it: bravo. Go ahead and use them. And since I think that *I* am more productive with a typed language, I'll go ahead and use that. Unless we're working on the same project, there's no reason to argue about it."
But that's part of the trick, isn't it? Rarely is a language choice so freely made on technical merits, and up to personal whim. The company chooses it, or the project, or your collaborators are using something, etc.
I had thought your original post was something stronger: basically, that static typing was a Good Thing, and that a programming language which offered it was a "better" programming language than one that didn't.
Yet nowhere do you seem to acknowledge (or understand?) the benefits of dynamic typing. At best you leave it to personal preference, that perhaps some odd programmers might like it for themselves, but you always prefer static typing in your programming language.
I have a stronger opinion of dynamic typing, but for the moment my main thesis is just: wouldn't a language with optional type safety be the best of both worlds?
(This is similar to a Scheme/Common Lisp kind of debate. Scheme is made by people who prefer a certain style to programming, e.g. functional. Common Lisp is far more agnostic about programming style, and offers an effective language for those who want to use a functional style, or imperative style, or object-oriented style, or... It seems to me that all these programming styles have tradeoffs, and the best programming language is one that lets a master programmer use the most appropriate style for the given coding situation.)
By Anonymous, at 11:08 PM
	   
bob:
How many times do I need to say that I appreciate dynamic typing? I do, it's wonderful for certain kinds of programming, and there's nothing wrong with people who like to use it, and there's nothing wrong with using it if that's your preference. I've never said otherwise; from the original post onwards, I've just been arguing against the strawman of "anyone who likes static typing is a mediocre developer at best".
Most of the time - not all of the time, but most of the time - dynamic typing is not my preference.
And no, I do *not* agree that optional typing is a good compromise. There are two problems with the hybrid approach:
(1) The places where getting the types right is hardest are often the places where it's most valuable. The value of static typing comes from the way that it demands a kind of clarity about what's going on. In my experience, when you make the typing optional, what you end up with is type declarations in places where you'd be fine without them, and no declarations in the places where they'd have the most value.
(2) The more powerful, expressive type systems - that is, the ones that are least burdensome for the developer - infer types from complex webs of inferred constraints around the program. If there are significant parts of the system that cannot have types inferred, then that tends to propagate outward through the system.
By MarkCC, at 8:35 AM
	   
Mark,
"I do *not* agree that optional typing is a good compromise."
Interesting. Yet you agree that for some programming problems, dynamic typing is a great methodology, and for other problems static typing is preferred.
So this seems to imply that you do not believe, even in theory, that it's possible to have a single programming language that is an excellent tool for all/most programming problems. Instead, it appears you believe that a programmer MUST learn different programming languages for different tasks. Not for social or context reasons, but actually for technical reasons, that you want totally different language designs for different kinds of problems.
Sort of like, for poetry you need French but for science you need to speak German. Couldn't have a single language good for both purposes, even in theory.
I couldn't disagree more strongly. (I think programming is a conversation between people and machines, about algorithms, and there's no reason in principle why a single well-designed programming language couldn't be good at it all.)
But I recognize that this is a topic where my opinion is not well supported. Very tough to get hard data about one programming language being "better" than another, even for a single task, much less this kind of meta-topic about fitness across different programming tasks.
By Anonymous, at 6:46 PM
	   
don:
Yup, dead on. I absolutely believe in different languages for different tasks. And not just because of something as relatively minor as typing.
I view a programming language as a tool; and different languages as tools for different tasks.
Just to give you a quick example of what I mean, here's a quick list of languages I use routinely, and why:
- Squeak Smalltalk, for UI prototyping. Whipping together a UI prototype in Squeak is better than any other tool I know of for two reasons: the deep OO means that you can put "intelligence" wherever you want to; and that's been used to build a direct manipulation UI building tool that is simply magnificent for prototyping UIs.
- Haskell. I've been writing my own little text formatting system. The lazy functional stuff in Haskell, along with the way it handles strings as lists makes it wonderful for that.
- Java. For work. The project that I'm working on is actually very well suited to Java. It's distributed, requires high portability, and has aspects that are very OO, and aspects that really aren't. Java fits the bill nicely.
- Prolog. A lot of problems can very naturally be written in terms of search; and I know of nothing that beats Prolog for search.
- OCaml. My default language for personal hacking. Incredibly expressive types, the greatest debugger in the known universe, and a wonderful module system.
- Scheme. When I want to whip together a quick toy to experiment with a micro-language (like the Minsky machine I posted recently), I pull out DrScheme.
I wouldn't want to give any of them up. OCaml would be lousy for UI prototyping; Prolog would be a nightmare for writing a text formatting system; Haskell would be terrible for what I do at work. Each is great at what I use it for. None of them is good at everything.
And finally, I could not possibly disagree more that programming languages are for communicating with machines. Perhaps it's my perspective as a researcher who studies how people build things together - but I view programming languages as tools for building machines, and communicating ideas. And that's closely related to why I like all of those different languages: each one expresses certain ideas in a very clear way.
By MarkCC, at 8:48 AM
	   
bob:
Sorry. I'm bad with names at the best of times; I confused the two three-letter names in the discussion. I meant don, not you.
-Mark
By MarkCC, at 8:50 AM
	   
Bob & Mark,
The format of this blog probably makes it difficult to continue this kind of extended discussion. I'm not sure how many people are going to even notice that comments continue on this thread. So perhaps we might as well wrap up.
Nonetheless, I'll try to respond to your latest points, while realizing that perhaps further followups on this topic are becoming irrelevant.
Bob wrote: "I've never seen an untyped or dynamically-typed language that I would consider using for serious programming, that I didn't consider to be a toy language."
I'll just note that Common Lisp, for one, is a dynamically-typed language that many experienced programmers have already used for serious programming. I don't think any reasonable definition of "toy" would result in CL being a toy programming language.
Bob wrote: "If we'd coded it in something like Lisp instead of (early) Ada, it would have long since been recoded."
Possibly true, but for social reasons, not technical ones. There is plenty of Lisp code still running today 20 years after being written. Lisp code from 20 years ago is not suddenly obsolete because of some new theory of programming languages.
Bob wrote: "Programming is primarily communication among people and only incidentally between people and machines."
I don't disagree, but I think you minimize the machine too much. When two experienced programmers are discussing algorithms as a conversation between humans, they use natural language, context, and intelligence in a way that is far beyond the capabilities of machines. Programming languages are a clear attempt to bridge the gap between human and machine. If you leave the machine out of it, perhaps "programming" would look more like academic papers in programming language journals. Or human email. Or IM chats.
Mark wrote: "I view a programming language as a tool; and different languages as tools for different tasks."
It's an understandable perspective, but abstract tools like programming languages aren't limited by the same constraints as physical tools. It's hard to make a good hammer and screwdriver combined (although swiss army knives try). And I agree that optimizing in one aspect might degrade utility in another, in theory.
I think it's still an open question whether multiple programming paradigms could co-exist peacefully within the same language. (Common Lisp is one experiment on the pro side.)
Note that my real claim is not that some projects are better served by one tool than another; it's that, within a single project, portions of the coding might be best addressed with one style vs. another.
Your examples, which I don't disagree with, don't strike me as mutually incompatible. What happens if you're writing some search code (using Prolog), and you decide you want a UI as well (in Squeak?)? Surely it's a big effort to implement a single program in multiple languages, and then use ad hoc methods to glue them together.
Why do you see it as so impossible to design a language which permits each of these styles, and use the appropriate style for the appropriate pieces of code?
(I note also that some of your preferences are really social [Java] or libraries [UI], not so much fundamental language design.)
By Anonymous, at 8:22 PM
	   