Monday, March 10, 2008

Static Analysis of Untyped Languages

A while ago I wrote about whether or not untyped languages were a good idea or a bad idea. I didn't come to any real conclusions at the time, and it's bothered me. I'd like to outline a way to gather some real conclusions now. I still won't be able to come to a conclusion, but I think this approach sounds interesting.

There are a number of advantages and disadvantages to untyped languages. One of the disadvantages is that, because variables have no type, you can assign the wrong type to them. A strongly typed language can avoid most of this problem. (You might still have problems involving type conversion, but a good programmer or a strict language can avoid those problems.)
I'd like to propose a tool that can analyze some untyped code without running it. (This is called static analysis.) The tool will try to deduce the types of variables. In theory, it is possible to analyze what possible types a value has at any given time. We couldn't be as strict as a typed language, but we can make some conclusions. For example, let's look at the following Python code:

class Duck:
  def _init_(self):
    self.hungerLevel = 0

daffy = Duck()

The first few lines define a type with a single numeric member variable. We can tell this before we run the program. The last two lines create a Duck and pass it to the feed function. We can tell at compile time that daffy is a variable of type Duck and feed is a function that will receive Ducks. Functions may receive different types at different times, but we can record all of the possible inputs and make sure the types are compatible at each point.

This cannot work perfectly, of course. For example, there might be code paths that are simply impossible to go through because of the value of variables rather than the type of variables. Determining all of these cases cannot be done without running the program. (My theoretical computer science is pretty weak, but I vaguely remember some theorems that could prove that.) However, if there's a program out there that is type safe but requires some of the code paths to be impossible to reach, it's probably not a particularly well written program.

I'd like to write something like this, but haven't fully justified it, yet. If I had such a tool, I'd try to run it on a body of existing, open source, untyped code.

I can imagine two possible ways to use this to strongly suggest that untyped languages are dangerous despite the claims of untyped language fans. First, real problems could be found in the code base. If the author(s) of the untyped code would admit that problems were there, then, almost by definition, it seems as though the problems exist. Second, the author(s) could become of a fan of the tool and use it during development. Perhaps the final code doesn't have type safety problems, but the code has those problems in its early stages. This could also be used as evidence that there's a problem with untyped languages.

Unfortunately, it's hard to imagine how this experiment could prove that untyped languages don't have a problem. Furthermore, it's possible that a tool like this works well enough that the problem largely goes away after programmers start using it. (The latter seems unlikely, but it's certainly a potential complication.)

Don't expect rapid progress here, but I'd guess that, at the very least, I'll have some follow up posts about this project as it progresses.

This originally appeared in PC-Doctor's blog.

No comments: