Monday, May 26, 2008

A Theory of Scheduling Low Priority Work

PC-Doctor delivers an enormous number of different products to different customers. Each customer gets a different product, and they get frequent updates to that product as well. Delivering these products requires complex synchronization between dozens of engineers. We've gotten great at scheduling the most important work. Our clients love us for that.

However, the low priority projects get released significantly less reliably. Until recently, I'd assumed that this problem was unique to PC-Doctor. Based on some extremely sketchy evidence from another company, I'm going to release my Theory Of Scheduling Low priOrity Work (TOSLOW).

Let's suppose that we've got a project (L) that is not as important as a set of other projects (H). Here at PC-Doctor, we like to deliver our important projects on time. In order to do that, we often have to drop what we're doing to get something done on a project that needs work now. That means that someone who's on the critical path for completing a project in H will not be able to do any work on L. Things may be somewhat more extreme here at PC-Doctor than they are in a typical company, but I suspect that L will always have trouble causing a delay in H.

Now, it's possible to get work done on L. For example, we could hire someone just to work on a specific project in hopes that it will, in time, start making money. That'd be great. You'd manage to get 100% of a small number of people's time for your low priority project. The trick is that the people working on L do not have any responsibilities that are needed by the high priority projects so they can be scheduled independently of H.

Until discovering TOSLOW, I'd assumed that this would mean that, eventually, the project would reach completion. The people working on L might not be perfect for each task that needs to be accomplished, but they can do each task to some extent. They're devoting all their time and energy to that project, so eventually they'll learn what they need and get to the next step.

If that assumption is correct, then L will get accomplished. Furthermore, it is likely that L can even be scheduled accurately. I've never seen this happen here at PC-Doctor, though.

Here's why. If a project requires interactions with a large number of systems that are being used by the projects in H, then the person working on the low priority project will have to get some resources from the person in charge of each of those systems.

There's a chance that those resources will be obtainable. The low priority project's schedule will be determined by the least obtainable resource. In principle, you'll always get the resource you need eventually. If one person is always the rate-limiting step for H, then something should be changed to improve the scheduling of H.

However, even if we can say that L will eventually complete, if we want to schedule L accurately, we will have to be able to predict when we'll get time from each resource being used by H. In order to make this prediction, you'll have to understand the schedule of H. Project L will have to wait until resources being used by H are available. Here at PC-Doctor, this is particularly bad. Engineers working on our main projects tend to work closely together. That means that a large number of them are working on the critical path. In other words, getting an engineer to work on L requires H to be unable to use that engineer. Perfect scheduling is not possible, so this happens frequently. However, this means that L's schedule is coupled to the errors in H's schedule!

It's possible that I'm a project scheduling wimp, but I'm going to claim that, as long as the schedule of L is tightly coupled to errors in an unrelated project's schedule, you shouldn't even try to schedule L. In the worst case, you should just say that the project will eventually reach completion, but you have no idea when that will be.

TOSLOW can be summarized in this way: If a project is low enough priority that it cannot preempt another project's resources, but it still requires some of those resources, then the error in the low priority project's schedule is going to be large.

An important corollary to TOSLOW is that low priority projects will always be late. Errors in scheduling almost always cause things to be late rather than early!
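
To make the corollary a bit more concrete, here's a toy model in Python. Everything in it is an assumption made up for illustration: four engineers on H, each planned to be free on day 10, each actually freeing up 3 days early or 3 days late with equal odds. L can't preempt anyone, so it finishes only when the last engineer it needs becomes available. This is a sketch of the argument, not data from any real schedule.

```python
from itertools import product

def l_finish_times(planned_free, slip):
    """List L's finish day for every early/late combination of H's engineers.

    planned_free: the day each needed engineer is planned to be free (hypothetical).
    slip: how many days early or late each engineer might actually be.
    """
    outcomes = []
    for signs in product((-1, +1), repeat=len(planned_free)):
        actual = [day + sign * slip for day, sign in zip(planned_free, signs)]
        # L cannot preempt H, so it waits for the slowest resource.
        outcomes.append(max(actual))
    return outcomes

planned = [10, 10, 10, 10]
outcomes = l_finish_times(planned, slip=3)
late = sum(1 for day in outcomes if day > max(planned))
print(f"{late} of {len(outcomes)} outcomes are late")  # 15 of 16 outcomes are late
```

Even though each engineer's error is symmetric in this model, L is early only when every single one of them is early. The more of H's resources L needs, the worse those odds get.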

Okay... If you're working on a low priority project like the ones that I've described, then I haven't really helped you. All I've done is give your boss a reason to kill your project. :-( How can you avoid the effects of TOSLOW?

Just being aware of the problem will put you ahead of where I was on a low priority project that I've done for PC-Doctor. If you're aware of it, then you can start work on the stuff that requires interaction with higher priority projects immediately. In fact, I'd say that, as long as you've proven to yourself that your project is somewhat possible, you should spend all of your time on the interactions. After all, pretty soon you'll be waiting for resources and can spend some time on the meat of your project.

Recognizing the problem helps in another way as well. If your boss is aware of TOSLOW when the project starts, then you may be able to get your project's priority temporarily raised when it needs some help. This is exactly what the Windows kernel does to avoid thread starvation. (The reason for this is actually to avoid priority inversion, but it's got nice side effects for low priority threads as well.) If a thread doesn't get scheduled for a while, then its priority will get a limited boost. That's what you need to ensure your project doesn't end up waiting indefinitely like this one.
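
The idea of a limited, temporary boost can be sketched in a few lines of Python. This is not the Windows scheduler's algorithm. It's a minimal simulation under made-up assumptions: high priority work is always ready to run, and `boost_after` is a hypothetical knob for how many ticks the low priority job waits before it is boosted for a single tick.

```python
def ticks_for_low_priority(total_ticks, boost_after=None):
    """Count the ticks a low priority job gets while high priority work is always ready.

    boost_after: hypothetical number of ticks to wait before a one-tick boost.
    None means the job is never boosted.
    """
    waited = 0
    low_ticks = 0
    for _ in range(total_ticks):
        if boost_after is not None and waited >= boost_after:
            low_ticks += 1   # temporary boost: the low priority job runs once
            waited = 0       # then it drops back to its normal priority
        else:
            waited += 1      # high priority work wins this tick
    return low_ticks

print(ticks_for_low_priority(20))                 # 0
print(ticks_for_low_priority(20, boost_after=4))  # 4
```

Without the boost, the low priority job never runs. With it, the job gets a small but predictable share of the time, which is exactly the property a low priority project needs before anyone can schedule it.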

This post was originally published on the PC-Doctor blog.

Wednesday, May 21, 2008

Making Regexes Readable

Regular expressions are extremely powerful. They have a tendency, however, to grow and turn into unreadable messes. What have people done to try to tame them?

Perl is often on the forefront of regex technology. It allows multiline regexes with ignored whitespace and comments. That's nice, and it's a great step in the right direction. If your regex grows much more than that example, then you'll still have a mess.
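
Python's re module has a similar flag, re.VERBOSE, which ignores whitespace in the pattern and allows comments. As a small sketch of the style (in Python rather than Perl, with a dollar-amount pattern chosen just for illustration):

```python
import re

# re.VERBOSE ignores whitespace in the pattern and treats # as a comment marker.
dollars = re.compile(r"""
    \$      # a literal dollar sign
    \d+     # whole dollars
    \.      # the decimal point
    \d\d    # cents
""", re.VERBOSE)

print(bool(dollars.fullmatch("$3.12")))  # True
print(bool(dollars.fullmatch("$3.1")))   # False
```

This is pleasant at this size. It's still a single flat expression, though, so it doesn't help with structure once the regex gets large.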

What is it that makes large programs readable? More than anything, subroutines do it. I really want to be able to create something analogous to subroutines in my regex. I'd like to be able to create a small, understandable regex that defines part of a complicated regex. Then I'd like the complex regex to be able to refer to the smaller one by name.

Once again, we can look at Perl. Well, we can almost look to Perl. Perl allows you to define something called an overloaded constant. It looks as though these can define things like a new escape sequence that's usable in a regex. I won't claim that I understand it, but this page talks about it some. It seems to do the right thing, but I can't find many people who use it, so it must have problems. I'm going to guess that the scope of the new escape sequence is visible to all regular expressions. That would make it awkward to use safely.

Python, Ruby, and .NET don't have the features that I'd want. They tend to have fairly conventional regex libraries, however. It looks as though I'll have to look elsewhere.

Boost.Xpressive takes a completely different approach to regular expressions. This is an impressive C++ expression template library written by Eric Niebler. It allows you to create conventional regexes. It also allows a completely different approach, however.

This approach goes a long way towards making complex regexes readable, but it's not without problems.

Here's an example: /\$\d+\.\d\d/ is a Ruby regular expression to match dollar amounts such as "$3.12". It's a very simple regex, and a static xpressive regex gets a lot more verbose:

sregex dollars = '$' >> +_d >> '.' >> _d >> _d;

Remember, this is C++. A lot of the operators that conventional regexes use aren't available. For example, a prefix + operator is used instead of a postfix one. C++ also has no whitespace operator, so >> takes the place of simple concatenation. The result is a fairly messy syntax.

However, you can do some really great things with this. You can, for example, use a regex inside another regex.

sregex yen = +_d >> '¥';
sregex currency = dollars | yen;

You can start to see that, while simple regexes are worse looking, the ability to combine individual, named regexes together allows complex regexes to look much cleaner.
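
For comparison, the closest thing available with a conventional regex library is to build the pattern out of named strings. Here's a sketch in Python that mirrors the dollars, yen, and currency names above. Note the assumption: this is plain string interpolation, not a feature of the regex engine, so nothing stops a sub-pattern's capture groups or alternation from interfering with the pattern it's pasted into. The non-capturing group is there to guard against the latter.

```python
import re

# Small, named building blocks (plain strings, not compiled regexes).
DOLLARS = r"\$\d+\.\d\d"
YEN = r"\d+¥"

# The larger pattern refers to the pieces by name.
# (?:...) keeps the alternation from leaking into any surrounding pattern.
CURRENCY = rf"(?:{DOLLARS}|{YEN})"

currency = re.compile(CURRENCY)

print(bool(currency.fullmatch("$3.12")))  # True
print(bool(currency.fullmatch("500¥")))   # True
print(bool(currency.fullmatch("3.12")))   # False
```

It works, but the pieces are only text. Xpressive's named regexes are real objects that the library understands, which is a big part of its appeal.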

I'm not convinced that Boost.Xpressive is the answer. C++'s limitations show through the library's API too easily. However, if I ever have to create an extremely complex regex that will require extensive maintenance later, I'm unaware of any viable alternatives.

Ideally, some other language will take this idea and make it cleaner.

This post was originally published on the PC-Doctor blog.

Monday, May 12, 2008

Anonymous Methods in C# Cause Subtle Programming Errors.

Lambda expressions and anonymous methods in C# are more complicated than you probably think. Microsoft points out that an incomplete understanding of them can result in "subtle programming errors". After running into exactly that with anonymous methods, I'd agree. While I haven't tried them, lambda expressions in C# 3 are supposed to behave in exactly the same way.

Here's some code that didn't do what I'd originally thought it would do:

class Program {
  delegate void TestDelegate();

  static void Test( int v ) {
    System.Console.WriteLine(v.ToString());
  }

  static TestDelegate CreateDelegate() {
    int test = 0;
    TestDelegate a = delegate(){ Test(test); };
    test = 2;
    return a;
  }

  static void Main() {
    CreateDelegate()();
  }
}

This prints 2. This is not because of boxing. In fact, exactly the same thing happens if you replace the int with a reference type.

Instead, the compiler creates an anonymous inner class and moves all of the captured variables into that. All references end up pointing to the inner class, so the second assignment to the test variable actually modifies a member of this class.

Here's roughly what it looks like:

class Program
{
  delegate void TestDelegate();

  static void Test( int v ) {
    System.Console.WriteLine(v.ToString());
  }

  class __AnonymousClass {
    // The captured "local" variable now lives here as a field.
    public int test;
    public void __AnonymousMethod() { Test(test); }
  }

  static TestDelegate CreateDelegate() {
    __AnonymousClass __local = new __AnonymousClass();
    __local.test = 0;
    TestDelegate a = __local.__AnonymousMethod;
    __local.test = 2;

    return a;
  }

  static void Main() {
    CreateDelegate()();
  }
}

Anything starting with underscores stands in for a compiler-generated name. The names I used are made up; the real generated names are different.

Here's the catch. The local variable no longer exists. The variable you thought was local is now located inside an object created to hold your anonymous method.

Interestingly, while this is the whole story as documented by Microsoft, there is more to it. For example, it's possible to have two anonymous methods that reference the same local variable. It looks as though that variable is shared between the two anonymous method objects, but someone who's willing to disassemble the compiled code should confirm that.
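
As a loose analogy only, here's what shared capture looks like in Python, where two closures created in the same function are documented to share that function's variable. This is Python, not C#, and it says nothing about what the C# compiler actually emits. The names are made up for the example.

```python
def make_counter():
    count = 0

    def increment():
        nonlocal count      # write to the enclosing function's variable
        count += 1

    def read():
        return count        # read the same variable, not a copy

    return increment, read

increment, read = make_counter()
increment()
increment()
print(read())  # 2
```

If C# anonymous methods behave the same way, a change made through one of them would be visible through the other, just as it is here.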

You really do have to know about some of this behavior. The problem would disappear if anonymous methods could only read local variables. Then a copy of the value could be stored.
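
To see the difference between capturing the variable and storing a copy of its value, here's a sketch in Python rather than C#. Python closures also capture the variable itself, so the same surprise shows up. The default-argument trick is a Python idiom for taking a snapshot at creation time; it's used here only to illustrate the "copy" alternative and isn't something C# offers in this form.

```python
def create_delegates():
    test = 0
    by_variable = lambda: test                # looks up `test` when it is called
    by_copy = lambda snapshot=test: snapshot  # default is evaluated now, so 0 is copied
    test = 2
    return by_variable, by_copy

by_variable, by_copy = create_delegates()
print(by_variable(), by_copy())  # 2 0
```

The first function sees the later assignment, just like the C# example. The second one stored a copy, so it still returns the value from the moment it was created.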

This post was originally published on the PC-Doctor blog.

Monday, May 5, 2008

Developing a New Framework

This post is a bit of a change for me. I'm going to write about my work for PC-Doctor! I'm a bit embarrassed at how rare that's been.

I want to talk about how to design a brand new framework. It's not something that everyone has to do, and it's not something that anyone does frequently. However, there's very little information on the web about the differences between creating a library and a framework.

I've been working on a framework here at PC-Doctor, and I've worked on a few others in a previous job. I'll admit that I'm assuming that all similarities between these projects will be true for any new framework.

There are three things that I want to see from a new framework project. After I get those, it becomes more difficult to generalize across framework projects. The first framework I developed had all three of these aspects mostly by accident. My current framework has a tighter deadline than the previous one, but it still has all three of these to some extent. I'm a strong believer in all of them.

Goals


All of the frameworks that I've created try to do something new. This makes requirements gathering extremely difficult. If no one understands what the framework will accomplish, you almost have to decide on your own what it'll accomplish. You'd better be right, though!

Management will typically have a short term goal that they want accomplished with the framework. They might even be able to come up with a set of requirements. Don't get trapped by this. A framework must be much more than that. It enables programmers to use a common language to describe a class of application. Learning this language requires a significant time investment. If it only solves a short term goal, programmers won't be able to make that investment. Your framework has to solve a significant problem in order to be worth the investment required.

In another sense, however, requirements gathering for a framework is easy. You can get some details of the requirements wrong without damaging your users' ability to figure out what they want to do with it. After all, they typically write code using the framework. They can just write a bit more code than they should and get things working. Later iterations can use this experience to refine the design, but this won't happen until late in the design of the framework.

Instead of trying to find formal requirements, I prefer to find something that I call "goals". These are closer to a set of use cases than to requirements, but they're rephrased so that they look like requirements.

After developing a framework in my previous job, I saw how critical these goals were in the framework development. A good set of goals can let you make decisions quickly and accurately about design problems. If the goals are relatively small and simple, then they can be applied uniformly and accurately throughout the life of the project. That means that you're likely to fulfill the goals.

As an example, I'll give you a few of the goals for my current framework:

Goal: Relatively inexperienced developers should be able to use the framework to do somewhat sophisticated things.

This goal has driven a large fraction of my decisions on my current project. In my vision of the future, there will be hundreds of mini-applications using this framework. Having an enormous number of these applications would allow us to do some really amazing things, but that's simply not possible if I have to write them all. In fact, that's not possible if PC-Doctor engineers have to write them all.

This goal is designed to allow us to recruit developers who are more interested in the problems that the framework can solve than the techniques required to write code with it.

If this goal were a requirement, it would state something about the usability of the framework. Perhaps it would say how long a typical programmer would take to develop their first application with it. In its current form, it's almost completely useless to our QA department. That's not what it's for.

Goal: The appearance of the UI elements created with this framework should be directly modifiable by Chris Hill, our art department.

Again, this is me recruiting other people to do my work. :-) Stuff that Chris creates looks about a hundred times better than stuff that I create. Looking good is an important goal for us since we want our product to be fun to use rather than merely possible to use.

This is a better requirement than the previous one. This can be verified directly.

However, it turned out not to be that useful of a goal. While this goal verified some of the design decisions that were made for other reasons, it hasn't been all that useful to me directly. This might be a good thing, actually. If the goals work well together, they shouldn't have to conflict.

Goal: A future iteration of the framework should be portable to a variety of other platforms.

This goal is a good example of a goal that mostly gets ignored. The architecture does indeed support Linux, and a lot of the code should be easily ported to alternate platforms. However, it's hard to pay attention to a goal that isn't needed in the next release. PC-Doctor has some tight deadlines; we don't get to develop frameworks out of sight of our paying customers.

Not all goals have equal importance, and not all goals are actually useful. I don't consider this a failure, yet. Try to have as small a set of goals as possible. The more you have, the more difficult it will be to accomplish all of them simultaneously.

Okay, I've got a few other goals, too, but that gives you the idea. These are extremely high level goals. You could call them requirements, but that would be stretching things a bit. They really aren't that formal.

These goals are extremely important to the project. Choosing the wrong ones can kill the chance of success. Choosing the right ones will make design decisions extremely easy.

Usability


The next thing to worry about is the users of the framework and its usability for those users. I've talked about this before. The things I say in that post are even more valid for a framework than they are for a library. Go ahead and read that. I'll wait.

In my current project, I've got two types of users to think about. The first are mentioned in my goals. These are the developers who will write mini-applications with it. The second group are my coworkers who will help me maintain it.

Unfortunately, this framework has ended up putting the two groups in conflict. I frequently end up adding complexity to the framework in order to reduce the complexity of the API. In fact, I frequently go to great lengths to simplify the API slightly without worrying about adding half a dozen states to the framework's state machine.

It's still too early to tell if this will be a success. However, there are some preliminary indications:

1. Our first client to see the early results of the framework liked it and used it enough to have a lot of feedback for us.

2. Stephen, the product manager in charge of the first product, is currently busy writing a mini-application to test an optical drive. He doesn't complain much anymore. (I need to get him to complain more, actually. He's my only usability tester!)

3. Soumita, the only other programmer to actually dig into the framework so far, complains loudly. While I feel bad for her, making the internals simple wasn't one of my goals. I'm a bit worried now that it should have been, though.

To summarize, the UI of a framework isn't any different from the UI of an application. Use the same techniques to improve it. Above all, take the usability seriously. Frameworks are complicated and require significant investment to begin using them seriously. People will not do that if it's not easy to use.

Tools


You want to make your framework easy to use. You can do that by making a nice, clean API, or you can do it by making tools that allow users to ignore the ugly parts of your API. Both possibilities should be considered.

Boost and XAML are two frameworks that take this principle to opposite extremes. It's worth looking at both.

Boost has a wonderfully clean API. The tools that they've created suck. (Boost.Jam and BoostBook are horrific messes that make me cry.) The framework itself is a joy to use because you don't frequently touch their tools. This is a valid approach to framework design, but it's not the only approach.

Microsoft's XAML, for example, is the complete opposite. XAML is completely unreadable and extraordinarily difficult to use by itself. XAML data files are as readable as object files. However, Microsoft doesn't want you to use it by itself. They created a set of tools that let you completely bypass the XML obfuscation that XAML requires. The tools themselves are clean and easy to use. Again, this is a valid way to approach framework design.

I prefer something in between, though. Make sure there are some tools to help users deal with the worst parts of your framework. At the same time, make the framework itself clean. Solve all aspects of the usability problem using the most effective tool for the problem.

For my current project, I didn't have time to create any tools. However, I did manage to make a lot of the code that users create editable with CSS and XHTML tools. There are a lot of great tools for web development. All I had to do was enable my users to use these. The jury is still out on that decision, but I'm still optimistic.

This originally appeared on PC-Doctor's blog.