Monday, May 26, 2008

A Theory of Scheduling Low Priority Work

PC-Doctor delivers an enormous number of different products to different customers. Each customer gets a different product, and they get frequent updates to that product as well. Delivering these products requires complex synchronization between dozens of engineers. We've gotten great at scheduling the most important work. Our clients love us for that.

However, the low priority projects get released significantly less reliably. Until recently, I'd assumed that this problem was unique to PC-Doctor. Based on some extremely sketchy evidence from another company, I'm going to release my Theory Of Scheduling Low priOrity Work (TOSLOW).

Let's suppose that we've got a project (L) that is not as important as a set of other projects (H). Here at PC-Doctor, we like to deliver our important projects on time. In order to do that, we often have to drop what we're doing to get something done on a project that needs work now. That means that someone who's in the critical path for completing a project in H will not be able to do any work on L. Things may be somewhat more extreme here at PC-Doctor than they are in a typical company, but I suspect that L will always have trouble causing a delay in H.

Now, it's possible to get work done on L. For example, we could hire someone just to work on a specific project in hopes that it will, in time, start making money. That'd be great. You'd manage to get 100% of a small number of people's time for your low priority project. The trick is that the people working on L do not have any responsibilities that are needed by the high priority projects so they can be scheduled independently of H.

Until discovering TOSLOW, I'd assumed that this would mean that, eventually, the project would reach completion. The people working on L might not be perfect for each task that needs to be accomplished, but they can do each task to some extent. They're devoting all their time and energy to that project, so eventually they'll learn what they need and get to the next step.

If that assumption is correct, then L will get accomplished. Furthermore, it is likely that L can even be scheduled accurately. I've never seen this happen here at PC-Doctor, though.

Here's why. If a project requires interactions with a large number of systems that are being used by the projects in H, then the person working on the low priority project will have to get some resources from the person in charge of each of those systems.

There's a chance that those resources will be obtainable. The low priority project's schedule will be determined by the least obtainable resource. In principle, you'll always get the resource you need eventually. If one person is always the rate-limiting step for H, then something should be changed to improve the scheduling of H.

However, even if we can say that L will eventually complete, if we want to schedule L accurately, we will have to be able to predict when we'll get time from each resource being used by H. In order to make this prediction, you'll have to understand the schedule of H. Project L will have to wait until resources being used by H are available. Here at PC-Doctor, this is particularly bad. Engineers working on our main projects tend to work closely together. That means that a large number of them are working on the critical path. In other words, getting an engineer to work on L requires H to be unable to use that engineer. Perfect scheduling is not possible, so this happens frequently. However, this means that L's schedule is coupled to the errors in H's schedule!
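The coupling is easy to see in a toy simulation. The sketch below is only an illustration under made-up assumptions: L needs time from five engineers, each engineer is tied up on H until an estimated date plus a random slip, and L needs a fixed amount of its own work once the last engineer frees up. The function names, the uniform slip distribution, and all of the numbers are hypothetical; nothing here is measured from a real project.

```python
import random
import statistics


def simulate_l_finish(num_resources, h_estimate, h_slip_max, l_work, rng):
    """Run one trial and return the day on which project L finishes.

    Assumption: every resource L needs is busy on H until h_estimate plus a
    random slip between 0 and h_slip_max. H never finishes early here.
    """
    free_days = [h_estimate + rng.uniform(0, h_slip_max)
                 for _ in range(num_resources)]
    # L is gated by the least obtainable resource, then does its own work.
    return max(free_days) + l_work


def schedule_spread(h_slip_max, trials=2000, seed=1):
    """Return (mean, standard deviation) of L's finish day over many trials."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same numbers
    finishes = [simulate_l_finish(5, 10.0, h_slip_max, 3.0, rng)
                for _ in range(trials)]
    return statistics.mean(finishes), statistics.pstdev(finishes)


if __name__ == "__main__":
    for slip in (1.0, 5.0, 10.0):
        mean, spread = schedule_spread(slip)
        print(f"H slips up to {slip:4.1f} days: "
              f"L finishes on day {mean:5.2f} +/- {spread:4.2f}")
```

In this model the naive plan for L is day 13 (10 days of waiting plus 3 days of work). L never beats that date, and both its average lateness and the spread of its finish date grow as H's scheduling error grows, even though nothing about L's own work has changed.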

It's possible that I'm a project scheduling wimp, but I'm going to claim that, as long as the schedule of L is tightly coupled to errors in an unrelated project's schedule, you shouldn't even try to schedule L. In the worst case, you should just say that the project will eventually reach completion, but you have no idea when it will be.

TOSLOW can be summarized in this way: If a project is low enough priority that it cannot preempt another project's resources, but it still requires some of those resources, then the error in the low priority project's schedule is going to be large.

An important corollary to TOSLOW is that low priority projects will always be late. Errors in scheduling almost always cause things to be late rather than early!

Okay... If you're working on a low priority project like the ones that I've described, then I haven't really helped you. All I've done is give your boss a reason to kill your project. :-( How can you avoid the effects of TOSLOW?

Just being aware of the problem will put you ahead of where I was on a low priority project that I've done for PC-Doctor. If you're aware of it, then you can start work on the stuff that requires interaction with higher priority projects immediately. In fact, I'd say that, as long as you've proven to yourself that your project is somewhat possible, you should spend all of your time on the interactions. After all, pretty soon you'll be waiting for resources and can spend some time on the meat of your project.

Recognizing the problem helps in another way as well. If your boss is aware of TOSLOW when the project starts, then you may be able to get your project's priority temporarily raised when it needs some help. This is exactly what the Windows kernel does to avoid thread starvation. (The reason for this is actually to avoid priority inversion, but it's got nice side effects for low priority threads as well.) If a thread doesn't get scheduled for a while, then its priority will get a limited boost. That's what you need to ensure your project doesn't end up waiting indefinitely like this one.
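To make the priority boost idea concrete, here is a minimal sketch of a strict-priority scheduler with an optional starvation boost. It is a toy model and not the Windows algorithm: the function name run_scheduler, the boost_after parameter, and the rule that a starved task jumps straight to the front for one tick are all assumptions made for illustration.

```python
def run_scheduler(base_priority, ticks, boost_after=None):
    """Simulate a strict-priority scheduler and count how often each task runs.

    base_priority maps a task name to its priority (higher wins).
    If boost_after is set, a task that has waited at least that many ticks
    is temporarily treated as the highest priority task (a hypothetical rule).
    """
    waiting = dict.fromkeys(base_priority, 0)  # ticks since each task last ran
    runs = dict.fromkeys(base_priority, 0)

    def effective_priority(name):
        starved = boost_after is not None and waiting[name] >= boost_after
        return float("inf") if starved else base_priority[name]

    for _ in range(ticks):
        chosen = max(base_priority, key=effective_priority)
        runs[chosen] += 1
        for name in waiting:
            # Running resets the wait, which also ends any boost.
            waiting[name] = 0 if name == chosen else waiting[name] + 1
    return runs


if __name__ == "__main__":
    projects = {"H": 10, "L": 1}
    print(run_scheduler(projects, ticks=20))                 # {'H': 20, 'L': 0}
    print(run_scheduler(projects, ticks=20, boost_after=4))  # {'H': 16, 'L': 4}
```

Without the boost, L never runs at all. With it, L gets one tick in every five, which costs H a little but gives L a predictable share of the resource.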

This post was originally published on the PC-Doctor blog.
