Friday, October 5, 2007

Fairness in Win32 Lock Objects

Windows Vista has given up fairness in their synchronization objects. In fact, Windows XP isn't guaranteed to be fair, but Vista goes quite a bit further.

The issue and the solution is explained pretty well by Joe Duffy over here. At first glance, it looks as though this could cause some really bad thread starvation in some circumstances.
The change to unfair locks clearly has the risk of leading to starvation. But, statistically speaking, timing in concurrent systems tends to be so volatile that each thread will eventually get its turn to run, probabilistically speaking. Many more programs would suffer from the convoy problems resulting from fair locks than would notice starvation happening in production systems as a result of unfair locks.
The problem they're trying to solve and the problem that I'm worried about are pretty similar, so it's worth explaining it carefully. If you have a mutex that protects a small chunk of code, and you have a large number of threads trying to get that mutex, then you have the potential to have a lock convoy. You can get this easily if you wake up a large number of threads with the same priority and have them all immediately try to get another lock.

Here's what happens. A bunch of threads are waiting for a mutex. A thread releases the mutex. Under Windows XP, the next thread in line to get the mutex now owns the mutex and gets a temporary priority boost. Windows does a context switch to the new thread, and it performs a short operation and releases the mutex. (Note that this will also boost the priority of the next thread in line, but that thread will have the same priority, so no context switch is performed.)

These context switches are expensive. Joe Duffy claims that it's about 1000 cycles. Avoiding these switches would be a good thing, so Vista changes the way mutexes work. When a thread releases a mutex under Vista, the mutex does not become owned by the next thread in the queue. Instead, it becomes unowned. This allows a third thread that's already in the running state to grab the mutex and keep going without any context switches at all. That's much cheaper, much less fair, and has the potential to starve the waiting threads.

One problem with this is that you're giving priority to running threads over waiting threads. As the number of cores in your CPU goes up, then the number of running threads gets higher. Eventually, isn't there going to be some starvation of the waiting threads?

I tried for a while to construct an artificial scenario where a thread could be starved for a long period of time. I've got to admit that I couldn't do it. The reason is that there is always some randomness in the scheduling. In order to make it so that there is a problem, I had to increase the fraction of time that a thread holds the synchronization object. Once you do that, you're going to limit the number of threads that can run without having to run into problems with a lack of fairness. It feels as though that might be true of all problems where this change becomes bad, but I couldn't prove it.

In short, Vista gives up the pretense of fairness in synchronization objects, and it does so without guaranteeing that it doesn't starve a thread. However, it looks to me as though thread starvation will never be a huge issue.

This originally appeared on PC-Doctor's blog.

No comments: