Wednesday, August 8, 2007

Multithreaded Programming for the Masses

Writing software on multicore CPUs is a hard problem. The chip designers have told us that they're not going to do all the work for us programmers anymore. Now we have to do something. (Here's a good description of the problem from Herb Sutter.) Writing multithreaded apps is not easy. I've done a lot of it in C++, and the tools, the libraries, and the design patterns just don't make it trivial. I don't mean to say that it's impossible, however. It's certainly possible if you're careful, you plan for it from the beginning, and you know what you're doing. The real problem is that 99% of programmers aren't all that good. A defect in multithreaded code can be really hard to track down, so it's expensive if you get it wrong. A lot of companies and research groups have been spending a lot of time trying to figure out how to make it easier. I haven't been doing any research on the topic, but, like a good blogger, I'm going to comment on it anyway.

First of all, do we have to speed up all applications by the number of cores in your CPU? Your typical Visual Basic programmer isn't any good at threading, but do they have to be? They can create dialog boxes and display data and let you edit that data. Do they have to know anything about threads for this?

Probably not.

It's still possible to speed up that dialog box, too. If there's some text entered by the user, the grammar/spelling checker might be running on a different core from the UI widget. This wasn't done by the Visual Basic programmer. The dialog might have been drawn using a graphics engine that used threads or even offloaded some computations to the video card. Again, this wasn't done by the programmer who's never worried about threads.

So, we don't always have to know about threads. Some substantial fraction of the programmers in the world don't have to speed up their code using multiple cores. That's good news. We really need to admit that not all programmers need to make their programs faster. The libraries they use can be made faster without compromising the reliability of the program itself.

That's not the whole story, though. Let's look at the other end of the spectrum. How about a video game?

Graphical processing is astonishingly easy to parallelize. GPUs will continue to get faster over time because of this. A lot of calculations can be done on individual pixels. It's relatively rare that two pixels that are far apart will affect each other substantially. (There are some techniques that defy this statement, but a lot of them rely heavily on expensive matrix calculations which may be parallelizable. I know nothing about these techniques, but I'm going to guess that they're easy to parallelize, too.)

If video cards are going to continue to get faster, then the CPU had better keep up. The CPU is going to have to generate enough data to avoid starving the GPU. As the GPU gets infinitely fast compared to a single core of the CPU, this will require multiple cores.

Uh, oh. Does this mean that all game designers will need to use multiple threads?

I'm sure that it doesn't. Couldn't the game engine programmers do all of the hard work and let the average game programmer use a safe API that avoids threads? Certainly a level designer who's scripting an in-game event in Lua would not be forced to worry about threads!

What about an AI programmer? I don't know enough about game engine design to say for sure, but I'd be willing to bet that a cleverly designed game engine could avoid exposing a lot of the multithreaded mess to many programmers, but at some point AI programmers will have to do things on multiple cores at the same time.

While the AI programmer might be fairly smart, and they might be trained in multithreaded programming techniques, that does not mean that it's a good idea to subject him or her to mutexes, condition variables, thread creation, deadlock avoidance, etc. That's not a good use of a programmer's time.

What can be done to help a programmer who really does have to run code on multiple cores?

Functional programming languages are a potential solution. The creators of languages like ML, Haskell, and Erlang realized that your compiler can do a lot of fancy things if the programmer isn't ever allowed to modify a value. If things can't be modified, then they can be easily parallelized. Once you create an object, as many cores as you want can read it without even bothering to lock it. Of course, this will frustrate a Visual Basic programmer who is used to changing a variable as the object that the variable represents changes. It requires some significantly different programming techniques.

Once again, this is not for everyone.

Futures are a pretty slick way to abstract away threads and some locks. It doesn't eliminate the headaches of multithreaded programming, but it has the potential to make things simpler.

A future is a chunk of code that a programmer wants to execute. Instead of calling the function that contains the code and waiting for the return value immediately afterwards, the programmer separates the calling and the waiting. First you tell your library that you want your future to run. The library can do whatever it wants at this point. It could execute it immediately. It could put it in a queue to be run by a thread pool. It could ignore it completely. The program can then do whatever it wants. At some point, it may want the result of the future. Then the program will ask the library for the result of the future. The library may have to wait for the result to be ready, but eventually the result will be returned.

The great part about futures is that threads don't have to be created. If you can divide your program's logic into N bits, you can let your library and compiler figure out how to use that to speed up the program using multiple cores.

Unfortunately, futures don't eliminate the need to worry about locks. Since that's probably a more difficult problem than thread management, futures are not a panacea.

There are some other ways to easily parallelize your code. SQL statements are a great example. Databases are really good at optimizing SQL for this sort of thing.

Intel has a strong interest in making it easier for programmers to use their new chips. Their Thread Building Blocks library uses STL-style algorithms and protected iterator ranges to express parallelism. It also has a "task scheduler" that behaves at least a bit more like a futures library. This seems like a really great idea. An STL-style algorithm that describes the parallelism to the library is not always sufficiently flexible to describe a problem. However, if it is sufficient, it's extremely easy to use. The task scheduler is more conventional and would probably be much nicer with a good lambda library.

OpenMP is another library designed to abstract away many of the details of parallizing code. It's not strictly a library, however. It requires some compiler support as well. The programmer uses a function that behaves much like a Unix fork() command. OpenMP then manages several threads to handle the different branches of the fork. While I'm certainly not an expert, this doesn't seem like either a clean solution or a sufficiently flexible solution.

I'm sure there are other research projects out there. If you know of any interesting ones, please post a comment about it below.

This originally appeared on PC-Doctor's blog.

No comments: