Wednesday, July 30, 2008

Fingerprint Readers Don't Work

A while ago, I got annoyed at a friend's computer. It had a fingerprint reader, and I wanted to play a game on it before he woke up.

Fortunately, it turned out that my fingerprint worked just fine. It took a few tries, but I successfully logged in as him.

He did look a bit shocked when he woke up and saw me playing a game on his supposedly secure work computer. Too bad he wasn't in the IT department at his company. :)

How secure are fingerprint readers? I can't say that I'm impressed. Since you leave what is essentially your password on everything you touch, they can't be infallible.

Fingerprint readers are supposed to be intimidating. You're supposed to look at one and think to yourself that you'd have to do some kitchen trickery to defeat it. Intimidation might be most of the security they provide.

That would have worked for me. I don't make a habit of breaking into other people's work computers. Is intimidation all they've got?

It looks as though that might be true. If someone really wants to break in, they can. It's not always as easy as my attempt was, but even the most secure readers can be broken.

However, I'm not going to complain too much about fingerprint readers. It's really easy to log in to a computer with one. It took about 5-10 seconds to break into my friend's. Imagine how quick it would be if it worked on the first try. Convenience is much more important than security to me on many of the computers that I use.

Incidentally, there's an interesting ending to this story. The friend whose computer I broke into was a researcher at HP Labs. After seeing me casually playing a game on his computer, he decided to do some research on alternate biometric input devices.

Wednesday, July 23, 2008

High Performance Multithreaded Code

Current CPUs have several cores. If a program is going to get faster on new hardware, it has to exploit those extra cores. Using multiple threads is, therefore, becoming extremely popular.

Of course, people who talk a lot about multithreaded programming don't ever mention that most programs don't need to be any faster. While I feel obligated to point that out, this article is written for people who do want their applications to run faster.

In fact, I'm going to go even farther than that. This is for people who want to squeeze every last bit of performance out of their multithreaded code. This isn't for everyone.

Since I don't have a lot of experience with this, I'm going to talk about two books that I've read. They both talk about specific topics that are, I suspect, absolutely essential to some extremely high performance multithreaded code.

Interestingly, neither book talks about high performance multithreaded code directly, and the two have absolutely no overlap. Each tackles a different end of the same problem without looking at the whole.

I enjoyed both of them.

The first, The Art of Multiprocessor Programming, was written by a couple of academics. It's a highly theoretical look at lock free and wait free data structures. It never talks about real hardware. It's also fascinating.

The second, Code Optimization: Effective Memory Usage, is an extremely practical guide to how modern hardware deals with a critical resource, memory. It talks in detail about what the hardware is doing. It doesn't touch algorithms that avoid the many problems that it talks about. It's a bit out of date as well, but it's still worth spending time with.

The Art of Multiprocessor Programming

You wouldn't be able to tell from the cover or the publisher's description of it, but this book is about lock free and wait free algorithms.

A data structure is lock free when, no matter how many threads access its routines, at least one thread always makes progress. A mutex can't provide that guarantee: the thread holding the mutex could page fault and be forced to wait, and during that wait no thread makes progress.

Wait free is an even stronger constraint. A routine is wait free if every call completes in a finite number of the calling thread's own steps. That is, every thread makes progress regardless of what the others do.

It's possible for a data structure to have some routines that are wait free and others that are merely lock free. The authors frequently try to make the most critical routines wait free and the less important ones lock free.
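To make the distinction concrete, here's a minimal sketch of a structure that mixes the two guarantees. It's written with today's std::atomic for brevity (in 2008 you'd have used compiler intrinsics), and StatCounter is a hypothetical name of my own, not something from the book.

```cpp
#include <atomic>

// Hypothetical statistics counter: the hot-path increment is wait free,
// while the rarer "record maximum" update is merely lock free.
class StatCounter {
public:
    // Wait free: fetch_add completes in a bounded number of steps
    // for every thread, no matter what the other threads are doing.
    void increment() { count_.fetch_add(1); }

    // Lock free: on every round of the CAS loop, *some* thread's update
    // succeeds, but an individual thread can be forced to retry
    // indefinitely if others keep winning the race.
    void record_sample(long sample) {
        long seen = max_.load();
        while (sample > seen &&
               !max_.compare_exchange_weak(seen, sample)) {
            // A failed CAS reloads the current value into seen; retry.
        }
    }

    long count() const { return count_.load(); }
    long max() const { return max_.load(); }

private:
    std::atomic<long> count_{0};
    std::atomic<long> max_{0};
};
```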

Lock free programming is a topic that's always fascinated me. It seems incredibly difficult. Researchers like the authors must agree, because there aren't that many lock free algorithms in the literature, yet. There are a few data structures out there, and a lot of work has been done on critical algorithms like heap management routines. There isn't much else, though.

The book, however, walks you through the techniques that are needed to build these algorithms. The authors describe and analyze the algorithms in ways that I don't normally bother with; mathematical proofs appear to be central to their process. Don't worry too much, though. None of the proofs outlined in the book were difficult to follow, and without them I would have had a hard time understanding what the authors were doing.

Here's an example of an interesting theorem in the book. Modern processors have a variety of atomic instructions that are designed to help avoid locks. These instructions are critical to lock free programming. Examples include atomic increment and compare and swap.

Lock free algorithms replace locking a mutex with a number of these atomic instructions. One theorem in the book (Herlihy's consensus hierarchy) essentially states that most of these instructions are pathetic: atomic increment, for example, can't solve consensus for more than two threads. (I'm only paraphrasing slightly.) Compare and swap, however, is proven to be universal: it can be used to build a wait free version of any object.

Lock free articles talk a lot about compare and swap. It's nice to understand why!
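To show why CAS gets all the attention, here's a sketch of the classic Treiber stack, where both operations boil down to a single compare-and-swap on the head pointer. This is my own simplified construction, and the pop shown here ignores the ABA problem, so treat it as a teaching aid rather than production code.

```cpp
#include <atomic>

// Treiber stack sketch: every mutation is one CAS on head_.
template <typename T>
class LockFreeStack {
    struct Node {
        T value;
        Node* next;
    };
    std::atomic<Node*> head_{nullptr};

public:
    void push(T value) {
        Node* node = new Node{value, head_.load()};
        // The CAS publishes the node only if head_ hasn't moved; on
        // failure it reloads head_ into node->next and we retry.
        while (!head_.compare_exchange_weak(node->next, node)) {}
    }

    // WARNING: this naive pop suffers from the ABA problem under real
    // concurrency; production code needs hazard pointers or tagged
    // pointers. It's shown only to illustrate the CAS pattern.
    bool pop(T& out) {
        Node* node = head_.load();
        while (node && !head_.compare_exchange_weak(node, node->next)) {}
        if (!node) return false;
        out = node->value;
        delete node;
        return true;
    }
};
```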

Incidentally, despite the title of my post, lock free algorithms are not necessarily faster than a conventional algorithm wrapped in a lock.

The Art of Multiprocessor Programming doesn't talk about it, but these atomic operations are expensive. They generally imply a memory barrier, which forces the core to synchronize its view of memory with the rest of the machine, and that's slow.

Arch, an Intel developer, puts it nicely here:
My opinion is that non-blocking algorithms are important in situations where lock preemption (arising from oversubscription) is an issue. But when oversubscription is not an issue, classic locked algorithms, combined with designing to avoid lock contention, will generally out perform non-blocking algorithms on current Intel hardware. I don't know about other hardware, but observe that:
  1. Non-blocking algorithms generally use more atomic operations than it takes to acquire a lock.
  2. Atomic operations in non-blocking algorithms generally need a memory fence.
  3. Memory fences incur an inherent penalty on out-of-order processors or processors with caches.
Do keep that in mind when you read the book! If your algorithm uses too many of these atomic operations, there's no point in doing it: locking an uncontended mutex typically costs only one or two of them.

The authors act like typical academics and ignore this problem completely. :)

Code Optimization: Effective Memory Usage

This book is dated 2003. It's several processor generations out of date. Don't panic, though. It turns out that a lot of what Kris Kaspersky says has been true for far longer than that.

There's a good chance that some of his discussion of ways to exploit specific CPU generations isn't useful anymore. However, interleaving memory bank access, N-way associative cache behavior, and many other interesting properties of memory are unlikely to change immediately.

You'd think that memory technology would change enough that the same quirky code optimizations wouldn't work for a whole decade. Apparently, you'd be wrong.

This is, as you might imagine, the exact opposite of the previous book. This is about how the memory systems in a (somewhat) modern PC work. This is about the details of machine architecture and how to use those details to speed up your code.

In fact, translating the information in this book to highly parallel computing will require some thought. It was written without much thought to the behavior of multicore processors.

That's not the point, though. Multithreaded programming is all about memory access. If you poke the memory in the wrong order, your program will slow way, way down. Compilers are not yet smart enough to do all the work for you.
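A classic single-threaded illustration of poking memory in the wrong order: both hypothetical functions below compute the same sum over a row-major N x N matrix, but the second one strides through memory and typically runs several times slower on large matrices because it wastes almost every cache line it fetches.

```cpp
#include <vector>
#include <cstddef>

// Walks memory sequentially: every fetched cache line is fully used,
// and the hardware prefetcher can stay ahead of the loop.
double sum_row_major(const std::vector<double>& m, std::size_t n) {
    double sum = 0.0;
    for (std::size_t row = 0; row < n; ++row)
        for (std::size_t col = 0; col < n; ++col)
            sum += m[row * n + col];   // adjacent elements: cache friendly
    return sum;
}

// Jumps n*sizeof(double) bytes per step: each cache line is fetched
// for a single element, so most of the memory bandwidth is wasted.
double sum_col_major(const std::vector<double>& m, std::size_t n) {
    double sum = 0.0;
    for (std::size_t col = 0; col < n; ++col)
        for (std::size_t row = 0; row < n; ++row)
            sum += m[row * n + col];   // stride of n elements: cache hostile
    return sum;
}
```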

It's worth talking a bit about how important memory access is. Here's a slide shamelessly stolen from Robert Harkness at the San Diego Supercomputer Center:

Performance is on a log scale. Memory bandwidth has a dramatically lower slope than CPU speed.

You could say that, eventually, performance will be entirely dominated by the usage of memory. However, we're almost there already. High performance data-parallel programming requires the knowledge in this book.

I do wish he'd write a second edition, though. Some of the chip-specific discussions are interesting, but they aren't necessarily relevant anymore.

Lower Performance Multithreaded Code

I should emphasize that most of the stuff in both of these books is useful for pushing performance a bit faster than you'd thought possible.

If you haven't already gotten to the point where you think your code can't be sped up, then you're likely to have more serious problems that will erase the improvements available from these books.

Actually, there's one significant exception to that. If you're using a low associativity cache poorly, then you could get almost no cache utilization. In some cases, you can make minor changes to your memory usage and go from no cache utilization to good cache utilization. That's probably a more important change than using a good algorithm.
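A toy model makes that failure mode easy to see. In a set associative cache, a few bits of the address pick the set, so arrays accessed with a large power-of-two stride can hammer a single set while the rest of the cache sits idle. The line size and set count below are assumptions typical of an L1 data cache, not a description of any specific chip.

```cpp
#include <cstdint>
#include <cstddef>

// Assumed cache geometry: 64-byte lines, 64 sets (common for L1 data
// caches, but purely illustrative here).
constexpr std::size_t kLineSize = 64;
constexpr std::size_t kNumSets  = 64;

// Which set does an address map to in this toy model?
std::size_t cache_set(std::uintptr_t addr) {
    return (addr / kLineSize) % kNumSets;
}
```

With these numbers, a 4 KB stride maps every access to the same set, so an 8-way cache can hold only 8 such elements at once. Padding each row by one extra cache line spreads the accesses back across the sets.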

Generally, however, these books are not what you need to make your web page load faster. None of the code that I've written in the last few years needs this. I'll keep looking for applications, though, because optimizing performance in exotic ways is fun!

I enjoyed both books a lot even though they didn't seem directly applicable. I hope you will, too.

Thursday, July 10, 2008

Integration: The Cost of Using Someone Else's Library

I don't do much Ruby on Rails development anymore, but Andy, who sits right next to me at PC-Doctor, does.

Recently, he's run into an interesting problem. I've seen the problem once before in a completely different context.

Once might be coincidence, but if you see the same problem twice, then it must be a real problem. :)

Ruby on Rails

If you've got a product to develop, it's normally better to use someone else's library whenever possible.

Ruby on Rails makes this easy. They've got a fancy system to download, install, and update small modules that can be put together cleanly and elegantly to create your product.

It's a wonderful system. It works, too.

Andy discovered that it might work a bit too well!

He's got a medium sized web application that uses a bunch of external modules. He wrote it fairly quickly because he was able to pick and choose modules from a variety of sources to solve a lot of his problems.

Unfortunately, he had to upgrade to a newer version of Ruby. That means that he's got to look for problems in each of the modules he installed and find a version that works with the new version of Ruby.

Some module maintainers are faster than others, of course. Not all of the modules are ready for the new version of Ruby.

This is a problem that doesn't scale happily. As the number of modules goes up, the chance of one or more modules not being ready goes up.

As Andy discovered, this means that an application can become painful to update.

I phrased my title as though Andy might not have been doing the right thing. I'd better be honest here, though. Even if one of his modules can't possibly be updated, he's still better off rewriting that one module. The alternative is to write all of the modules himself during application development.

Andy did the right thing. The pain he had while updating was minor compared to the alternative.

Ruby on Rails makes it extremely easy to combine large numbers of modules from different sources. The problem can be duplicated any time you get large numbers of independent developers working together.


Boost

The Boost libraries seem to be suffering from the same problem.

Boost doesn't put a lot of emphasis on stability, either. Changes to libraries are encouraged and frequent. Versions of the library aren't even required to be backwards compatible.

The end result is the same as Andy's problem. One library will change a bit, and that change will have to ripple through a bunch of other libraries. It can take a while to squeeze each contributor enough to get them to update their library enough to put out the next version of Boost. (Boost.Threads was the worst case of this. The developer disappeared with his copyright notice still in the source files!)

It's hard to blame either the release manager or the contributors. They're volunteers with paying jobs, after all.

The end result is still unfortunate. It now takes about a year or so to release a new version of the framework. Some libraries end up languishing unreleased for a long, long time because of this.

Boost has gone through a lot of releases. This makes it really tempting to look at this quantitatively. :)

To the right is a chart showing the number of days between major releases. This is, of course, a silly thing to measure. What defines a major release? There were only 5 days between versions 1.12.0 and 1.13.0, for example.

The lower numbers on the chart show the number of libraries that changed with each release. There is a slight upward trend to that as well. Clearly, newer releases contain more new stuff in them than the older releases. Furthermore, not all changes to libraries are the same. Some of the more recent changes are substantial.

Despite all of that, I'm going to claim that the release schedule is slowing down over time. There are many reasons for this, but one of them could well be the same problem that Andy has.

Before a release goes out, there is often a plea on the Boost developers' mailing list for assistance on a few libraries. Those calls for help are evidence that the size of Boost is slowing it down: the more libraries there are, the more libraries will be in trouble.

Early versions of Boost had extremely lightweight coupling between the different libraries. More recent versions are significantly more coupled. As developers get familiar with other Boost libraries, they are extremely likely to start using them in newer libraries. It's almost inevitable that the coupling increases over time.

The developers for each library continue to mostly be volunteers who don't always have time to make prompt updates. Getting all updates to all libraries to line up at the same time can't be easy.

Commercial Projects

Both of these examples involve open source projects. Andy isn't building an open source application, but he is relying heavily on open source modules. Boost is entirely open source.

An open source project is going to end up relying on volunteers. It's really hard to manage volunteers! Is it any easier on a large commercial project?

I don't have any direct experience with this. I've never been a part of a big company with hundreds of people all working on the same thing.

Is the problem unique to open source projects? I've got no data, but I'll make some speculations.

Some fraction of open source developers are volunteers with other jobs. This isn't true for commercial projects.

A developer who's spending their free time on a project will have to schedule their time around a higher priority project that's paying them. According to this theory, this dramatically increases the spread in the amount of time required to complete their job.


The problem probably isn't unique to open source projects, but I suspect that it's worse for them.

Ruby on Rails encourages using large numbers of independently developed modules. This model will exacerbate the problem.

I'd love to hear from someone who's got experience with large projects. I suspect the problem gets worse as projects get bigger, but I don't know much about what happens inside them.

Monday, July 7, 2008

Rvalue References Explained

Thomas Becker just sent me a note about an article that he'd just written. Rvalue references aren't in wide use yet, and they aren't part of the official standard either, so not many people understand them. I'm sure his article will dramatically increase the number of people who do, since Thomas is such a good writer.

If you'd like to play with rvalue references after reading his article, GCC 4.3.1 is what you want. You can access them using the -std=c++0x compiler option.
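Here's the kind of toy you might write to see what rvalue references buy you: a buffer class whose move constructor steals the source's allocation instead of copying it. This is my own minimal sketch, not code from Thomas's article.

```cpp
#include <algorithm>
#include <cstddef>

// Toy buffer: copying is O(n), moving is O(1).
class Buffer {
public:
    explicit Buffer(std::size_t n) : size_(n), data_(new int[n]()) {}

    Buffer(const Buffer& other)                // copy: allocate and copy
        : size_(other.size_), data_(new int[other.size_]) {
        std::copy(other.data_, other.data_ + size_, data_);
    }

    Buffer(Buffer&& other)                     // move: binds to rvalues only
        : size_(other.size_), data_(other.data_) {
        other.data_ = 0;                       // leave the source empty
        other.size_ = 0;
    }

    ~Buffer() { delete[] data_; }

    std::size_t size() const { return size_; }

private:
    std::size_t size_;
    int* data_;
    Buffer& operator=(const Buffer&);          // assignment omitted in this sketch
};
```

Temporaries returned from functions get moved instead of copied, which is exactly the case that makes value-semantics code suddenly cheap.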

Doug Gregor's C++0x page can be used to track the progress of that compiler option.

Wednesday, July 2, 2008

Testing the Untested

Test driven development is the cool new way to write software. TDD revolves around writing the tests for the software before or during the development of the software. If you do anything like that, then you'll end up with well tested software.

One of the fantastic things about well tested software is that you can change it confidently. Rapid change is the mantra of a large number of the newer software development methods.

I'd like to talk about the other end of the spectrum, though.

What happens if you've got a collection of code that doesn't have a good collection of tests? What happens if the code was written without thinking about testability?

It can become extremely hard to add tests to code like this after the fact.

I carefully phrased the problem so that you don't have to admit to writing the code yourself. I've written code like this, though. I'm guilty.

I've also recovered successfully from poorly tested code. What does it take to make this recovery?

Unit tests

Unit tests drive individual functions within a specific module. A good set of unit tests will help improve your API's reliability.

If you wrote the module with testing in mind, then you'd have a small executable that would load the module and run the tests. This could be done fairly quickly, so you could incorporate the testing into your build process.

If you didn't write the module to be tested, then there's a chance that the module depends heavily on the application's setup and shutdown routines. I hope that's not the case, because, if it is, you'll either have to refactor a lot of the setup code or you'll have to run your unit tests within the application.

The latter option means that, in order to run your unit tests, you've got to start up the entire application and then put it into a mode where tests are run. If the application's startup time is significant, this will slow down unit testing and make it happen less often. Avoid this if possible.

If you can create a lightweight process that loads just the module to be tested, then you'll only be limited by the amount of time you can spend on tests.
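Such a lightweight driver doesn't need a framework. Here's a sketch of about the smallest harness that's still useful, written in current C++ for brevity; the trim function and its tests are hypothetical stand-ins for whatever module you're trying to cover.

```cpp
#include <cstdio>
#include <string>

// Each test is just a function returning true on success.
typedef bool (*TestFn)();

struct Test { const char* name; TestFn fn; };

// Hypothetical routine under test: strip leading/trailing whitespace.
std::string trim(const std::string& s) {
    std::string::size_type b = s.find_first_not_of(" \t");
    std::string::size_type e = s.find_last_not_of(" \t");
    return b == std::string::npos ? "" : s.substr(b, e - b + 1);
}

bool test_trim_spaces() { return trim("  hi  ") == "hi"; }
bool test_trim_empty()  { return trim("   ") == ""; }

// Run every registered test and report; returns the failure count.
int run_all_tests() {
    const Test tests[] = {
        {"trim_spaces", test_trim_spaces},
        {"trim_empty",  test_trim_empty},
    };
    int failures = 0;
    for (const Test& t : tests) {
        bool ok = t.fn();
        std::printf("%s: %s\n", ok ? "PASS" : "FAIL", t.name);
        if (!ok) ++failures;
    }
    return failures;
}
```

A real main would just return run_all_tests(), so a failing test produces a nonzero exit code and breaks the build.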

Well... That's not quite true, actually.

In theory, you can write some unit tests and incorporate those tests into your build process, but you originally wrote that module in a corporate culture that didn't require unit tests.

I'll talk about that problem at the end of this post. It's important.

How Many Unit Tests Are Needed?

How many tests should be added right away? This is going to depend on the project's goals. You won't be able to get to the point where you can make changes with confidence until you have a mostly complete set of tests. Getting a complete set of tests will be a lot of work, however. The good thing is that you'll probably fix a bunch of bugs while you create the tests.

If you don't trust your code and you have an infinite amount of time, I'd recommend going all out and creating a complete set of unit tests. This will fix a bunch of problems with the code, and it will force you to understand the code a lot better. In the end, you'll have a lot more confidence in the code.

Of course, time constraints are likely to prevent you from spending that much time writing new unit tests. What's a more realistic amount of tests to add?

If you're adding tests late in development, it's probably because there's a problem that needs to be solved. If the problems come from a small number of modules, then these would be a great place to start.

Having a small number of modules with tests is still extremely useful. It will increase your confidence in those modules, and it will allow you to change them more easily.

System Testing

System tests use the entire application or almost the entire application to test a variety of use cases. Generally, these are black box tests. This means that you don't worry about what code you're testing. Instead, you test from the user's point of view. A system test will answer questions about what a user can and can't do.

System tests are often easier to insert into an application that wasn't designed for testing.

There are several ways you can create these.

First, you can use a tool that drives your application through its UI. There are a variety of tools out there that can click on buttons and menu items. I'm not familiar enough with any of them to make recommendations.

The tools that I've seen strongly couple your tests to your user interface. This might be good if you want to test the details of your UI, but generally you'd rather have tests that are easy to maintain.

If you have a set of stable keyboard shortcuts, then you could use a tool like AutoIt or the Win32 keybd_event function to drive the program. I'd prefer this over something that looks for controls on the screen and sends mouse clicks to them, but I may be too conservative here.

A further improvement is to use a macro capability that's already built into the application.

This bypasses the user interface. You might consider that a problem, but a macro language is likely to be a lot more stable than the layout of menus and dialog boxes, so the tests themselves are likely to be significantly easier to maintain.

Besides sending in inputs, you'll also have to verify the program's output. Saved files, the clipboard, or log output can all be checked effectively.

System tests are significantly more complex to set up than unit tests. Furthermore, they take a lot longer to run.

Because they take so long to run, it is unlikely that you'll ever convince your team to run these as part of their build process. Instead, you're likely to have a separate machine or set of machines set up somewhere to run the tests. That's what we do here at PC-Doctor for unit tests.

It's easy to let system tests grow until they take so long that they become unwieldy. Focus each test on an individual use case, decide how long you want the whole suite to take, and keep that budget in mind.

Of course, you can always add more machines to your testing farm, but this might not happen until you convince your workplace that they're useful.

A Culture of Testing

A culture that thinks that unit tests are a waste of time isn't going to want to add tests to their build process. They aren't going to want you to spend the time to build the tests, either.

Changing corporate culture is a complex topic that will have to be in a different post. It's hard. I don't recommend it.

Instead, I'd hope that the project you're adding testing to needs it badly. In that case, the benefits of testing should be large enough that people may start to notice.

Don't expect everyone to suddenly run all of your unit tests when they compile your modules. Instead, use your work to start a discussion about it.


I'd also love to give some examples. One project that I worked on many years ago went through the whole process. World of Warcraft desperately needs to go through the process. (It annoys me how few tests they've got.) I'd like to talk about both.

I'd also like to keep this post relatively short, however. I'll wait until part 2 to discuss specific examples.