Tuesday, November 27, 2007

What's that Data Structures Class for?

I assume that when computer science students go through college, they all take a required course in data structures. If I were designing a course like this, I'd make them learn how a variety of useful data structures worked. Certainly if you read a book on data structures you'll learn this sort of thing.

How many programmers actually need this information? In today's world, there are a lot of libraries out there that have a reasonable implementation of AVL tree, hash table, and B tree. Certainly some people need to learn to write these things. Why does your typical student programmer care about different ways to handle collisions when inserting into a hash table?

Okay, I'll admit that programming without this knowledge for me would feel a bit like skiing naked. It'd be a bit too weird to choose an algorithm because someone told me that I should always use AVL trees because their worst case performance and therefore their predictability in the face of ignorance is better. For me, I'd rather be able to use a sorted array to improve locality of reference even if I don't know that there's a problem anywhere. I'm sure that at least 95% of the time that I've used an immutable sorted array it hasn't made a difference. I certainly don't check with a profiler unless a real problem exists.

Every so often performance does matter, though. It sometimes matters a lot. In that case, you might say that you need a programmer who knows to check the hash tables to make sure they aren't behaving horribly. However, a decent profiler is likely to tell you a lot of useful information without having any knowledge of data structures. Since these cases are rare for typical programmers, wouldn't they be just fine if they knew a collection of data structures that they could swap in until they got something that worked better?

I can't remember many times when I've had to swap out a data structure because of performance issues. The last time was inserting into the end of a very large STL vector. The copies were too expensive due to C++'s tendency to copy things too often. (Even this case will be fixed with the next C++ standard.) Anyway, STL has some other data structures that can be used. I was able to replace my data structure and things immediately improved. I can't remember enough details to know what knowledge I needed. It's also possible that I guess correctly more often than an ignorant programmer would. Who knows how many times they might need to swap things randomly?

A C# programmer would have it even easier. The System.Collection namespace in .NET doesn't have a lot of different options. It'd be pretty easy to try all the possibilities pretty quickly. If none of the options solves the problem, it's entirely possible that there's something that could be done elsewhere.

Memory and speed performance are pretty much the only times you might care about the differences between a hash table and an AVL tree. A few years from now, application programmers may just add a bit of concurrency if they want more speed. Few web applications run low on memory. Are data structures classes useful anymore for typical programmers?

I've left a lot of unanswered questions in this post. I'm really curious about the answers. I'd love to hear from you.

This originally appeared in PC-Doctor's blog.

No comments: