Thursday, July 10, 2008

Integration: The Cost of Using Someone Else's Library

I don't do much Ruby on Rails development anymore, but Andy lives right next to me at PC-Doctor. He does.

Recently, he's run into an interesting problem. I've seen the problem once before in a completely different context.

Once might be coincidence, but if you see the same problem twice, then it must be a real problem. :)

Ruby on Rails


If you've got a product to develop, it's normally better to use someone else's library whenever possible.

Ruby on Rails makes this easy. They've got a fancy system to download, install, and update small modules that can be put together cleanly and elegantly to create your product.

It's a wonderful system. It works, too.


Andy discovered that it might work a bit too well!

He's got a medium sized web application that uses a bunch of external modules. He wrote it fairly quickly because he was able to pick and choose modules from a variety of sources to solve a lot of his problems.

Unfortunately, he had to upgrade to a newer version of Ruby. That means that he's got to look for problems in each of the modules he installed and find a version that works with the new version of Ruby.

Some module maintainers are faster than others, of course. Not all of the modules are ready for the new version of Ruby.

This is a problem that doesn't scale happily. As the number of modules goes up, the chance of one or more modules not being ready goes up.

As Andy discovered, this means that an application can become painful to update.

I phrased my title as though Andy might not have been doing to the right thing. I'd better be honest here, though. If one of his modules can't possibly be updated, then he's still better off rewriting that module. The alternative is to write all of the modules during application development.

Andy did the right thing. The pain he had while updating was minor compared to the alternative.

Ruby on Rails makes it extremely easy to combine large numbers of modules from different sources. The problem can be duplicated any time you get large numbers of independent developers working together.

Boost


The Boost libraries seem to be suffering from the same problem.

Boost doesn't put a lot of emphasis on stability, either. Changes to libraries are encouraged and frequent. Versions of the library aren't even required to be backwards compatible.

The end result is the same as Andy's problem. One library will change a bit, and that change will have to ripple through a bunch of other libraries. It can take a while to squeeze each contributor enough to get them to update their library enough to put out the next version of Boost. (Boost.Threads was the worst case of this. The developer disappeared with his copyright notice still in the source files!)

It's hard to blame either the release manager or the contributors. They're volunteers with paying jobs, after all.

The end result is still unfortunate. It now takes about a year or so to release a new version of the framework. Some libraries end up languishing unreleased for a long, long time because of this.

Boost has gone through a lot of releases. This makes it really tempting to look at this quantitatively. :)

To the right is a chart showing the number of days between major releases. This is, of course, a silly thing to look at. What defines a major release? There was only 5 days between version 1.12.0 and 1.13.0, for example.

The lower numbers on the chart show the number of libraries that changed with each release. There is a slight upward trend to that as well. Clearly, newer releases contain more new stuff in them than the older releases. Furthermore, not all changes to libraries are the same. Some of the more recent changes are substantial.

Despite all of that, I'm going to claim that the release schedule is slowing down over time. There are many reasons for this, but one of them could well be the same problem that Andy has.

Before a release goes out, there is often a plea on the Boost developers' mailing list for assistance on a few libraries. Those calls for help are proof that the size of Boost is slowing it down. If they have more libraries then they'll have more libraries with trouble.

Early versions of Boost had extremely light weight coupling between the different libraries. More recent versions are significantly more coupled. As developers get familiar with other Boost libraries, they are extremely likely to start using them in newer libraries. It's almost inevitable that the coupling increases over time.

The developers for each library continue to mostly be volunteers who don't always have time to make prompt updates. Getting all updates to all libraries to line up at the same time can't be easy.

Commercial Projects


Both of these examples involve open source projects. Andy isn't building an open source application, but he is relying heavily on open source modules. Boost is entirely open source.


An open source project is going to end up relying on volunteers. It's really hard to manage volunteers! Is it any easier on a large commercial project?

I don't have any direct experience with this. I've never been a part of a big company with hundreds of people all working on the same thing.

Is the problem unique to open source projects? I've got no data, but I'll make some speculations.

Some fraction of open source developers are volunteers with other jobs. This isn't true for commercial projects.

A developer who's spending their free time on a project will have to schedule their time around a higher priority project that's paying them. According to this theory, this dramatically increases the spread in the amount of time required to complete their job.

Conclusions


The problem probably isn't unique to open source projects, but I suspect that it's worse for them.

Ruby on Rails encourages using large numbers of independently developed modules. This model will exacerbate the problem.

I'd love to hear from someone who's got experience with large projects. The problem gets worse with big projects. Too bad I don't know much about what happens with them.

1 comment:

Anonymous said...

one thing to point out,
there are a lot of RoR shops who are paying developers to create stuff - but also support them creating OSS Rails plugins & ruby gems while they are working. So though it may be OSS, there are people getting paid to develop and maintain the library.

An example of this in Rails-land is Engine Yard. The spend gobs of cash for developers to create and maintain cutting edge Rails/Ruby/Web server technologies. Though they are freely giving it away so that their competitors can also get the code, the advantage they will always have is that the people who know those libraries best are on their staff. It's going to be real easy for them to get a library forked and modified to suit the needs of a particular project. Or even get a feature they want developed.

my point is, there is a financial gain to paying people to maintain OSS code. Maybe not all OSS code, but some of it pays. EY is raking in cash these days, because everyone in the biz knows that if you want the best Rails deployment they are among the top operations around.

tying back into the theme, the problem here is that if the particular library is not all that popular it gets less love. And if you're using that library, well you're on a rocky road.