Wednesday, February 11, 2009

Building Terrier by Open Collaboration

An important benefit of having an open source IR platform, like Terrier, is that users of the platform can contribute code to it, and everyone gains as a result. IR platforms which are not open source may be popular, but can stagnate if they do not evolve to meet modern needs. Open source is a good way of building the critical mass of people needed to evolve a project.

To make things easier for users who contribute to Terrier, we are in the process of making two changes that will also simplify the development process:
An issue tracker allows issues (bugs or feature requests) to be named and discussed, and patches to be proposed. Other contributors may review and discuss these patches before they are committed. All development work on the Terrier open source platform will now be done via the issue tracker. In deciding to deploy JIRA, we took some time to review several issue trackers. I'll describe these and how we came to our decision in a future post.
The goal of opening our source code repository is that contributors can make their patches against the latest (trunk) Terrier source, so that we do not receive stale patches. As a committer, I expect this to make my job easier.

I recently announced these changes in Rome at the New challenges in Information Retrieval and Text Mining in an open source initiative workshop. You can see my slides from the workshop below:




2 comments:

Jon said...

Are you considering Sourceforge? I have mixed feelings about it -- it seems to be incredibly slow recently, and the interface is awful.

If you haven't committed to a hosting site, I'd recommend checking out GitHub (https://github.com/). 'git' is a pleasure to use (I prefer it to SVN now) and much more flexible than a centralized VCS.

Craig Macdonald said...

Not considering Sourceforge at all.

We have JIRA hosted locally (it has a considerably nicer interface than SF's proprietary one), and the forum as well. The plan is that the source repository will also be hosted locally.

In doing all of the above, we cover most of the advantages of being at Sourceforge, without the ads and the slowness.

I haven't yet chosen which VCS tool to use for the open repository. Thanks for the tip on GitHub though.