Benefits from a real world switch from CVS to darcs

by Mark Stosberg

How darcs has made my life easier

Basic work flow design and our new smoke bot

I took a chance and designed a new work flow around darcs' decentralized nature, taking advantage of its metaphor that every repository is a working directory.

First, I eliminated the abstract "central repository". Now our "alphasite" is the place the developers push their changes.

This on its own has multiple advantages. First, the alphasite is constantly updated, and doesn't need to be updated specially for an alphasite review. When these reviews do happen (they are brief and infrequent compared to development time), we can simply not push during them, keeping the alphasite stable.

This design also helped with a problem that's perhaps somewhat unique to website projects. To run our automated test suite, we need a mod_perl server dedicated to this code line, and many Perl modules installed in the system libraries.

Since we already had to have a mod_perl setup for the alphasite, there was little to do to create a "smoke bot", a script that frequently tests the code via cron and e-mails the developers if there are any problems. In fact, our current solution is a single-line cron entry which restarts the Apache server and runs the test suite.
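A minimal sketch of such a smoke bot follows. The repository path, the recipient address, and the choice of apachectl, prove, and mail are all assumptions made for illustration; the actual cron entry is not shown in this article.

```shell
#!/bin/sh
# Hypothetical smoke bot. Every path, address, and tool choice below is an
# assumption made for illustration.
ALPHASITE_DIR="${ALPHASITE_DIR:-/srv/alphasite}"      # assumed repository location
DEVELOPERS="${DEVELOPERS:-developers@example.com}"    # assumed recipient

smoke_test() {
    # Restart Apache so mod_perl reloads the freshly pushed code, then run the tests.
    cd "$ALPHASITE_DIR" || return 1
    apachectl restart || return 1
    prove -r t/
}

run_smoke_bot() {
    # Stay silent on success; mail the captured output only when something fails.
    if ! output=$(smoke_test 2>&1); then
        printf '%s\n' "$output" | mail -s 'Smoke test failed' "$DEVELOPERS"
        return 1
    fi
    return 0
}
```

Adding a final run_smoke_bot line and pointing a cron entry at the script would give the periodic check. Because the alphasite is itself a repository that developers push to, no fresh check-out step is needed before the tests run.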

With CVS this would be more complex, as a new check-out of the code would need to be made. In our case, the extra effort was enough that the smoke bot system never got set up to run under CVS.

A better launch process

I made one other tweak to the repository flow. Although the betasite pulls its changes from the alphasite, the production copy does not. Instead, it pulls from the betasite by default.

Although we could subvert this flow, our agreed quality control flow is now, by design, built into the system. With CVS, it was easy for me to make a change to my personal copy, and then switch to the production copy to do a cvs update to get that change.

Now there is at least one extra step: I have to pull the change to the betasite before pulling it to the production site. This helps to prevent subversion of the quality control mechanisms simply because it's easy to do so.
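As a sketch of how such defaults could be wired up, the functions below use the --set-default flag of darcs pull, which records the named repository as the default for later pulls. The three paths are hypothetical, since the real locations are not given here.

```shell
# Hypothetical repository locations; the real paths are not given in the text.
ALPHA="${ALPHA:-/srv/alphasite}"
BETA="${BETA:-/srv/betasite}"
PROD="${PROD:-/srv/production}"

pull_and_remember() {
    # $1 = repository to update, $2 = repository it should pull from.
    # --set-default records $2 so that later bare `darcs pull` runs use it.
    (cd "$1" && darcs pull --set-default "$2")
}

wire_launch_flow() {
    pull_and_remember "$BETA" "$ALPHA"   # betasite takes changes from the alphasite
    pull_and_remember "$PROD" "$BETA"    # production takes changes only from the betasite
}
```

After a one-time setup like this, a bare darcs pull in production would consult the betasite. Naming another repository explicitly remains possible, which is why the flow is a default rather than an enforced rule.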

Better personal change management

Admittedly, we are all still adjusting to the fact that a darcs record does not share our changes like cvs commit does.

We are also noticing the benefits of it. Each of us may be personally working on a task of some complexity, such as "optimize site", which may involve several individual optimization tasks. With darcs, a developer can record several individual optimization changes, but only push them once the whole task is complete.

This is another way in which we keep changes that are not ready for launch or review from getting in the way of something that is. This illustrates that each copy is indeed like its own branch, without the overhead of learning extra commands to deal with branches.

Easy cherry picking

We use RT as our issue tracking system. So we may track a particular programming task as RT#123.

We use darcs' ability to select patches by name to create another branch-like feature based on our issue numbers.

The "fast track" change request is a great example of the benefits of this. Let's say my personal repo, the alphasite and betasite all have various changes that are not ready to launch, and each is in a different state.

A request comes in as RT#654 that should be launched ASAP, ahead of other work. I complete the work with three records, including "RT#654" as a prefix to each record message. From there, it's easy to let these updates flow by the others towards production. I do:

darcs push -p 'RT#654'

And just those patches will flow to the alphasite. Then on the betasite and then production, I simply do:

darcs pull -p 'RT#654'

The change launches with a minimum of fuss, and we all leave early to play mini-golf. (That's actually happened at my office...)

As the launch-master, this feature alone is worth the switch.
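One caveat with the commands above: the -p option takes a regular expression matched against the patch name, so a plain 'RT#654' would also select a patch named for RT#6540. The sketch below builds a stricter pattern. It assumes the ticket number always starts the patch name, as in the convention described above, and that darcs accepts this POSIX extended syntax; the function names are made up for illustration.

```shell
ticket_pattern() {
    # $1 = RT ticket number, e.g. 654.
    # Anchors at the start of the patch name and refuses a following digit,
    # so 654 matches "RT#654 ..." but not "RT#6540 ...".
    printf '^RT#%s([^0-9]|$)' "$1"
}

fast_track_push() {
    # Push only the patches for one ticket (to the default repository).
    darcs push -p "$(ticket_pattern "$1")"
}

fast_track_pull() {
    # Pull only the patches for one ticket (from the default repository).
    darcs pull -p "$(ticket_pattern "$1")"
}
```

With these defined, fast_track_push 654 in a personal repository, followed by fast_track_pull 654 on the betasite and then on production, would mirror the flow described above.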

Easier developer collaboration

On occasion, I'll want to help another developer on a task before his work is ready to send to our shared repository. With darcs, I can pull a changeset from one of my peers' personal repos. With CVS, we might have resorted to committing something broken, just so another developer could have a copy to work on, or to manually shuffling files around in our directories. Darcs is a cleaner solution.
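A sketch of such a one-off pull is below. The function name and its arguments are hypothetical; --no-set-default is used so that the colleague's repository does not replace the usual default pull source.

```shell
pull_from_peer() {
    # $1 = path or URL of a colleague's repository (hypothetical),
    # $2 = optional pattern selecting which patches to offer.
    # --no-set-default keeps this one-off source from replacing the usual default.
    if [ -n "${2:-}" ]; then
        darcs pull --no-set-default -p "$2" "$1"
    else
        darcs pull --no-set-default "$1"
    fi
}
```

For example, pull_from_peer /home/colleague/project 'RT#123' would offer only the patches whose names match that ticket, where the path stands in for wherever the colleague's repository actually lives.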

Better exception handling

Although we try to eliminate special cases for each place we check out code, we have at least one exception on the production site: there is an e-mail handling script that we haven't yet figured out how to make work with relative paths, so the complete paths of the production environment are hardcoded in it.

In CVS, the files involved were left un-committed, thus CVS reported them with a "modified" status, which is usually something that shouldn't happen on the production site. This was messy, because it relied on remembering about uncommitted changes.

Darcs has a cleaner solution. Until the production-specific cases are eliminated, I can record them as darcs changesets which only exist on the production site.

Then instead of wondering why there are some modified files in production, I see (using darcs pull --dry-run --verbose) that there are a few changes which only exist in production and darcs shows me the human-friendly patch names to remind me why the heck that is.
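The text does not say where that command is run. One reading, sketched below, is to run it from the betasite against production: since production normally receives patches only from the betasite, anything the dry run offers must exist in production alone. The function name and both arguments are hypothetical.

```shell
production_only_patches() {
    # $1 = betasite repository, $2 = production repository (both hypothetical paths).
    # Run from the betasite: a dry-run pull lists what production has that the
    # betasite lacks, without changing either repository.
    (cd "$1" && darcs pull --dry-run --verbose "$2")
}
```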

Better code review

By default, darcs works with changes at a more detailed level than CVS does. It not only tells you that there are changes in particular files, it shows you each change and interactively asks you to confirm each patch "hunk" you are about to record.

If "show not tell" is a recipe for good storytelling, it's also a good recipe for a source control system interface.

Seeing the changes in detail encourages better habits. For example, I may see a forgotten change and record it as a second change or not at all. I may see some extra debugging statements which I left in my personal copy, but don't want to include in the changeset.

Soon I will have recorded all my changes, except a few debugging statements which I now want to remove. Instead of searching through the code for spurious output to STDERR, I can just run darcs revert. Like record, I will be prompted to review each pending change, but this time to remove them. Cleaning up my code just got a little faster.

Better infrastructure: Built-in XML support

Besides the improvements to my user-level experience, I also appreciate some of the deeper design decisions in darcs. Here's an anecdote which illustrates the benefit of darcs' built-in support for XML.

Pedro Melo wrote a Darcs-Changes-to-RSS Perl script in 82 lines and less than one hour. Alexander Staubo noted that with the right XML stylesheet, this function can be done in a single line:

darcs changes --xml | xsltproc rss.xsl -

That's notable because to accomplish the same function for CVS, the cvs2rss Perl script takes over 3 times as much code, and in turn relies on the non-standard cvs2cl.pl script, adding almost another 1,200 lines of complexity to the system.

Darcs was designed to play well with others, and it shows in its simplicity of integration.
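Even without a stylesheet, the XML output is easy to consume. The sketch below prints just the patch names. It assumes each patch name appears on a single line inside a name element, and it leaves XML entities undecoded, so it is a rough filter rather than a real XML parser; the function name is made up for illustration.

```shell
patch_names() {
    # Reads `darcs changes --xml` output on stdin and prints one patch name per line.
    # Assumes each name sits on a single line as <name>...</name>; XML entities
    # such as &amp; are left undecoded.
    sed -n 's:.*<name>\(.*\)</name>.*:\1:p'
}
```

It would be used as darcs changes --xml | patch_names.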

