Mercurial Vendor Branching

From Second Life Wiki

We want a way to capture local modifications to a third-party open-source project, without needing write access to their public repository. We want to be able to carry forward such modifications to newer versions of the public project. All this should be independent of the organizational decision as to whether it's even desirable to try to submit our local modifications upstream.

Fortunately, the Subversion folks articulated a process years ago that addresses this very requirement. They call it "Vendor Branches." The same tactic, suitably adapted, works with Mercurial too.

The essence of the idea is that we capture and tag a particular snapshot of the open-source project. We develop our local modifications to that, and the repository tip incorporates them. But when we want to update to a newer version of the public project, we bring it into the repository in such a way that we can discover the changes between the original snapshot and the new one, and then apply those deltas to the combined source.

The following material is adapted from the Red Bean Subversion book, but recast for Mercurial.

General Vendor Branch Management Procedure

Initial Setup

Managing vendor branches generally works like this. You create a named branch (such as "vendor") to store the vendor source snapshots. Then you import the third-party code into that branch. Your working branch (here, Mercurial's "default" branch) is based on "vendor". You always make your local changes to the default branch. With each new release of the code you are tracking, you bring it into the "vendor" branch and merge the changes into "default", resolving whatever conflicts occur between your local changes and the upstream changes.

Perhaps an example will help to clarify this algorithm. We'll use a scenario where your development team is creating a calculator program that links against a third-party complex number arithmetic library, libcomplex. We'll construct a repository specifically for our locally-modified version of that library. To begin, we must initialize our repository and create at least one file in our "default" branch.

 hg init ourcomplex
 cd ourcomplex
 touch README.txt
 hg add README.txt
 hg commit -m "initial commit" README.txt

Now we can create the vendor branch and do the import of the first vendor drop. We'll call our vendor branch "vendor", and each successive code drop will be tagged "current".

 hg branch vendor
 tar -xjf ../libcomplex-1.0.tar.bz2
 mv libcomplex-1.0 libcomplex
 hg addremove
 hg commit -m "1.0 source drop"
 hg tag -r tip current
Important: Some upstream projects package their releases with a version-stamped top-level directory in the tarball. It is essential that we rename the top-level directory without a version number every time we unpack a new tarball.

Why? This whole mechanism is based around Mercurial's ability to identify the edits made by the upstream vendor from one version to the next, and merge them with our local edits.

Consider the alternatives.

  • Let's say we commit libcomplex-1.0, and then in the course of time we make local edits to it. Now the vendor releases libcomplex-1.1, which we desire. We unpack the tarball, commit a whole new libcomplex-1.1 source tree and delete libcomplex-1.0, along with any local edits we've made. No good.
  • Okay, so we've committed libcomplex-1.0, but when we get the new libcomplex-1.1 tarball, we unpack it over the existing libcomplex-1.0 directory. That works -- but now the directory name is a lie.
  • The best way forward is to eliminate any version number from the top-level directory name so that we can replace libcomplex-1.0 (in the libcomplex directory) with libcomplex-1.1 (into the libcomplex directory) without fuss.
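
The renaming rule is easy to forget when done by hand. A minimal sketch of a wrapper follows, assuming a bzip2-compressed tarball that contains exactly one top-level directory. The function name unpack_drop is hypothetical and is not part of Mercurial.

```shell
# unpack_drop TARBALL DEST
# Unpack a vendor tarball and rename its single top-level directory to DEST.
# Hypothetical helper: assumes a .tar.bz2 with exactly one top-level directory.
unpack_drop() {
    tarball=$1
    dest=$2
    # Name of the top-level directory, read from the first archive entry
    # (a leading "./" is stripped first).
    top=$(tar -tjf "$tarball" | head -n 1 | sed 's|^\./||' | cut -d/ -f1)
    [ -n "$top" ] || return 1
    # Clear out the previous drop so that files deleted upstream disappear too.
    rm -rf "$dest"
    tar -xjf "$tarball" || return 1
    # Skip the rename when the tarball already uses the version-free name.
    [ "$top" = "$dest" ] || mv "$top" "$dest"
}
```

Run from the repository root, unpack_drop ../libcomplex-1.0.tar.bz2 libcomplex would stand in for the separate tar and mv commands. Because it deletes the destination directory first, it should only be run in a working copy whose changes are already committed.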

We now have the current version of the libcomplex source code in branch "vendor", tagged "current". Now, we merge it into the default branch. It is in the default branch that we will make our customizations.

 hg update default
 hg merge vendor
 hg commit -m "initial: 1.0"

Upgrading

We get to work customizing the libcomplex code. Before we know it, our modified version of libcomplex is completely integrated into our calculator program.

A few weeks later, the developers of libcomplex release version 1.1 of their library, which contains some features and functionality that we really want. We'd like to upgrade to this new version without losing the customizations we made to the existing version. In effect, we would like to replace our current baseline version of libcomplex 1.0 with a copy of libcomplex 1.1, and then re-apply our custom modifications to the new version. But we actually approach the problem from the other direction, applying the changes made to libcomplex between versions 1.0 and 1.1 to our modified copy of it.

To perform this upgrade, we update our working copy to the vendor branch and replace its contents with the new libcomplex 1.1 source code. We quite literally replace the existing files with the new files, clearing out the whole tree and unpacking the libcomplex 1.1 release tarball in its place. The goal here is to make the tip of our vendor branch contain only the libcomplex 1.1 code, and to ensure that all that code is under version control. Oh, and we want to do this with as little version control history disturbance as possible.

 hg update vendor
 rm -rf libcomplex
 tar -xjf ../libcomplex-1.1.tar.bz2
 mv libcomplex-1.1 libcomplex
 hg addremove -s 60
 hg commit -m "1.1 source drop"

Do not forget to rename the top-level libcomplex-1.1 directory to the same version-generic libcomplex name with which we committed the original tarball.

After unpacking the 1.1 tarball, hg status will show files with local modifications as well as, perhaps, some unversioned or missing files. If we did what we were supposed to do, the unversioned files are only those new files introduced in the 1.1 release of libcomplex. The missing files are files that were in 1.0 but not in 1.1. hg addremove deals with both, and more: the -s 60 switch directs Mercurial to compare added files to deleted files, recognizing any file at least 60% similar as a move/rename.

Note that hg addremove is better than Subversion's svn_load_dirs.pl, in that it can automatically detect renamed files: svn_load_dirs.pl prompts the user to identify renamed files, which doesn't scale at all well. However, be prepared to be patient: hg addremove uses an O(N^2) algorithm to detect renamed files, and some of the libraries we manage this way are large enough for the command to run for many minutes.

Finally, once our current working copy contains only the libcomplex 1.1 code, we commit the changes we made to get it looking that way.

Our vendor branch now contains the new vendor drop. We tag the new version by moving the "current" tag from the 1.0 drop to the new commit (the tag already exists, so hg tag needs -f to reassign it), and then merge the differences between the previous version and the new version into our default branch.

 hg tag -f -r tip current
 hg update default
 hg merge vendor
 # resolve all the conflicts between their changes and our changes
 hg commit -m "update with 1.1"