Advanced Merging

Cherrypicking

Just as the term “changeset” is often used in version control systems, so is the term cherrypicking. This word refers to the act of choosing one specific changeset from a branch and replicating it to another. Cherrypicking may also refer to the act of duplicating a particular set of (not necessarily contiguous!) changesets from one branch to another. This is in contrast to more typical merging scenarios, where the “next” contiguous range of revisions is duplicated automatically.

Why would people want to replicate just a single change? It comes up more often than you'd think. For example, let's go back in time and imagine that you haven't yet merged your private feature branch back to the trunk. At the water cooler, you get word that Sally made an interesting change to integer.c on the trunk. Looking over the history of commits to the trunk, you see that in revision 355 she fixed a critical bug that directly impacts the feature you're working on. You might not be ready to merge all the trunk changes to your branch just yet, but you certainly need that particular bug fix in order to continue your work.

$ svn diff -c 355 ^/calc/trunk

Index: integer.c
===================================================================
--- integer.c	(revision 354)
+++ integer.c	(revision 355)
@@ -147,7 +147,7 @@
     case 6:  sprintf(info->operating_system, "HPFS (OS/2 or NT)"); break;
     case 7:  sprintf(info->operating_system, "Macintosh"); break;
     case 8:  sprintf(info->operating_system, "Z-System"); break;
-    case 9:  sprintf(info->operating_system, "CP/MM");
+    case 9:  sprintf(info->operating_system, "CP/M"); break;
     case 10:  sprintf(info->operating_system, "TOPS-20"); break;
     case 11:  sprintf(info->operating_system, "NTFS (Windows NT)"); break;
     case 12:  sprintf(info->operating_system, "QDOS"); break;

Just as you used svn diff in the prior example to examine revision 355, you can pass the same option to svn merge:

$ svn merge -c 355 ^/calc/trunk
--- Merging r355 into '.':
U    integer.c
--- Recording mergeinfo for merge of r355 into '.':
 U   .

$ svn status
M       integer.c

You can now go through the usual testing procedures before committing this change to your branch. After the commit, Subversion marks r355 as having been merged to the branch so that future “magic” merges that synchronize your branch with the trunk know to skip over r355. (Merging the same change to the same branch almost always results in a conflict!)

$ cd my-calc-branch

$ svn propget svn:mergeinfo .
/calc/trunk:341-349,355

# Notice that r355 isn't listed as "eligible" to merge, because
# it's already been merged.
$ svn mergeinfo ^/calc/trunk --show-revs eligible
r350
r351
r352
r353
r354
r356
r357
r358
r359
r360

$ svn merge ^/calc/trunk
--- Merging r350 through r354 into '.':
 U   .
U    integer.c
U    Makefile
--- Merging r356 through r360 into '.':
 U   .
U    integer.c
U    button.c
--- Recording mergeinfo for merge of r350 through r360 into '.':
 U   .

This use case of replicating (or backporting) bug fixes from one branch to another is perhaps the most popular reason for cherrypicking changes; it comes up all the time, for example, when a team is maintaining a “release branch” of software. (We discuss this pattern in the section called “Release Branches”.)

	Warning
Did you notice how, in the last example, the merge invocation merged two distinct ranges? The svn merge command applied two independent patches to your working copy to skip over changeset 355, which your branch already contained. There's nothing inherently wrong with this, except that it has the potential to make conflict resolution trickier. If the first range of changes creates conflicts, you must resolve them interactively for the merge process to continue and apply the second range of changes. If you postpone a conflict from the first wave of changes, the whole merge command will bail out with an error message.^[32]
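The two-range behavior falls out of simple set arithmetic on the mergeinfo. The following is a minimal Python sketch, not Subversion's implementation. It assumes, hypothetically, that every revision from r341 through r360 touched the trunk. It parses a range list like the 341-349,355 value shown earlier and collapses the still-eligible revisions into the contiguous ranges that svn merge applies one after another. The helper names are invented for illustration.

```python
def parse_ranges(text):
    """Parse a mergeinfo range list such as '341-349,355' into a set of revisions."""
    revs = set()
    for part in text.split(","):
        part = part.strip().rstrip("*")  # a trailing '*' marks a non-inheritable range
        if "-" in part:
            start, end = part.split("-")
            revs.update(range(int(start), int(end) + 1))
        else:
            revs.add(int(part))
    return revs


def to_ranges(revs):
    """Collapse a set of revisions into sorted, contiguous (start, end) pairs."""
    ranges = []
    for rev in sorted(revs):
        if ranges and rev == ranges[-1][1] + 1:
            ranges[-1] = (ranges[-1][0], rev)
        else:
            ranges.append((rev, rev))
    return ranges


# Assumption: r341..r360 all changed the trunk; the branch already has 341-349 and 355.
already_merged = parse_ranges("341-349,355")
candidates = set(range(341, 361))
eligible = candidates - already_merged

# Each contiguous range becomes one independent patch, hence two "--- Merging" headers.
print(to_ranges(eligible))  # [(350, 354), (356, 360)]
```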

A word of warning: while svn diff and svn merge are very similar in concept, they do have different syntax in many cases. Be sure to read about them in Chapter 9, Subversion Complete Reference, for details, or ask svn help. For example, svn merge requires a working copy path as a target, that is, a place where it should apply the generated patch. If the target isn't specified, it assumes you are trying to perform one of the following common operations:

You want to merge directory changes into your current working directory.
You want to merge the changes in a specific file into a file by the same name that exists in your current working directory.

If you are merging a directory and haven't specified a target path, svn merge assumes the first case and tries to apply the changes into your current directory. If you are merging a file, and that file (or a file by the same name) exists in your current working directory, svn merge assumes the second case and tries to apply the changes to a local file with the same name.
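The target-inference rule above amounts to a small decision. This Python sketch restates it under simplifying assumptions. The function name, its parameters, and the choice to return None when no target can be inferred are all hypothetical. The real client must also consult the repository to learn whether the source is a file or a directory.

```python
def default_merge_target(source_url, source_is_dir, cwd_entries):
    """Infer the working copy target when svn merge is given none.

    source_url:    merge source, e.g. '^/calc/trunk/integer.c'
    source_is_dir: whether the source is a directory (assumed to be known)
    cwd_entries:   names of the entries in the current working directory
    """
    if source_is_dir:
        return "."                       # case 1: merge into the current directory
    basename = source_url.rstrip("/").rsplit("/", 1)[-1]
    if basename in cwd_entries:
        return basename                  # case 2: merge into the same-named local file
    return None                          # hypothetical: caller must name a target


print(default_merge_target("^/calc/trunk", True, ["integer.c"]))             # .
print(default_merge_target("^/calc/trunk/integer.c", False, ["integer.c"]))  # integer.c
```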

Merge Syntax: Full Disclosure

You've now seen some examples of the svn merge command, and you're about to see several more. If you're feeling confused about exactly how merging works, you're not alone. Many users (especially those new to version control) are initially perplexed about the proper syntax of the command and about how and when the feature should be used. But fear not, this command is actually much simpler than you think! There's a very easy technique for understanding exactly how svn merge behaves.

The main source of confusion is the name of the command. The term “merge” somehow denotes that branches are combined together, or that some sort of mysterious blending of data is going on. That's not the case. A better name for the command might have been svn diff-and-apply, because that's all that happens: two repository trees are compared, and the differences are applied to a working copy.

If you're using svn merge to do basic copying of changes between branches, it will generally do the right thing automatically. For example, a command such as the following:

$ svn merge ^/calc/branches/some-branch

will attempt to duplicate any changes made on some-branch into your current working directory, which is presumably a working copy that shares some historical connection to the branch. The command is smart enough to only duplicate changes that your working copy doesn't yet have. If you repeat this command once a week, it will only duplicate the “newest” branch changes that happened since you last merged.

If you choose to use the svn merge command in all its full glory by giving it specific revision ranges to duplicate, the command takes three main arguments:

An initial repository tree (often called the left side of the comparison)
A final repository tree (often called the right side of the comparison)
A working copy to accept the differences as local changes (often called the target of the merge)

Once these three arguments are specified, the two trees are compared and the differences are applied to the target working copy as local modifications. When the command is done, the results are no different than if you had hand-edited the files or run various svn add or svn delete commands yourself. If you like the results, you can commit them. If you don't, you can simply svn revert all of the changes.
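Because svn merge is really "diff-and-apply," its core can be modeled in a few lines. The sketch below is a deliberately simplified Python model, not Subversion's algorithm. Trees are plain dicts that map paths to whole-file contents. A file whose target text differs from the left side is flagged as a text conflict, whereas real Subversion attempts a line-based three-way merge first. An edit aimed at a path missing from the target is flagged as a tree conflict. All names are hypothetical.

```python
def diff_trees(left, right):
    """Return {path: (old, new)} for every path that differs; None means absent."""
    delta = {}
    for path in left.keys() | right.keys():
        if left.get(path) != right.get(path):
            delta[path] = (left.get(path), right.get(path))
    return delta


def apply_delta(target, delta):
    """Apply a delta to a copy of the target tree, collecting conflicts."""
    result, conflicts = dict(target), []
    for path, (old, new) in sorted(delta.items()):
        present = path in result
        if old is None and present:
            conflicts.append(("tree", path))   # incoming add, but the path already exists
        elif old is not None and not present:
            conflicts.append(("tree", path))   # incoming edit or delete, local path missing
        elif old is not None and result[path] != old:
            conflicts.append(("text", path))   # whole-file stand-in for a 3-way text merge
        elif new is None:
            del result[path]                   # like svn delete
        else:
            result[path] = new                 # like an edit or svn add
    return result, conflicts


left = {"integer.c": "v1", "Makefile": "m1"}
right = {"integer.c": "v2", "Makefile": "m1", "button.c": "b1"}

# Target identical to the left side: the delta applies cleanly.
merged, conflicts = apply_delta(dict(left), diff_trees(left, right))
print(conflicts)  # []

# Target missing integer.c: the incoming edit has nowhere to land.
_, conflicts = apply_delta({"Makefile": "m1"}, diff_trees(left, right))
print(conflicts)  # [('tree', 'integer.c')]
```

The first call mirrors the situation svn update always enjoys: the target equals the left side, so nothing can conflict. The second mirrors comparing the wrong two trees.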

The syntax of svn merge allows you to specify the three necessary arguments rather flexibly. Here are some examples:

$ svn merge http://svn.example.com/repos/branch1@150 \
            http://svn.example.com/repos/branch2@212 \
            my-working-copy

$ svn merge -r 100:200 http://svn.example.com/repos/trunk my-working-copy

$ svn merge -r 100:200 http://svn.example.com/repos/trunk

The first syntax lays out all three arguments explicitly, naming each tree in the form URL@REV and naming the working copy target. The second syntax is used as a shorthand for situations when you're comparing two different revisions of the same URL. The last syntax shows how the working copy argument is optional; if omitted, it defaults to the current directory.
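All three forms reduce to the same triple of left tree, right tree, and target. The following Python sketch spells out that normalization. The MergeSpec type and the function names are invented for illustration. The -c handling reflects the equivalence of -c N with -r N-1:N, and of -c -N with the reverse merge -r N:N-1.

```python
from typing import NamedTuple, Optional


class MergeSpec(NamedTuple):
    left: str    # left side of the comparison, as URL@REV
    right: str   # right side of the comparison, as URL@REV
    target: str  # working copy that receives the differences


def from_urls(left: str, right: str, target: Optional[str] = None) -> MergeSpec:
    """Full syntax: svn merge URL1@REV1 URL2@REV2 [WCPATH]."""
    return MergeSpec(left, right, target or ".")


def from_range(url: str, start: int, end: int, target: Optional[str] = None) -> MergeSpec:
    """Shorthand: svn merge -r START:END URL [WCPATH] compares one URL at two revisions."""
    return from_urls(f"{url}@{start}", f"{url}@{end}", target)


def from_change(url: str, rev: int, target: Optional[str] = None) -> MergeSpec:
    """svn merge -c REV means -r REV-1:REV; a negative REV reverses the change."""
    if rev < 0:
        return from_range(url, -rev, -rev - 1, target)
    return from_range(url, rev - 1, rev, target)


trunk = "http://svn.example.com/repos/trunk"
print(from_range(trunk, 100, 200))
print(from_change(trunk, -5, "my-working-copy"))
```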

While the first example shows the “full” syntax of svn merge, use it very carefully; it can result in merges which do not record any svn:mergeinfo metadata at all. The next section talks a bit more about this.

Merges Without Mergeinfo

Subversion tries to generate merge metadata whenever it can, to make future invocations of svn merge smarter. There are still situations, however, where svn:mergeinfo data is not created or changed. Remember to be a bit wary of these scenarios:

Merging unrelated sources: If you ask svn merge to compare two URLs that aren't related to each other, a patch is still generated and applied to your working copy, but no merging metadata is created. There's no common history between the two sources, and future “smart” merges depend on that common history.
Merging from foreign repositories: While it's possible to run a command such as svn merge -r 100:200 http://svn.foreignproject.com/repos/trunk, the resultant patch also lacks any historical merge metadata. At the time of this writing, Subversion has no way of representing different repository URLs within the svn:mergeinfo property.
Using --ignore-ancestry: If this option is passed to svn merge, it causes the merging logic to mindlessly generate differences the same way that svn diff does, ignoring any historical relationships. We discuss this later in this chapter in the section called “Noticing or Ignoring Ancestry”.
Applying reverse merges from a target's natural history: Earlier in this chapter (the section called “Undoing Changes”) we discussed how to use svn merge to apply a “reverse patch” as a way of rolling back changes. If this technique is used to undo a change to an object's personal history (e.g., commit r5 to the trunk, then immediately roll back r5 using svn merge . -c -5), this sort of merge doesn't affect the recorded mergeinfo.^[33]

Natural History and Implicit Mergeinfo

As we mentioned earlier when discussing Mergeinfo Inheritance, a path that has the svn:mergeinfo property set on it is said to have “explicit” mergeinfo. Yes, this implies a path can have “implicit” mergeinfo, too! Implicit mergeinfo, or natural history, is simply a path's own history (see the section called “Examining History”) interpreted as mergeinfo. While implicit mergeinfo is largely an implementation detail, it can be a useful abstraction for understanding merge tracking behavior.

Let's say you created ^/trunk in revision 100 and then later, in revision 201, created ^/branches/feature-branch as a copy of ^/trunk@200. The natural history of ^/branches/feature-branch contains all the repository paths and revision ranges through which the history of the new branch has ever passed:

/trunk:100-200
/branches/feature-branch:201

With each new revision added to the repository, the natural history—and thus, implicit mergeinfo—of the branch continues to expand to include those revisions until the day the branch is deleted. Here's what the implicit mergeinfo of our branch would look like when the HEAD revision of the repository had grown to 234:

/trunk:100-200
/branches/feature-branch:201-234

Implicit mergeinfo does not actually show up in the svn:mergeinfo property, but Subversion acts as if it does. This is why, if you check out ^/branches/feature-branch and then run svn merge ^/trunk -c 158 in the resulting working copy, nothing happens. Subversion knows that the changes committed to ^/trunk in revision 158 are already present in the target's natural history, so there's no need to try to merge them again. After all, avoiding repeated merges of changes is the primary goal of Subversion's merge tracking feature!
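The natural-history check can be sketched as a lookup against the branch's implicit mergeinfo. This Python model hard-codes the example above: trunk created in r100, branch copied from ^/trunk@200 in r201, HEAD at r234. The function names and the probe revisions 150 and 210 are hypothetical. Real Subversion derives this information from the repository's copy history rather than from arguments.

```python
def implicit_mergeinfo(source_path, source_created, copied_from_rev,
                       branch_path, branch_created, head):
    """Natural history of a branch made by copying source_path@copied_from_rev."""
    return {
        source_path: (source_created, copied_from_rev),
        branch_path: (branch_created, head),
    }


def already_in_history(mergeinfo, path, rev):
    """True if path@rev falls inside the recorded ranges for that path."""
    if path not in mergeinfo:
        return False
    start, end = mergeinfo[path]
    return start <= rev <= end


history = implicit_mergeinfo("/trunk", 100, 200, "/branches/feature-branch", 201, 234)
print(already_in_history(history, "/trunk", 150))  # True: nothing to merge
print(already_in_history(history, "/trunk", 210))  # False: committed after the copy
```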

More on Merge Conflicts

Just like the svn update command, svn merge applies changes to your working copy, and therefore it can also create conflicts. The conflicts produced by svn merge, however, are sometimes different, and this section explains those differences.

To begin with, assume that your working copy has no local edits. When you svn update to a particular revision, the changes sent by the server always apply “cleanly” to your working copy. The server produces the delta by comparing two trees: a virtual snapshot of your working copy, and the revision tree you're interested in. Because the left-hand side of the comparison is exactly equal to what you already have, the delta is guaranteed to correctly convert your working copy into the right-hand tree.

But svn merge has no such guarantees and can be much more chaotic: the advanced user can ask the server to compare any two trees at all, even ones that are unrelated to the working copy! This means there's a large potential for human error. Users will sometimes compare the wrong two trees, creating a delta that doesn't apply cleanly. The svn merge subcommand does its best to apply as much of the delta as possible, but some parts may be impossible. A common sign that you merged the wrong delta is unexpected tree conflicts:

$ svn merge -r 1288:1351 http://svn.example.com/myrepos/branch
--- Merging r1289 through r1351 into '.':
   C bar.c
   C foo.c
   C docs
--- Recording mergeinfo for merge of r1289 through r1351 into '.':
 U   .
Summary of conflicts:
  Tree conflicts: 3

$ svn st
!     C bar.c
      >   local missing, incoming edit upon merge
!     C foo.c
      >   local missing, incoming edit upon merge
!     C docs
      >   local delete, incoming edit upon merge

In the previous example, it might be the case that bar.c, foo.c, and docs all exist in both snapshots of the branch being compared. The resultant delta wants to change the contents of the corresponding paths in your working copy, but those paths don't exist in the working copy. Whatever the case, the preponderance of tree conflicts most likely means that the user compared the wrong two trees; it's a classic sign of user error. When this happens, it's easy to recursively revert all the changes created by the merge (svn revert . --recursive), delete any unversioned files or directories left behind after the revert, and rerun svn merge with the correct arguments.

Also keep in mind that a merge into a working copy with no local edits can still produce text conflicts.

$ svn merge -c 1701 http://svn.example.com/myrepos/branchX --accept postpone
--- Merging r1701 into '.':
C     glub.c
C     sputter.c
--- Recording mergeinfo for merge of r1701 into '.':
 U   .
Summary of conflicts:
  Text conflicts: 2

$ svn st
 M      .
?       glub.c.merge-left.r1700
?       glub.c.merge-right.r1701
C       glub.c
?       glub.c.working
?       sputter.c.merge-left.r1700
?       sputter.c.merge-right.r1701
C       sputter.c
?       sputter.c.working
Summary of conflicts:
  Text conflicts: 2

How can a conflict possibly happen? Again, because the user can request svn merge to define and apply any old delta to the working copy, that delta may contain textual changes that don't cleanly apply to a working file, even if the file has no local modifications.

Another small difference between svn update and svn merge is the names of the full-text files created when a conflict happens. In the section called “Resolve Any Conflicts”, we saw that an update produces files named filename.mine, filename.rOLDREV, and filename.rNEWREV. When svn merge produces a conflict, though, it creates three files named filename.working, filename.merge-left.rOLDREV, and filename.merge-right.rNEWREV. In this case, the terms “merge-left” and “merge-right” are describing which side of the double-tree comparison the file came from, “rOLDREV” describes the revision of the left side, and “rNEWREV” the revision of the right side. In any case, these differing names help you distinguish between conflicts that happened as a result of an update and ones that happened as a result of a merge.
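The naming scheme is mechanical enough to express directly. This Python sketch reproduces both patterns; the function name is hypothetical. For a merge of -c 1701, the left side is r1700 and the right side r1701, which matches the glub.c files in the status listing above.

```python
def conflict_artifacts(filename, old_rev, new_rev, from_merge):
    """Names of the full-text files left beside a conflicted file."""
    if from_merge:
        return [f"{filename}.working",
                f"{filename}.merge-left.r{old_rev}",
                f"{filename}.merge-right.r{new_rev}"]
    return [f"{filename}.mine",
            f"{filename}.r{old_rev}",
            f"{filename}.r{new_rev}"]


print(conflict_artifacts("glub.c", 1700, 1701, from_merge=True))
print(conflict_artifacts("glub.c", 1700, 1701, from_merge=False))
```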

Blocking Changes

Sometimes there's a particular changeset that you don't want automatically merged. For example, perhaps your team's policy is to do new development work on /trunk but to be more conservative about backporting changes to a stable branch you use for releasing to the public. At one extreme, you can manually cherrypick single changesets from the trunk to the branch: just the changes that are stable enough to pass muster. Maybe things aren't quite that strict, though; perhaps most of the time you just let svn merge automatically merge most changes from trunk to branch. In this case, you want a way to mask out a few specific changes, that is, to prevent them from ever being automatically merged.

Through Subversion 1.7, the only way to block a changeset is to make the system believe that the change has already been merged. To do this, invoke the merge subcommand with the --record-only option:

$ cd my-calc-branch

$ svn propget svn:mergeinfo .
/calc/trunk:1680-3305

# Let's make the metadata list r3328 as already merged.
$ svn merge -c 3328 --record-only ^/calc/trunk
--- Recording mergeinfo for merge of r3328 into '.':
 U   .

$ svn status
M       .

$ svn propget svn:mergeinfo .
/calc/trunk:1680-3305,3328

$ svn commit -m "Block r3328 from being merged to the branch."
…
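A --record-only merge changes only the bookkeeping. The following Python sketch models it as adding revisions to a mergeinfo range list while leaving file contents untouched. This is why r3328 subsequently looks "already merged" to any eligibility calculation. The helper names are hypothetical.

```python
def parse_ranges(text):
    """Parse a range list such as '1680-3305' into a set of revisions."""
    revs = set()
    for part in filter(None, text.split(",")):
        if "-" in part:
            start, end = map(int, part.split("-"))
            revs.update(range(start, end + 1))
        else:
            revs.add(int(part))
    return revs


def format_ranges(revs):
    """Render revisions in svn:mergeinfo style, e.g. '1680-3305,3328'."""
    runs, run = [], []
    for rev in sorted(revs):
        if run and rev != run[-1] + 1:
            runs.append(run)
            run = []
        run.append(rev)
    if run:
        runs.append(run)
    return ",".join(str(r[0]) if len(r) == 1 else f"{r[0]}-{r[-1]}" for r in runs)


def record_only(range_text, revs_to_block):
    """Mark revisions as merged without applying any content changes."""
    return format_ranges(parse_ranges(range_text) | set(revs_to_block))


print(record_only("1680-3305", [3328]))  # 1680-3305,3328
```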

Beginning with Subversion 1.7, --record-only merges are transitive. This means that, in addition to recording mergeinfo describing the blocked revision(s), any svn:mergeinfo property differences in the merge source are also applied. For example, let's say we want to block the 'frazzle' feature from ever being merged from ^/trunk to our ^/branches/proj-X branch. We know that all the frazzle work was done on its own branch, which was reintegrated to trunk in revision 1055:

$ svn log -v ^/trunk -r 1055
------------------------------------------------------------------------
r1055 | francesca | 2011-09-22 07:40:06 -0400 (Thu, 22 Sep 2011) | 3 lines
Changed paths:
   M /trunk
   M /trunk/src/frazzle.c

Reintegrate the frazzle-feature-branch to trunk.

Because revision 1055 was a reintegrate merge, we know that mergeinfo was recorded describing the merge:

$ svn diff ^/trunk -c 1055 --depth empty
Index: .
===================================================================
--- .   (revision 1054)
+++ .   (revision 1055)

Property changes on: .
___________________________________________________________________
Modified: svn:mergeinfo
   Merged /branches/frazzle-feature-branch:r997-1003

Simply blocking merges of revision 1055 from ^/trunk isn't foolproof, however, since someone could still merge r996:1003 directly from ^/branches/frazzle-feature-branch. Fortunately, the transitive nature of --record-only merges in Subversion 1.7 prevents this: the --record-only merge applies the svn:mergeinfo diff from revision 1055, which blocks merges directly from the frazzle branch, and, just as in releases prior to Subversion 1.7, it blocks merges of revision 1055 directly from ^/trunk:

$ cd branches/proj-X

$ svn merge ^/trunk . -c 1055 --record-only
--- Merging r1055 into '.':
 G   .
--- Recording mergeinfo for merge of r1055 into '.':
 G   .

$ svn diff --depth empty .
Index: .
===================================================================
--- .   (revision 1070)
+++ .   (working copy)

Property changes on: .
___________________________________________________________________
Modified: svn:mergeinfo
   Merged /trunk:r1055
   Merged /branches/frazzle-feature-branch:r997-1003
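Transitivity means the record-only merge folds in two things: the revision being blocked and any mergeinfo that revision itself added. This Python sketch models mergeinfo as a dict from source path to a set of revisions. The data mirrors the r1055 example, and the function name is hypothetical.

```python
def record_only_merge(target_info, source_path, rev, mergeinfo_added_by_rev):
    """Return new mergeinfo after 'svn merge -c REV --record-only SOURCE'.

    target_info:            existing mergeinfo on the target, {path: set(revs)}
    mergeinfo_added_by_rev: svn:mergeinfo additions committed in REV itself
    """
    result = {path: set(revs) for path, revs in target_info.items()}
    # As before 1.7: the blocked revision itself is recorded as merged.
    result.setdefault(source_path, set()).add(rev)
    # New in 1.7: the mergeinfo changes carried by REV are applied as well.
    for path, revs in mergeinfo_added_by_rev.items():
        result.setdefault(path, set()).update(revs)
    return result


added_in_1055 = {"/branches/frazzle-feature-branch": set(range(997, 1004))}
info = record_only_merge({}, "/trunk", 1055, added_in_1055)
print(sorted(info))  # ['/branches/frazzle-feature-branch', '/trunk']
```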

Blocking changes with --record-only works, but it's also a little bit dangerous. The main problem is that we're not clearly differentiating between the ideas of “I already have this change” and “I don't have this change, but don't currently want it.” We're effectively lying to the system, making it think that the change was previously merged. This puts the responsibility on you, the user, to remember that the change wasn't actually merged; it just wasn't wanted. There's no way to ask Subversion for a list of “blocked changesets.” If you want to track them (so that you can unblock them someday), you'll need to record them in a text file somewhere, or perhaps in an invented property.

Keeping a Reintegrated Branch Alive

There is an alternative to destroying and re-creating a branch after reintegration. To understand why it works, you need to understand why the branch is initially unfit for further use after it has been reintegrated.

Let's assume you created your branch in revision A. While working on your branch, you created one or more revisions which made changes to the branch. Before reintegrating your branch back to trunk, you made a final merge from trunk to your branch, and committed the result of this merge as revision B.

When reintegrating your branch into the trunk, you create a new revision X which changes the trunk. The changes made to trunk in this revision X are semantically equivalent to the changes you made to your branch between revisions A and B.

If you now try to merge outstanding changes from trunk to your branch, Subversion will consider changes made in revision X as eligible for merging into the branch. However, since your branch already contains all the changes made in revision X, merging these changes can result in spurious conflicts! These conflicts are often tree conflicts, especially if renames were made on the branch or the trunk while the branch was in development.

So what can be done about this? We need to make sure that Subversion does not try to merge revision X into the branch. This is done using the --record-only merge option, which was introduced in the section called “Blocking Changes”.

To carry out the record-only merge, get a working copy of the branch which was just reintegrated in revision X, and merge just revision X from trunk into your branch, making sure to use the --record-only option.

This merge uses the cherrypicking merge syntax, which was introduced in the section called “Cherrypicking”. Continuing with the running example from the section called “Reintegrating a Branch”, where revision X was revision 391:

$ cd my-calc-branch
$ svn update
Updating '.':
Updated to revision 393.
$ svn merge --record-only -c 391 ^/calc/trunk
--- Recording mergeinfo for merge of r391 into '.':
 U   .
$ svn commit -m "Block revision 391 from being merged into my-calc-branch."
Sending        .

Committed revision 394.

Now your branch is ready to soak up changes from the trunk again. After another sync of your branch to the trunk, you can even reintegrate the branch a second time. If necessary, you can do another record-only merge to keep the branch alive. Rinse and repeat.

It should now also be apparent why deleting the branch and re-creating it has the same effect as doing the above record-only merge. Because revision X is part of the natural history (see the sidebar Natural History and Implicit Mergeinfo) of the newly created branch, Subversion will never attempt to merge revision X into the branch, avoiding spurious conflicts.
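Both remedies remove revision X from the eligible set, just by different routes. This Python sketch uses the running example's revision 391 and a handful of hypothetical trunk revisions. It treats eligibility as trunk revisions minus the union of explicit and implicit mergeinfo, and for brevity ignores the kept-alive branch's own pre-branch history. This is a simplification of Subversion's real calculation.

```python
def eligible(trunk_revs, explicit, implicit):
    """Trunk revisions a sync merge would still try to apply to the branch."""
    return sorted(set(trunk_revs) - set(explicit) - set(implicit))


# Hypothetical trunk commits; 391 is the reintegration revision X.
trunk_revs = [380, 385, 391, 392]
synced = [380, 385]  # trunk changes merged into the branch before reintegrating

# Kept-alive branch, before the fix: X looks eligible and would be merged again.
print(eligible(trunk_revs, synced, implicit=[]))          # [391, 392]

# After 'svn merge --record-only -c 391': X is now explicit mergeinfo.
print(eligible(trunk_revs, synced + [391], implicit=[]))  # [392]

# Re-created branch copied from trunk@392: X is part of its natural history.
print(eligible(trunk_revs, [], implicit=range(1, 393)))   # []
```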

Merge-Sensitive Logs and Annotations

One of the main features of any version control system is to keep track of who changed what, and when they did it. The svn log and svn blame subcommands are just the tools for this: when invoked on individual files, they show not only the history of changesets that affected the file, but also exactly which user wrote which line of code, and when she did it.

When changes start getting replicated between branches, however, things start to get complicated. For example, if you were to ask svn log about the history of your feature branch, it would show every revision that ever affected the branch:

$ cd my-calc-branch
$ svn log -q
------------------------------------------------------------------------
r390 | user | 2002-11-22 11:01:57 -0600 (Fri, 22 Nov 2002)
------------------------------------------------------------------------
r388 | user | 2002-11-21 05:20:00 -0600 (Thu, 21 Nov 2002)
------------------------------------------------------------------------
r381 | user | 2002-11-20 15:07:06 -0600 (Wed, 20 Nov 2002)
------------------------------------------------------------------------
r359 | user | 2002-11-19 19:19:20 -0600 (Tue, 19 Nov 2002)
------------------------------------------------------------------------
r357 | user | 2002-11-15 14:29:52 -0600 (Fri, 15 Nov 2002)
------------------------------------------------------------------------
r343 | user | 2002-11-07 13:50:10 -0600 (Thu, 07 Nov 2002)
------------------------------------------------------------------------
r341 | user | 2002-11-03 07:17:16 -0600 (Sun, 03 Nov 2002)
------------------------------------------------------------------------
r303 | sally | 2002-10-29 21:14:35 -0600 (Tue, 29 Oct 2002)
------------------------------------------------------------------------
r98 | sally | 2002-02-22 15:35:29 -0600 (Fri, 22 Feb 2002)
------------------------------------------------------------------------

But is this really an accurate picture of all the changes that happened on the branch? What's left out here is the fact that revisions 390, 381, and 357 were actually the results of merging changes from the trunk. If you look at one of these logs in detail, the multiple trunk changesets that comprised the branch change are nowhere to be seen:

$ svn log -v -r 390
------------------------------------------------------------------------
r390 | user | 2002-11-22 11:01:57 -0600 (Fri, 22 Nov 2002) | 1 line
Changed paths:
   M /branches/my-calc-branch/button.c
   M /branches/my-calc-branch/README

Final merge of trunk changes to my-calc-branch.

We happen to know that this merge to the branch was nothing but a merge of trunk changes. How can we see those trunk changes as well? The answer is to use the --use-merge-history (-g) option. This option expands those “child” changes that were part of the merge.

$ svn log -v -r 390 -g
------------------------------------------------------------------------
r390 | user | 2002-11-22 11:01:57 -0600 (Fri, 22 Nov 2002) | 1 line
Changed paths:
   M /branches/my-calc-branch/button.c
   M /branches/my-calc-branch/README

Final merge of trunk changes to my-calc-branch.
------------------------------------------------------------------------
r383 | sally | 2002-11-21 03:19:00 -0600 (Thu, 21 Nov 2002) | 2 lines
Changed paths:
   M /branches/my-calc-branch/button.c
Merged via: r390

Fix inverse graphic error on button.
------------------------------------------------------------------------
r382 | sally | 2002-11-20 16:57:06 -0600 (Wed, 20 Nov 2002) | 2 lines
Changed paths:
   M /branches/my-calc-branch/README
Merged via: r390

Document my last fix in README.

By making the log operation use merge history, we see not just the revision we queried (r390), but also the two revisions that came along on the ride with it—a couple of changes made by Sally to the trunk. This is a much more complete picture of history!
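Conceptually, -g works because the merge commit recorded new mergeinfo, and the revisions it added can be looked up in the merge source's own log. The Python sketch below models that expansion one level deep with in-memory data taken from the example. Treating r390's mergeinfo change as adding trunk revisions 382 and 383 is an assumption, and the function name is hypothetical.

```python
def log_with_merge_history(rev, log, mergeinfo_added):
    """Yield (revision, author, message, merged_via) tuples, newest first.

    log:             {rev: (author, message)} for every relevant revision
    mergeinfo_added: {merge_rev: [revisions that merge brought in]}
    """
    author, message = log[rev]
    yield rev, author, message, None
    # Only one level deep here; nested merges would need recursion.
    for child in sorted(mergeinfo_added.get(rev, []), reverse=True):
        child_author, child_message = log[child]
        yield child, child_author, child_message, rev


log = {
    390: ("user", "Final merge of trunk changes to my-calc-branch."),
    383: ("sally", "Fix inverse graphic error on button."),
    382: ("sally", "Document my last fix in README."),
}
for entry in log_with_merge_history(390, log, {390: [382, 383]}):
    print(entry)
```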

The svn blame command also takes the --use-merge-history (-g) option. If this option is omitted, somebody looking at a line-by-line annotation of button.c may get the mistaken impression that you were responsible for the lines that fixed a certain error:

$ svn blame button.c
…
   390    user    retval = inverse_func(button, path);
   390    user    return retval;
   390    user    }
…

And while it's true that you did commit those three lines in revision 390, two of them were actually written by Sally back in revision 383:

$ svn blame button.c -g
…
G    383    sally   retval = inverse_func(button, path);
G    383    sally   return retval;
     390    user    }
…

Now we know who to really blame for those two lines of code!
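The same idea drives blame -g: when a line's last change arrived through a merge, the annotation is redirected to the revision that originally introduced it, and a G marks the line. In this Python sketch, the per-line origin data is hypothetical input that mirrors the button.c example. Real Subversion computes it by annotating the merge source.

```python
def annotate(lines, use_merge_history):
    """lines: (text, committed_rev, committed_by, origin) tuples, where origin is
    None or (original_rev, original_author) for lines that arrived via a merge."""
    result = []
    for text, rev, author, origin in lines:
        if use_merge_history and origin is not None:
            result.append(("G", origin[0], origin[1], text))
        else:
            result.append((" ", rev, author, text))
    return result


button_c = [
    ("retval = inverse_func(button, path);", 390, "user", (383, "sally")),
    ("return retval;", 390, "user", (383, "sally")),
    ("}", 390, "user", None),
]
for marker, rev, author, text in annotate(button_c, use_merge_history=True):
    print(marker, rev, author, text)
```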

Noticing or Ignoring Ancestry

When conversing with a Subversion developer, you are very likely to hear the term ancestry. This word describes the relationship between two objects in a repository: if they're related to each other, one object is said to be an ancestor of the other.

For example, suppose you commit revision 100, which includes a change to a file foo.c. Then foo.c@99 is an “ancestor” of foo.c@100. On the other hand, suppose you commit the deletion of foo.c in revision 101, and then add a new file by the same name in revision 102. In this case, foo.c@99 and foo.c@102 may appear to be related (they have the same path), but in fact are completely different objects in the repository. They share no history or “ancestry.”

The reason for bringing this up is to point out an important difference between svn diff and svn merge. The former command ignores ancestry, while the latter command is quite sensitive to it. For example, if you asked svn diff to compare revisions 99 and 102 of foo.c, you would see line-based diffs; the diff command is blindly comparing two paths. But if you asked svn merge to compare the same two objects, it would notice that they're unrelated and first attempt to delete the old file, then add the new file; the output would indicate a deletion followed by an add:

D    foo.c
A    foo.c

Most merges involve comparing trees that are ancestrally related to one another; therefore, svn merge defaults to this behavior. Occasionally, however, you may want the merge command to compare two unrelated trees. For example, you may have imported two source-code trees representing different vendor releases of a software project (see the section called “Vendor Branches”). If you ask svn merge to compare the two trees, you'd see the entire first tree being deleted, followed by an add of the entire second tree! In these situations, you'll want svn merge to do a path-based comparison only, ignoring any relations between files and directories. Add the --ignore-ancestry option to your merge command, and it will behave just like svn diff. (And conversely, the --notice-ancestry option will cause svn diff to behave like the svn merge command.)

	Tip
	The --ignore-ancestry option also disables Merge Tracking. This means that svn:mergeinfo is not considered when svn merge is determining what revisions to merge, nor is svn:mergeinfo recorded to describe the merge.
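The difference comes down to whether node identity or path alone decides what counts as "the same file." This Python sketch gives each object a hypothetical node_id that stands in for the repository's record of ancestry. It shows how the foo.c@99 and foo.c@102 pair yields an edit when ancestry is ignored and a delete followed by an add when it is noticed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    path: str
    node_id: int    # hypothetical stand-in for "shares history with"
    content: str


def compare(left, right, ignore_ancestry):
    """Describe how a merge would turn left into right at the same path."""
    related = left.node_id == right.node_id
    if ignore_ancestry or related:
        # Path-based comparison, like svn diff: an edit if the contents differ.
        return [f"U    {right.path}"] if left.content != right.content else []
    # Unrelated objects: the old file is deleted and the new one added.
    return [f"D    {left.path}", f"A    {right.path}"]


foo_99 = Node("foo.c", node_id=1, content="old text")
foo_102 = Node("foo.c", node_id=2, content="new text")  # deleted in r101, re-added in r102
print(compare(foo_99, foo_102, ignore_ancestry=False))  # ['D    foo.c', 'A    foo.c']
print(compare(foo_99, foo_102, ignore_ancestry=True))   # ['U    foo.c']
```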

Merges and Moves

A common desire is to refactor source code, especially in Java-based software projects. Files and directories are shuffled around and renamed, often causing great disruption to everyone working on the project. Sounds like a perfect case to use a branch, doesn't it? Just create a branch, shuffle things around, and then merge the branch back to the trunk, right?

Alas, this scenario doesn't work so well right now and is considered one of Subversion's current weak spots. The problem is that Subversion's svn update command isn't as robust as it should be, particularly when dealing with copy and move operations.

When you use svn copy to duplicate a file, the repository remembers where the new file came from, but it fails to transmit that information to the client which is running svn update or svn merge. Instead of telling the client, “Copy that file you already have to this new location,” it sends down an entirely new file. This can lead to problems, especially because the same thing happens with renamed files. A lesser-known fact about Subversion is that it lacks “true renames”—the svn move command is nothing more than an aggregation of svn copy and svn delete.

For example, suppose that while working on your private branch, you rename integer.c to whole.c. Effectively you've created a new file in your branch that is a copy of the original file, and deleted the original file. Meanwhile, back on trunk, Sally has committed some improvements to integer.c. Now you decide to merge your branch to the trunk:

$ cd calc/trunk

$ svn merge --reintegrate ^/calc/branches/my-calc-branch
--- Merging differences between repository URLs into '.':
D    integer.c
A    whole.c
U    .
--- Recording mergeinfo for merge between repository URLs into '.':
 U   .

This doesn't look so bad at first glance, but it's also probably not what you or Sally expected. The merge operation has deleted the latest version of the integer.c file (the one containing Sally's latest changes), and blindly added your new whole.c file—which is a duplicate of the older version of integer.c. The net effect is that merging your “rename” to the trunk has removed Sally's recent changes from the latest revision!
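To see why Sally's edit disappears, here is a toy simulation that treats the merge as replaying the branch's path additions and deletions onto trunk. It is an assumption-laden sketch: the contents are placeholder strings, apply_branch_changes() is a hypothetical name, and it ignores conflict detection and the exact two-URL comparison that --reintegrate performs.

```python
# Toy simulation of the merge above; contents are placeholder strings.
from typing import Dict

Tree = Dict[str, str]


def apply_branch_changes(ancestor: Tree, source: Tree, target: Tree) -> Tree:
    """Replay the source's path additions and deletions onto target."""
    result = dict(target)
    for path in set(ancestor) - set(source):
        result.pop(path, None)       # "D integer.c": Sally's edit goes with it
    for path in set(source) - set(ancestor):
        result[path] = source[path]  # "A whole.c": the old contents
    return result


base = {"integer.c": "v1"}                 # common ancestor
branch = {"whole.c": "v1"}                 # integer.c renamed via copy + delete
trunk = {"integer.c": "v1 + Sally's fix"}  # Sally edited integer.c

merged = apply_branch_changes(base, branch, trunk)
print(merged)  # {'whole.c': 'v1'}
print("Sally's fix survives:", any("Sally" in text for text in merged.values()))
```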

This isn't true data loss. Sally's changes are still in the repository's history, but it may not be immediately obvious that this has happened. The moral of this story is that until Subversion improves, be very careful about merging copies and renames from one branch to another.
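One way to be careful is to look for renames in the branch's history before merging it. The sketch below scans svn log -v output, where a copied path is listed with a “(from PATH:REV)” suffix, and pairs each copy with a deletion of its source in the same revision. SAMPLE_LOG is made-up output in that format, find_renames() is a hypothetical helper, and parsing svn log --xml output would be more robust than matching text with regular expressions.

```python
# Sketch: detect copy + delete pairs (renames) in "svn log -v" text.
import re
from typing import Dict, List, Optional, Set, Tuple

SAMPLE_LOG = """\
------------------------------------------------------------------------
r361 | harry | 2008-05-01 10:00:00 -0500 (Thu, 01 May 2008) | 1 line
Changed paths:
   A /calc/branches/my-calc-branch/whole.c (from /calc/branches/my-calc-branch/integer.c:360)
   D /calc/branches/my-calc-branch/integer.c

Rename integer.c to whole.c.
------------------------------------------------------------------------
"""

REV_RE = re.compile(r"^r(\d+) \|")                           # "r361 | harry | ..."
COPY_RE = re.compile(r"^\s+[AR] (.+) \(from (.+):(\d+)\)$")  # added with history
DELETE_RE = re.compile(r"^\s+D (.+)$")                       # deleted path


def find_renames(log_text: str) -> List[Tuple[int, str, str]]:
    """Return (revision, old_path, new_path) for each copy whose source
    was deleted in the same revision."""
    copies: Dict[int, List[Tuple[str, str]]] = {}
    deletes: Dict[int, Set[str]] = {}
    rev: Optional[int] = None
    for line in log_text.splitlines():
        m = REV_RE.match(line)
        if m:
            rev = int(m.group(1))
            continue
        if rev is None:
            continue
        m = COPY_RE.match(line)
        if m:
            copies.setdefault(rev, []).append((m.group(1), m.group(2)))
            continue
        m = DELETE_RE.match(line)
        if m:
            deletes.setdefault(rev, set()).add(m.group(1))
    renames: List[Tuple[int, str, str]] = []
    for r, pairs in sorted(copies.items()):
        for new_path, old_path in pairs:
            if old_path in deletes.get(r, set()):
                renames.append((r, old_path, new_path))
    return renames


for rev, old, new in find_renames(SAMPLE_LOG):
    print(f"r{rev}: {old} -> {new}")
```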

Preventing Naïve Clients from Committing Merges

If you've just upgraded your server to Subversion 1.5 or later, there's a risk that pre-1.5 Subversion clients can cause problems with Merge Tracking. This is because pre-1.5 clients don't support this feature; when one of these older clients performs svn merge, it doesn't modify the value of the svn:mergeinfo property at all. So the subsequent commit, despite being the result of a merge, doesn't tell the repository about the duplicated changes—that information is lost. Later on, when “merge-aware” clients attempt automatic merging, they're likely to run into all sorts of conflicts resulting from repeated merges.
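The lost information can be made concrete. The sketch below parses a simplified svn:mergeinfo value (one merge source per line, followed by comma-separated revisions and inclusive ranges) and computes which source revisions still look unmerged. The function names and property values are hypothetical, and a trailing “*” marking a non-inheritable range is simply stripped.

```python
# Sketch: parse simplified svn:mergeinfo text and find still-eligible revisions.
from typing import Dict, Set


def parse_mergeinfo(value: str) -> Dict[str, Set[int]]:
    """Parse lines like '/calc/trunk:341-345,350,355' into revision sets."""
    result: Dict[str, Set[int]] = {}
    for line in value.splitlines():
        if not line.strip():
            continue
        path, _, ranges = line.rpartition(":")
        revs = result.setdefault(path, set())
        for item in ranges.split(","):
            item = item.rstrip("*")  # ignore the non-inheritable marker
            if "-" in item:
                start, end = (int(n) for n in item.split("-"))
                revs.update(range(start, end + 1))  # ranges are inclusive
            else:
                revs.add(int(item))
    return result


def eligible(source_revs: Set[int], mergeinfo: str, source: str) -> Set[int]:
    """Source revisions not yet recorded as merged."""
    return source_revs - parse_mergeinfo(mergeinfo).get(source, set())


trunk_changes = {341, 342, 343, 344, 345, 350, 355}
# A merge-aware client merged r355 and recorded it:
print(sorted(eligible(trunk_changes, "/calc/trunk:341-345,350,355", "/calc/trunk")))  # []
# A pre-1.5 client merged r355 but left svn:mergeinfo untouched:
print(sorted(eligible(trunk_changes, "/calc/trunk:341-345,350", "/calc/trunk")))  # [355]
```

When the pre-1.5 client's merge of r355 goes unrecorded, r355 shows up as eligible again, so the next automatic merge would try to apply it a second time.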

If you and your team are relying on the merge-tracking features of Subversion, you may want to configure your repository to prevent older clients from committing changes. The easy way to do this is by inspecting the “capabilities” parameter in the start-commit hook script. If the client reports itself as having mergeinfo capabilities, the hook script can allow the commit to start. If the client doesn't report that capability, have the hook deny the commit. Example 4.1, “Merge-tracking gatekeeper start-commit hook script” gives an example of such a hook script:

Example 4.1. Merge-tracking gatekeeper start-commit hook script

#!/usr/bin/env python
import sys

# The start-commit hook is invoked before a Subversion txn is created
# in the process of doing a commit.  Subversion runs this hook
# by invoking a program (script, executable, binary, etc.) named
# 'start-commit' (for which this file is a template)
# with the following ordered arguments:
#
#   [1] REPOS-PATH   (the path to this repository)
#   [2] USER         (the authenticated user attempting to commit)
#   [3] CAPABILITIES (a colon-separated list of capabilities reported
#                     by the client, such as "mergeinfo")

# A client that reports no capabilities is treated as merge-unaware.
capabilities = sys.argv[3].split(':') if len(sys.argv) > 3 else []
if "mergeinfo" not in capabilities:
    # When the hook fails, Subversion relays its stderr to the client.
    sys.stderr.write("Commits from merge-tracking-unaware clients are "
                     "not permitted.  Please upgrade to Subversion 1.5 "
                     "or newer.\n")
    sys.exit(1)  # a nonzero exit status blocks the commit
sys.exit(0)

For more information about hook scripts, see the section called “Implementing Repository Hooks”.

The Final Word on Merge Tracking

The bottom line is that Subversion's merge-tracking feature has an extremely complex internal implementation, and the svn:mergeinfo property is the only window the user has into the machinery.

Sometimes mergeinfo will appear on paths that you didn't expect to be touched by an operation. Sometimes mergeinfo won't be generated at all, when you expect it to. Furthermore, the management of mergeinfo metadata has a whole set of taxonomies and behaviors around it, such as “explicit” versus “implicit” mergeinfo, “operative” versus “inoperative” revisions, specific mechanisms of mergeinfo “elision,” and even “inheritance” from parent to child directories.
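Inheritance and elision are easier to picture with a simplified model. In this sketch, explicit mergeinfo is a hypothetical dict keyed by path; a path without its own mergeinfo inherits the nearest ancestor's, with the relative path appended to each merge source, and a child's explicit mergeinfo elides when it matches what it would inherit. Real Subversion also handles non-inheritable ranges and many edge cases that this model ignores.

```python
# Simplified model of implicit (inherited) mergeinfo and elision.
import posixpath
from typing import Dict, Set

Mergeinfo = Dict[str, Set[int]]  # merge-source path -> revisions merged


def inherited(explicit: Dict[str, Mergeinfo], path: str) -> Mergeinfo:
    """Mergeinfo that applies to path: its own explicit value if it has one,
    else the nearest ancestor's, with the relative path appended to each
    merge source."""
    current = path
    while True:
        if current in explicit:
            suffix = posixpath.relpath(path, current)
            if suffix == ".":
                return explicit[current]
            return {posixpath.join(src, suffix): revs
                    for src, revs in explicit[current].items()}
        parent = posixpath.dirname(current)
        if parent == current:  # reached the root without finding any
            return {}
        current = parent


def elide(explicit: Dict[str, Mergeinfo], path: str) -> None:
    """Remove path's explicit mergeinfo if it equals what it would inherit."""
    own = explicit.pop(path, None)
    if own is not None and inherited(explicit, path) != own:
        explicit[path] = own  # differs from what it would inherit: keep it


wc = {"/branch": {"/trunk": {350, 355}}}  # explicit mergeinfo on the root
print(inherited(wc, "/branch/src/integer.c"))
wc["/branch/src"] = {"/trunk/src": {350, 355}}
elide(wc, "/branch/src")
print("/branch/src" in wc)  # False: it elided to /branch
```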

We've chosen to cover these detailed topics only briefly, if at all, for a couple of reasons. First, the level of detail is absolutely overwhelming for a typical user. Second, and more importantly, the typical user shouldn't have to understand these concepts; they should remain in the background as pesky implementation details. All that said, if you enjoy this sort of thing, you can get a fantastic overview in a paper posted at CollabNet's website (now mirrored on the Subversion website): https://subversion.apache.org/blog/2008-05-06-merge-info.html.

For now, if you want to steer clear of the complexities of merge tracking, we recommend that you follow these simple best practices:

- For short-term feature branches, follow the simple procedure described throughout the section called “Basic Merging”.
- Avoid subtree merges and subtree mergeinfo. Perform merges only on the root of your branches, not on subdirectories or files (see the section called “Subtree Merges and Subtree Mergeinfo”).
- Don't ever edit the svn:mergeinfo property directly; use svn merge with the --record-only option to effect a desired change to the metadata (as demonstrated in the section called “Blocking Changes”).
- Your merge target should be a working copy that represents the root of a complete tree corresponding to a single location in the repository at a single point in time:
  - Don't use the --allow-mixed-revisions option to merge into mixed-revision working copies.
  - Don't merge to targets with “switched” subdirectories (as described next in the section called “Traversing Branches”).
  - Avoid merges to targets with sparse directories. Likewise, don't merge to depths other than --depth=infinity.
  - Be sure you have read access to all of the merge source and read/write access to all of the merge target.
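Several of the merge-target rules above can be checked mechanically with the svnversion command, whose summary output marks mixed revisions with a colon (for example, 4123:4168) and appends S for switched and P for partial (sparse) working copies. The sketch below only parses that output string; merge_target_problems() is a hypothetical helper, and it does not check the depth of the merge itself or your access rights.

```python
# Sketch: flag risky merge targets from svnversion's summary string.
import re
from typing import List

# e.g. "4168", "4123:4168", "4168M", "4123:4168MSP"
SVNVERSION_RE = re.compile(r"^(\d+)(?::(\d+))?([MSP]*)$")


def merge_target_problems(svnversion_output: str) -> List[str]:
    """Return reasons the working copy is a risky merge target."""
    text = svnversion_output.strip()
    m = SVNVERSION_RE.match(text)
    if not m:
        return [f"unrecognized svnversion output: {text!r}"]
    _low, high, flags = m.groups()
    problems: List[str] = []
    if high is not None:
        problems.append("mixed-revision working copy")
    if "S" in flags:
        problems.append("switched subdirectories")
    if "P" in flags:
        problems.append("sparse directories")
    # "M" (local modifications) is accepted but not reported here.
    return problems


print(merge_target_problems("4168"))        # []
print(merge_target_problems("4123:4168S"))
```

For example, merge_target_problems("4123:4168S") reports both a mixed-revision working copy and switched subdirectories.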
