Spaces:
Running
Running
Rerere | |
====== | |
This document describes the rerere logic. | |
Conflict normalization | |
---------------------- | |
To ensure recorded conflict resolutions can be looked up in the rerere | |
database, even when branches are merged in a different order, | |
different branches are merged that result in the same conflict, or | |
when different conflict style settings are used, rerere normalizes the | |
conflicts before writing them to the rerere database. | |
Different conflict styles and branch names are normalized by stripping | |
the labels from the conflict markers, and removing the common ancestor | |
version from the `diff3` or `zdiff3` conflict styles. Branches that | |
are merged in different order are normalized by sorting the conflict | |
hunks. More on each of those steps in the following sections. | |
Once these two normalization operations are applied, a conflict ID is | |
calculated based on the normalized conflict, which is later used by | |
rerere to look up the conflict in the rerere database. | |
Removing the common ancestor version | |
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | |
Say we have three branches AB, AC and AC2. The common ancestor of | |
these branches has a file with a line containing the string "A" (for | |
brevity this is called "line A" in the rest of the document). In | |
branch AB this line is changed to "B", in AC, this line is changed to | |
"C", and branch AC2 is forked off of AC, after the line was changed to | |
"C". | |
Forking a branch ABAC off of branch AB and then merging AC into it, we | |
get a conflict like the following: | |
<<<<<<< HEAD | |
B | |
======= | |
C | |
>>>>>>> AC | |
Doing the analogous with AC2 (forking a branch ABAC2 off of branch AB | |
and then merging branch AC2 into it), using the diff3 or zdiff3 | |
conflict style, we get a conflict like the following: | |
<<<<<<< HEAD | |
B | |
||||||| merged common ancestors | |
A | |
======= | |
C | |
>>>>>>> AC2 | |
By resolving this conflict, to leave line D, the user declares: | |
After examining what branches AB and AC did, I believe that making | |
line A into line D is the best thing to do that is compatible with | |
what AB and AC wanted to do. | |
As branch AC2 refers to the same commit as AC, the above implies that | |
this is also compatible what AB and AC2 wanted to do. | |
By extension, this means that rerere should recognize that the above | |
conflicts are the same. To do this, the labels on the conflict | |
markers are stripped, and the common ancestor version is removed. The above | |
examples would both result in the following normalized conflict: | |
<<<<<<< | |
B | |
======= | |
C | |
>>>>>>> | |
Sorting hunks | |
~~~~~~~~~~~~~ | |
As before, lets imagine that a common ancestor had a file with line A | |
its early part, and line X in its late part. And then four branches | |
are forked that do these things: | |
- AB: changes A to B | |
- AC: changes A to C | |
- XY: changes X to Y | |
- XZ: changes X to Z | |
Now, forking a branch ABAC off of branch AB and then merging AC into | |
it, and forking a branch ACAB off of branch AC and then merging AB | |
into it, would yield the conflict in a different order. The former | |
would say "A became B or C, what now?" while the latter would say "A | |
became C or B, what now?" | |
As a reminder, the act of merging AC into ABAC and resolving the | |
conflict to leave line D means that the user declares: | |
After examining what branches AB and AC did, I believe that | |
making line A into line D is the best thing to do that is | |
compatible with what AB and AC wanted to do. | |
So the conflict we would see when merging AB into ACAB should be | |
resolved the same way--it is the resolution that is in line with that | |
declaration. | |
Imagine that similarly previously a branch XYXZ was forked from XY, | |
and XZ was merged into it, and resolved "X became Y or Z" into "X | |
became W". | |
Now, if a branch ABXY was forked from AB and then merged XY, then ABXY | |
would have line B in its early part and line Y in its later part. | |
Such a merge would be quite clean. We can construct 4 combinations | |
using these four branches ((AB, AC) x (XY, XZ)). | |
Merging ABXY and ACXZ would make "an early A became B or C, a late X | |
became Y or Z" conflict, while merging ACXY and ABXZ would make "an | |
early A became C or B, a late X became Y or Z". We can see there are | |
4 combinations of ("B or C", "C or B") x ("X or Y", "Y or X"). | |
By sorting, the conflict is given its canonical name, namely, "an | |
early part became B or C, a late part became X or Y", and whenever | |
any of these four patterns appear, and we can get to the same conflict | |
and resolution that we saw earlier. | |
Without the sorting, we'd have to somehow find a previous resolution | |
from combinatorial explosion. | |
Conflict ID calculation | |
~~~~~~~~~~~~~~~~~~~~~~~ | |
Once the conflict normalization is done, the conflict ID is calculated | |
as the sha1 hash of the conflict hunks appended to each other, | |
separated by <NUL> characters. The conflict markers are stripped out | |
before the sha1 is calculated. So in the example above, where we | |
merge branch AC which changes line A to line C, into branch AB, which | |
changes line A to line C, the conflict ID would be | |
SHA1('B<NUL>C<NUL>'). | |
If there are multiple conflicts in one file, the sha1 is calculated | |
the same way with all hunks appended to each other, in the order in | |
which they appear in the file, separated by a <NUL> character. | |
Nested conflicts | |
~~~~~~~~~~~~~~~~ | |
Nested conflicts are handled very similarly to "simple" conflicts. | |
Similar to simple conflicts, the conflict is first normalized by | |
stripping the labels from conflict markers, stripping the common ancestor | |
version, and the sorting the conflict hunks, both for the outer and the | |
inner conflict. This is done recursively, so any number of nested | |
conflicts can be handled. | |
Note that this only works for conflict markers that "cleanly nest". If | |
there are any unmatched conflict markers, rerere will fail to handle | |
the conflict and record a conflict resolution. | |
The only difference is in how the conflict ID is calculated. For the | |
inner conflict, the conflict markers themselves are not stripped out | |
before calculating the sha1. | |
Say we have the following conflict for example: | |
<<<<<<< HEAD | |
1 | |
======= | |
<<<<<<< HEAD | |
3 | |
======= | |
2 | |
>>>>>>> branch-2 | |
>>>>>>> branch-3~ | |
After stripping out the labels of the conflict markers, and sorting | |
the hunks, the conflict would look as follows: | |
<<<<<<< | |
1 | |
======= | |
<<<<<<< | |
2 | |
======= | |
3 | |
>>>>>>> | |
>>>>>>> | |
and finally the conflict ID would be calculated as: | |
`sha1('1<NUL><<<<<<<\n3\n=======\n2\n>>>>>>><NUL>')` | |