kth / spork Goto Github PK

AST-based structured merge tool for Java, fully Git compatible https://doi.org/10.1109/TSE.2022.3143766

License: MIT License

Java 59.28% Shell 1.08% Kotlin 39.65%

spork's Introduction

Spork - AST based merging for Java files

Spork is an AST based structured merge tool for Java. This means that instead of merging Java files by their raw text, it resolves the abstract syntax trees and merges based on them instead.

Academic use

If you use Spork in an academic context, please cite the related research paper:

S. Larsen, J. -R. Falleri, B. Baudry and M. Monperrus, "Spork: Structured Merge for Java with Formatting Preservation," in IEEE Transactions on Software Engineering, doi: 10.1109/TSE.2022.3143766.

You may export a citation in various formats using the Cite this repository button to the right. See the GitHub docs for more info.

Master's thesis

Spork was created as part of a master's thesis project. If you want to learn more about Spork in terms of theory and design, the thesis is freely available.

Spork: Move-enabled structured merge for Java with GumTree and 3DM

Attribution

Spork is built on top of a few pieces of fantastic software, most notably:

Spoon provides the AST representation used in Spork.
GumTree provides the tree matching.
gumtree-spoon-ast-diff bridges the gap between Spoon and GumTree.

The merge implementation in Spork is based on the 3DM merge algorithm by Tancred Lindholm.

Quickstart

Want to just try out Spork on a small merge scenario? Below are a few shell commands that will download Spork along with a sample merge scenario, and then run it!

# Download Spork
wget https://github.com/KTH/spork/releases/download/v0.5.0/spork-0.5.0.jar -O spork.jar

# Download a sample merge scenario
wget https://raw.githubusercontent.com/KTH/spork/fe906f537d1bb7205256d1fe81fda9f323849a60/src/test/resources/clean/both_modified/move_if/Left.java
wget https://raw.githubusercontent.com/KTH/spork/fe906f537d1bb7205256d1fe81fda9f323849a60/src/test/resources/clean/both_modified/move_if/Base.java
wget https://raw.githubusercontent.com/KTH/spork/fe906f537d1bb7205256d1fe81fda9f323849a60/src/test/resources/clean/both_modified/move_if/Right.java
# You should now have spork.jar, Left.java, Base.java and Right.java in your local directory

# a line based-merge is not possible
diff3 Left.java Base.java Right.java -m -A

# an AST-merge with Spork does
java -jar spork.jar Left.java Base.java Right.java

It should print the result of the merge to stdout. See Base.java for the original version, and Left.java and Right.java for the respective changes. They should all be neatly integrated into the resulting merge. For more on using Spork, see Usage.

Usage

You can find a pre-built jar-file under relases. The jar-file includes all dependencies, so all you need is a Java runtime. Download the jar and run it like so:

Important: Spork requires a Java runtime version 8 or higher to run.

java -jar path/to/spork/jar <left> <base> <right>

The left, base and right arguments are required, and represent the left, base and right revisions, respectively. The base revision should be the best common ancestor of the left and right revisions.

For a full listing of the command line options, supply the -h option. It will produce the following output.

Usage: spork [-eghlV] [-o=<out>] LEFT BASE RIGHT
The Spork command line app.
      LEFT              Path to the left revision.
      BASE              Path to the base revision.
      RIGHT             Path to the right revision.
  -e, --exit-on-error   Disable line-based fallback if the structured merge
                          encounters an error.
  -g, --git-mode        Enable Git compatibility mode. Required to use Spork as
                          a Git merge driver.
  -h, --help            Show this help message and exit.
  -l, --logging         Enable logging output.
  -o, --output=<out>    Path to the output file. Existing files are overwritten.
  -V, --version         Print version information and exit.

Naturally, if you want the absolute latest version, you will have to build Spork yourself.

Build

Maven can be used to build the latest version of Spork.

Note: Requires JDK8+ to build.

mvn clean compile package -DskipTests

This will produce a jar-file in the target directory called something along the lines of spork-x.x.x.jar. Run the jar with java -jar path/to/spork/jar.

Configure as a Git merge driver

When Git performs a merge and encounters a file that has been edited in both revisions under merge, it will invoke a merge driver to merge the conflicting versions. It's a very simple thing to configure Spork as a merge driver for Java files, all you need is to add a couple of lines to a couple of configuration files. First, let's create a .gitattributes file and specify to use Spork as a merge driver for Java files. Put the following content in your .gitattributes file (you may all ready have one, check your home directory):

*.java merge=spork

spork doesn't mean anything to Git yet, we need to actually define the merge driver called spork. We do that in the .gitconfig file, typically located in your home directory. You should put the following content into it:

[core]
	attributesfile = /path/to/.gitattributes

[merge "spork"]
    name = spork
    driver = java -jar /path/to/spork.jar merge --git-mode %A %O %B -o %A

Then replace /path/to/.gitattributes with the absolute path to the .gitattributes file you edited/created first, and replace /path/to/spork.jar with the absolute path to the Spork jar-file. With that done, Spork will be used as the merge driver for Java files!

Important: The --git-mode option is required to use Spork as a Git merge driver. If you find that Spork always reverts to line-based merge, then that option is probably missing in the driver option that invokes Spork.

License

Unless otherwise stated, files in Spork are under the MIT license.

The GumTreeSpoonAstDiff file is composed of code from gumtree-spoon-ast-diff and is therefore individually licensed under Apache License 2.0.

Experimental: compile as a native image

Spork can be compiled to a native executable file using GraalVM. To do so, you need to first install GraalVM's JDK for Java 17 (Spork does not support Java 21 yet, see #479). Running mvn package -P native will then generate a native image in target/spork.

Note: the native image is known to behave differently to the .jar file, producing different conflict resolution results. Help to debug this would be welcome.

spork's People

Contributors

Stargazers

Watchers

Forkers

slarse quentinledilavrec adrianaul fanjiangchenghe wetneb kjy5

spork's Issues

Resolve ordering conflicts

The basic implementation of 3DM reports ordering conflicts as actual conflicts. These should be resolved automatically, and it shouldn't be all that difficult. There are only a few syntactical levels in Java where nodes are actually unordered.

Successors and predecessors lookup tables are extremely unbalanced by null

Due to all PCS lists starting and ending with null, and most PCS lists being pretty short, there's an extreme unbalance in the successors and predecessors lookup tables in the TStar class. For a large merge, this is significant, as there can be tens of thousands of nodes, of which a linear amount get put in the same lookup table bucket (indexed by null).

And this is without accounting for hash collisions. Needless to say, this must be solved. I checked with a profiler and roughly 80% of the total runtime of a large merge is spent looking up elements.

Performance could probably be further improved by not using HashSets for the buckets (this is mostly for convenience when building the lookup tables), but to start with it would be great not to put more than half of the nodes into the same bucket.

Instead of using null for the start-of-list and end-of-list elements, I could use some objects that contains the root element.

Move conflicts are not handled

Root conflicts are not properly handled and will cause the same node to be inserted in multiple places. This will, quite obviously, mess some stuff up. I think however that it's fine to ignore root conflicts, as there's really no way to visualize them textually.

Anyway, for the PcsInterpreter to work, every SpoonNode in the PCS structure must only appear as a predecessor and successor precisely once. This means that if there are root conflicts, the conflicting nodes must be copied such the uniqueness of nodes is preserved.

Cannot handle annotations with contents

For example, if there is a SuppressWarnings("unchecked") in the source code. Spork crashes when trying to rebuild the Spoon tree. The crash occurs when setting the value by role, as Spoon expects a Map but is given a literal.

https://github.com/KTH/spork/blob/c44bf4cf6ab43039fee61e00c0a67218b35da91a/src/main/java/se/kth/spork/merge/spoon/SpoonPcs.java#L211

Structural conflict handling doesn't quite work

It currently operates on the principle that structural conflicts can only occur in places where an arbitrary amount of code elements can be added, and simply adds the conflicting elements in order, so they are all present in the tree. This is not actually possible in all cases, for example a binary operator may give rise to a structural conflict (in either left or right operand), but there's no way to just add two left or right operands.

A more general solution is needed.

Handle files consisting of elements other than a single class

I realized that issues #3, #11 and #12 are essentially the same problem as far as implementation goes, so this issue concerns handling all files that something other (or something in addition to) a single class. This includes:

interfaces
enums
nested types
multiple non-nested types

This will have to be a two-step fix, first back end support and then front end (i.e. CLI) support.

Original source position of unordered elements messes up pretty printing

If the source position of a member is set, the pretty printer will use that when printing, and not respect the new ordering. This can be fixed by setting the source position to SourcePosition.NOPOSITION.

mem.setValueByRole(CtRole.POSITION, SourcePosition.NOPOSITION);

Add Travis CI config

Pretty printer sometimes picks the wrong active package

Spoon sometimes resolves packages that have no associated source files, or perhaps it is due to the merge not properly handling when a package is changed.

A temporary fix (until I can narrow down the issue), would be to search for packages that contain other packages or types when trying to find the active package.

Back end support for non-single-class files

See #13

Handle content updates of nodes

Currently, only the content of literal values is actually considered when resolving the content of a node. The relevant parts of Spork is the content resolution before 3DM merge starts, as well as the application of a node's content after the merge has been produced.

Right now, the content picked will almost always be the base revision's content, because of how the content conflict resolution works, and most content values simply being the short string representation of the nodes.

The content is resolved here (before merge):

https://github.com/KTH/spork/blob/c44bf4cf6ab43039fee61e00c0a67218b35da91a/src/main/java/se/kth/spork/merge/spoon/Spoon3dmMerge.java#L72-L98

And applied here (after merge):

https://github.com/KTH/spork/blob/c44bf4cf6ab43039fee61e00c0a67218b35da91a/src/main/java/se/kth/spork/merge/spoon/SpoonPcs.java#L221-L241

There is some terminology confusion with the copyTree method as well, it doesn't just straight copy a node, but rather creates the merged node based on the node that's passed, and the contents mapping.

Refactor content handling to not have a notion of primary/secondary values

There's really no point to this primary/secondary value distinction, all of the values are equally important. It also just makes it harder to merge.

Refactor test suite to automatically discover test directories

Instead of having to list them in the test files. Can be done by implementing the ArgumentsProvider interface.

Directories prefixed with IGNORE will be ignored.

add support for coverage tracking (codcov or coveralls)

Pretty printer prints implicit values

This took me a while to figure out, but the reason the pretty printer prints e.g. java.lang.String instead of just String is that ignoreImplicit is true by default. It's a bit convoluted, but ignoreImplicit in the pretty printer means "ignore that a value is implicit and should be ignored, and print it anyway". printImplicit would have made more sense...

Anyway, gotta set that value to false. This issue will remind me if I ever wonder why I did that.

Resolution of ordering conflicts is probably too liberal

Right now, all type members are considered unordered. This isn't actually the case as fields may have dependencies on other fields, and so their order does matter.

Really, an ordering conflict can only be trivially automatically resolved if at most one of the conflicting node lists contains fields. If both contain fields, one has to analyze the dependencies between these fields.

It will probably work out quite often in practice regardless, but it would be better if fields were treated with more care.

Pre-process the merged tree before pretty-printing

The logic required for pretty-printing properly (e.g. unsetting the source position of comments, embedding conflicts into string values, etc) is spread out through the entire application. This could be consolidated into a single scan of the tree before pretty-printing. It will make pretty-printing itself significantly slower, but in regards to the entire merge it should add very little overhead as it's an O(n) iteration over all tree nodes.

Define "type alias" for Content<SpoonNode, RoledValue>

Just to reduce the visual overload of seeing that thing written all over. It could be as simple as a wrapper class called SpoonContent wich is simply befined as public final class SpoonContent extends Content<SpoonNode, RoledValue> {}.

Naively augmenting the base/left+base/right matchings with left/right matchings causes strange behavior

The idea with allowing nodes in left and right that are not matched to the base revision to be matched with each other is to allow for copy/pasted code to be detected. However, in the current setup, this can lead to really strange behavior. For example, if both left and right add a print statement in a fairly close proximity, those print statements may be matched together, which makes just about no sense. This sometimes results in (fixable) crashes.

Perhaps, the matching between left and right should be limited to trees of some minimum size to prevent these kinds of unintended matchings.

Content conflicts on non-string values are not handled properly

As of #42 nodes can have both primary and secondary values. Currently, if there is an unresolved conflict in the secondary values, the primary value is flagged as conflicting in the output. For example, the followng merge scenario has an obvious conflict in the visibility modifier:

// base
int x = 2;

// left
private int x = 2;

// right
public int x = 2;

But it will actually be flagged like so:

int
<<<<<<< LEFT
x
=======
x
>>>>>>> RIGHT
= 2;

Which obviously is horrendously incorrect. This needs to be resolved, but it would probably be best to identify some more secondary values first to make sure it's resolved in a generalized fashion.

Handle structural conflicts

Related to #4

This might be a bit tricky.

Cannot handle files with multiple/nested classes

Spork gets really confused by multiple classes in the same source file. The current implementation is more or less built around the assumption that a single source file will contain one class, which is not necessarily true and must be fixed. One notable problem area is when finalizing the PCS triple and setting the fake root:

https://github.com/KTH/spork/blob/c44bf4cf6ab43039fee61e00c0a67218b35da91a/src/main/java/se/kth/spork/merge/spoon/PcsBuilder.java#L49-L51

Separate merge and compare into distinct commands

This is to make experiments easier to perform.

Usage like this:

$ spork compare <left> <right>

This also means that the merge command needs to be a separate command.

$ spork merge <left> <base> <right>

Document existing merge tools and clarify what Spork does differently

Why do we even need Spork? See #7 for context.

Front end drops package statements in pretty-print

If a file has a package statement, the back end will correctly merge it, but the front end will not print it.

Modifiers are dropped

Modifiers should reasonably be part of the content of a node. They're currently just dropped. This means that a node's content isn't limited to a single thing, as was previously assumed. I need to rethink how to handle content and content conflicts.

Refactor the project

It's starting to get a bit messy. In particular, the division into subpackages almost seems arbitrary at this point, and classes such as Spoon3dmMerge and SpoonPcs do way too many things.

Import statements are dropped entirely

The import statements don't even make it to the back end. While I could add import statements in separate PCS lists, it seems easier to just merge them separately. Imports are kind of unrelated to the rest of the syntactic structure of a Java file.

Refactor TdmMerge class to not modify contents in-place

This is required for a nice resolution to #36

Handle interfaces

Need to use a Launcher object instead of the static Launcher.parseClass

Spoon fails to mark @SuppressWarnings package as implicit if named element is used

I've no idea why this happens, but if you write this:

@SuppressWarnings(value="unchecked")

then Spoon will not mark java.lang as implicit, and the output will be:

@java.lang.SuppressWarnings(value="unchecked")

Took me an hour of debugging to figure out ... To be clear, the problem is the use of a named element, if you remove the value=, it the annotation type's package gets properly marked as implicit.

Handle conflicts

Conflicts should be handled somehow. By handle, I don't mean resolve, but simply represent in the source output with the standard <<<<<<<<, ======== and >>>>>>> delimiters.

Resolve method ordering conflicts

See #5

Structural conflicts are detected twice

As structurally conflicting PCS triples are not removed from the set, each structural conflict will be detected twice. Triples that are already part of a conflict should not be considered again for structural conflicts.

Perhaps they should be considered for content conflicts, however, as contents are associated with the predecessor of the triple, and not with the triple itself.

Have the merge command optionally write to a file

If an output option is supplied, write to a file instead of stdout.

Bounds on wildcards are not handled correctly

They seem to most often become extends. But sometimes they become super. For some reason.

Handle content conflicts

Related to #4

Handle conflicts of node contents.

Front end support for non-single-class files

See #13

Need an unordered equality method for comparing trees

Spoon will differentiate trees that have differently ordered methods, but I don't want that. Before comparing trees in tests, their type members should be sorted such that fields preserve their order in respect to each other, but methods are ordered by signature (lexicographically).

Comments are not handled correctly

Or... at all? I'm not sure what I'm even doing about comments. But it's certainly not being handled correctly.

Compare ASTs using gumtree-spoon

To determine of two Java file's ASTs are equal, don't use the built-in equals, but rather:

Sort unordered elements.
Compare import statements precisely.
Compare ASTs with gumtree-spoon-ast-diff, ignoring comments

Operator values aren't resolved

The content of an operator is the kind (accessed with getKind()), which needs to be resolved. Otherwise, the actual operator in the resolved merge is not always what's expected.

Use logger instead of printing to stdout

Generalize algorithm for merging secondary values upon conflict

In general, it should be possible to merge secondary values according to the following algorithm (assume that (v,r) is a value v with role r)

If a tuple (v, r) is present in both left and right, keep it.
If a tuple (v, r) is in left and a tuple (v', r) is in right, then if either is present in base, discard that one. If neither is present in base, conflict!

This can be further elaborated upon to deal with values that are collections.

That should pretty much do it. This may lead to an algorithm in which there is no need to distinguish between primary and secondary values of a node, as that's a distinction I made kind of haphazardly to be able to tag on more values to a node.

Note that this situation will only occur if left and right have differing primary and/or secondary values, and neither left nor right is an exact match to base.

Class comments are dropped by Spoon when reading from file system

I'm not sure if this is something I'm doing wrong, or if Spoon is doing it wrong (I think the latter), but adding an input resource from the file system results in the class comment being dropped. Not always, but often. For example, with this file:

/**
 * This is a comment!
 */
public class Test {}

Then this drops the class comment:

launcher.addInputResource(new FileSystemFile(pathToFile));

but this captures the class comment:

launcher.addInputResource(new VirtualFile(readFile(pathToFile)));

TODO: Work around for now, report on Spoon issue tracker if the latest build doesn't resolve it.

Actually utilize predecessor conflicts as end-of-conflict markers

Currently, Spork removes predecessor conflicts (i.e. triples on the form Pcs(a, b, c) and Pcs(a, b', c)) before processing the PCS structure. If I'm not entirely mistaken, leaving them in would greatly simplify extracting the conflicting root nodes, so I think they should be left in the PCS structure.