go-diff's Introduction

go-diff

go-diff offers algorithms to perform operations required for synchronizing plain text:

  • Compare two texts and return their differences.
  • Perform fuzzy matching of text.
  • Apply patches onto text.

Installation

go get -u github.com/sergi/go-diff/...

Usage

The following example compares two texts and writes out the differences to standard output.

package main

import (
	"fmt"

	"github.com/sergi/go-diff/diffmatchpatch"
)

const (
	text1 = "Lorem ipsum dolor."
	text2 = "Lorem dolor sit amet."
)

func main() {
	dmp := diffmatchpatch.New()

	diffs := dmp.DiffMain(text1, text2, false)

	fmt.Println(dmp.DiffPrettyText(diffs))
}

Found a bug or are you missing a feature in go-diff?

Please make sure you have the latest version of go-diff. If the problem persists, go through the open issues in the tracker first. If you cannot find your request, open a new issue.

How to contribute?

You want to contribute to go-diff? GREAT! If you are here because of a bug you want to fix or a feature you want to add, you can just read on. Otherwise we have a list of open issues in the tracker. Just choose something you think you can work on and discuss your plans in the issue by commenting on it.

Please make sure that every behavioral change is accompanied by test cases. Additionally, every contribution must pass the lint and test Makefile targets which can be run using the following commands in the repository root directory.

make lint
make test

After your contribution passes these commands, create a PR and we will review your contribution.

Origins

go-diff is a Go language port of Neil Fraser's google-diff-match-patch code. His original code is available at http://code.google.com/p/google-diff-match-patch/.

Copyright and License

The original Google Diff, Match and Patch Library is licensed under the Apache License 2.0. The full terms of that license are included here in the APACHE-LICENSE-2.0 file.

Diff, Match and Patch Library

Written by Neil Fraser Copyright (c) 2006 Google Inc. http://code.google.com/p/google-diff-match-patch/

This Go version of Diff, Match and Patch Library is licensed under the MIT License (a.k.a. the Expat License) which is included here in the LICENSE file.

Go version of Diff, Match and Patch Library

Copyright (c) 2012-2016 The go-diff authors. All rights reserved. https://github.com/sergi/go-diff

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

go-diff's People

Contributors

akovaski, ardagnir, creachadair, dyoo, eclipseo, edwardbetts, gino4, jba, josharian, kdarkhan, maksimov, nrnrk, osman-masood, r-pai, roryflynn, rwcarlsen, sergi, shatrugna, shawnps, sreekanth370, torarvid, vmarkovtsev, zimmski, zmb3

go-diff's Issues

Go through all forks for additional changes

There are a lot of forks https://github.com/sergi/go-diff/network which could have additional interesting changes.

Look through all known forks and all their branches (not just master) for changes and try to contact the authors to incorporate them back.

Known forks:

lint.sh is broken on macOS 10.8+ due to a different version of grep

macOS switched from GNU grep to BSD grep in version 10.8, so grep -P is no longer supported.

A workaround is to install GNU grep (with PCRE support) via Homebrew, after which -P works again:
brew install grep --with-default-names

Another solution is to change grep -P to a relevant perl command, details can be found here.

runtime error: index out of range while processing diff

Hi,
I am not using this library directly, but via another package (go-git). I have found that in certain cases there is a panic on this line:

text[i] = lineArray[r]

At the time of panic, the values are

1 iteration ago: i=179592 r=55295 len(chars)=190439 len(lineArray)=58902
next iteration: i=179595 r=65533 len(chars)=190439 len(lineArray)=58902

so lineArray[r] is clearly indexed past the end of the slice. I still have to study the code to understand why this happens, but any ideas would be appreciated. Interestingly, in both cases the diff being processed when the panic occurred contained Unicode text.
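One possible explanation, consistent with the values reported above but not confirmed in this issue: if line indices are stored as runes and round-tripped through a string, Go replaces any rune in the surrogate range (0xD800 to 0xDFFF) with U+FFFD (65533) when encoding. Index 55295 (0xD7FF) is the last value below that range, and 65533 is exactly the replacement character. The sketch below uses only the standard library; the helper name roundTrip is hypothetical and is not part of go-diff.

```go
package main

import "fmt"

// roundTrip encodes a single rune into a string and decodes it again,
// mimicking what happens when an index is stored as a rune inside a string.
func roundTrip(r rune) rune {
	return []rune(string([]rune{r}))[0]
}

func main() {
	// 0xD7FF and 0xE000 are valid code points; 0xD800 and 0xDFFF are
	// surrogates and cannot be encoded as UTF-8.
	for _, r := range []rune{0xD7FF, 0xD800, 0xDFFF, 0xE000} {
		fmt.Printf("%d -> %d\n", r, roundTrip(r))
	}
}
```

If this is indeed the cause, any input with more than 55295 distinct lines would produce indices that no longer survive the conversion.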

How to tell percent different

Thoughts on determining how different two files are? Say we compare file1.txt to file2.txt and I would like to see that they are 90% similar. Is that something I can determine with this current library or something that would need to be added? Thanks!
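One common way to express similarity as a percentage is 1 minus the edit distance divided by the length of the longer text. The library lists a DiffLevenshtein method that could presumably supply such a distance from a computed diff. The sketch below deliberately swaps in a plain dynamic-programming Levenshtein distance written from scratch, so that it runs with the standard library alone. The names levenshtein, min3 and similarity are hypothetical and are not part of go-diff.

```go
package main

import "fmt"

// levenshtein returns the number of single-rune insertions, deletions and
// substitutions needed to turn a into b.
func levenshtein(a, b string) int {
	ra, rb := []rune(a), []rune(b)
	prev := make([]int, len(rb)+1)
	curr := make([]int, len(rb)+1)
	for j := range prev {
		prev[j] = j // turning "" into rb[:j] takes j insertions
	}
	for i := 1; i <= len(ra); i++ {
		curr[0] = i // turning ra[:i] into "" takes i deletions
		for j := 1; j <= len(rb); j++ {
			cost := 1
			if ra[i-1] == rb[j-1] {
				cost = 0
			}
			curr[j] = min3(prev[j]+1, curr[j-1]+1, prev[j-1]+cost)
		}
		prev, curr = curr, prev
	}
	return prev[len(rb)]
}

// min3 returns the smallest of three ints.
func min3(a, b, c int) int {
	m := a
	if b < m {
		m = b
	}
	if c < m {
		m = c
	}
	return m
}

// similarity returns a value in [0, 1]; 1 means the texts are identical.
func similarity(a, b string) float64 {
	longest := len([]rune(a))
	if lb := len([]rune(b)); lb > longest {
		longest = lb
	}
	if longest == 0 {
		return 1
	}
	return 1 - float64(levenshtein(a, b))/float64(longest)
}

func main() {
	fmt.Printf("%.0f%% similar\n", 100*similarity("kitten", "sitting"))
}
```

This version needs time proportional to the product of the two lengths, so for whole files a diff-based distance is likely the more practical choice.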

The documentation is lacking examples

go-diff currently lacks examples for each exported function and method. Adding them would not only help other users but also make the whole project better and more complete.

A bug in DiffText1

With some texts, DiffText1 returns a wrong result. An example of such data is dmp.go of revision b94bf7 and b94bf7^. To see this, get text1.go (which just prints DiffText1) from https://gist.github.com/tkf/12bde871bf794e59bea88b659ed5b95b and run it as:

cd PATH/TO/go-diff/diffmatchpatch
diff <(go run PATH/TO/text1.go <(git show 'b94bf7:./dmp.go') <(git show 'b94bf7^:./dmp.go')) <(git show 'b94bf7:./dmp.go')

which prints

471c471
<

---
>                                       break
658c658
<

---
>

i.e., dmp.DiffText1(diffs) != text1.

The above gist also includes text1.bash which automates finding such examples. For example, I found many such examples by running:

git clone https://go.googlesource.com/go
cd go/src
PATH/TO/text1.bash **/*.go > examples

Missing documentation for DiffDelete/DiffInsert

From the documentation of DiffMatchPatch.DiffMain:

https://github.com/sergi/go-diff/blob/master/diffmatchpatch/diff.go#L49

// DiffMain finds the differences between two texts.
// If an invalid UTF-8 sequence is encountered, it will be replaced by the Unicode replacement character.
func (dmp *DiffMatchPatch) DiffMain(text1, text2 string, checklines bool) []Diff {

From the documentation of DiffDelete:

https://github.com/sergi/go-diff/blob/master/diffmatchpatch/diff.go#L30

	// DiffDelete item represents a delete diff.
	DiffDelete Operation = -1

So what does DiffDelete mean?

Does DiffDelete mean that text1 lacks this content and text2 has it, or that text2 lacks this content and text1 has it?

A user can find out by experiment that text1 is the old version and text2 is the newer version, so DiffDelete means that text2 lacks this content and text1 has it.

But that information is not in the documentation.

DMP now on GitHub

I'm the maintainer of DMP and just stumbled across this project. After hosting us for ten years, the original repo at Google Code has shut down and the project has moved to https://github.com/google/diff-match-patch. You probably want to update the corresponding links on your project.

However, a bigger question is whether go-diff should be a separate project, or whether it should be incorporated into the main DMP project. It would be good to keep all versions in lock-step so that when bugs are found in one they are fixed across the board. What are your feelings regarding this?

A DiffLinesToChars then DiffMain bug

Here is the code:

package main

import (
    "github.com/sergi/go-diff/diff"
    "log"
)

func main() {
    sOld := "1\n2\n3\n4\n5\n6\n7\n3\n8\n9\n3\n10\n3\n11\n3\n12\n13\n14\n15\n12\n13\n16\n13\n13\n17\n18\n19\n20\n21\n22\n23\n24\n25\n26\n27\n28\n29\n30\n31\n32\n33\n34\n35\n12\n36\n37\n38\n39\n40\n41\n42\n13\n43\n44\n13\n45\n46\n47\n13\n13\n48\n49\n50\n51\n52\n13\n53\n54\n55\n56\n57\n58\n59\n60\n61\n62\n63\n64\n65\n66\n67\n68\n69\n13\n70\n71\n72\n73\n74\n13\n75\n13\n76\n77\n78\n79\n80\n81\n82\n83\n84\n85\n86\n87\n88\n89\n90\n67\n91\n92\n93\n81\n68\n13\n94\n71\n95\n96\n97\n98\n99\n100\n101\n102\n63\n103\n67\n104\n105\n13\n106\n107\n108\n109\n110\n111\n112\n113\n114\n115\n90\n116\n67\n13\n117\n72\n73\n74\n13\n75\n13\n76\n118\n119\n120\n78\n68\n121\n13\n122\n123\n124\n125\n93\n126\n68\n127\n13\n128\n129\n130\n131\n132\n133\n134\n135\n13\n136\n137\n138\n13\n78\n68\n13\n139\n140\n141\n142\n68\n13\n143\n144\n145\n146\n13\n147\n148\n13\n149\n150\n151\n152\n153\n150\n154\n13\n155\n156\n"
    sNew := "1\n2\n3\n4\n5\n6\n7\n3\n157\n9\n3\n10\n3\n11\n3\n12\n13\n14\n15\n12\n13\n16\n13\n13\n17\n18\n19\n20\n21\n22\n23\n24\n25\n26\n27\n28\n29\n30\n31\n32\n33\n34\n35\n12\n36\n37\n38\n39\n40\n41\n42\n13\n158\n159\n13\n45\n46\n47\n13\n13\n48\n49\n50\n51\n13\n53\n54\n55\n56\n57\n160\n59\n60\n61\n62\n63\n64\n161\n66\n67\n68\n69\n13\n70\n71\n72\n73\n74\n13\n75\n13\n162\n77\n78\n79\n80\n81\n82\n83\n84\n85\n86\n88\n89\n90\n67\n91\n92\n93\n81\n68\n13\n94\n71\n95\n96\n97\n98\n99\n100\n101\n102\n63\n103\n67\n104\n105\n13\n106\n107\n108\n109\n110\n111\n112\n113\n114\n115\n90\n116\n67\n13\n117\n72\n73\n74\n13\n75\n13\n163\n119\n120\n78\n68\n121\n13\n122\n123\n124\n125\n93\n126\n68\n127\n13\n128\n164\n130\n131\n132\n133\n134\n135\n13\n136\n137\n138\n13\n78\n68\n13\n139\n140\n165\n68\n13\n143\n144\n145\n146\n13\n147\n148\n13\n149\n150\n151\n166\n153\n150\n154\n13\n155\n156\n"
    dmp := diffmatchpatch.New()
    t1, t2, t := dmp.DiffLinesToChars(sOld, sNew)
    diffs := dmp.DiffMain(t1, t2, false)
    diffs = dmp.DiffCharsToLines(diffs, t)
    for _, diff := range diffs {
        log.Println(diff.Type, diff.Text)
    }
}

DiffLinesToChars seems to work fine, but the program then panics with "index out of range" at runtime.

I want to diff two texts line by line, at the line level only, not by word or character.

Is this a bug, or am I using the library incorrectly?

Thanks

DiffMain does not show deleted space

package main

import (
	"fmt"

	"github.com/sergi/go-diff/diffmatchpatch"
)

const (
	text1 = "package casec"
	text2 = "PackageCasec"
)

func main() {
	dmp := diffmatchpatch.New()

	diffs := dmp.DiffMain(text1, text2, false)

	fmt.Println(dmp.DiffPrettyText(diffs))
}

I expected the diff output for the above code to be either pPackagecCasec or pPackage[x]cCasec, but it printed pPackage cCasec instead ([x] indicates a space character with a red background).

So it was a little hard for me to tell whether the space was deleted. I think DiffMain should also show the status of added or deleted space characters, which could be achieved by rendering the space with a red or green background.

Refactor the whole code

dbcb93d started to refactor the code even more. There is a lot to do, but I do not have the open OSS hours right now. This needs to be done in small iterations. This issue keeps track of what is already done.

Purpose:

  • Use the same code style everywhere, including naming things
  • The port still suffers from unidiomatic Go code
  • There are lots of cases that can be made easier, e.g., early-returns

Unfinished functions:

  • min
  • max
  • indexOf
  • lastIndexOf
  • runesIndexOf
  • runesEqual
  • runesIndex
  • DiffMatchPatch.PatchAddContext
  • DiffMatchPatch.PatchMake
  • DiffMatchPatch.patchMake2
  • DiffMatchPatch.PatchDeepCopy
  • DiffMatchPatch.PatchApply
  • DiffMatchPatch.PatchAddPadding
  • DiffMatchPatch.PatchSplitMax
  • DiffMatchPatch.PatchToText
  • DiffMatchPatch.PatchFromText
  • DiffMatchPatch.MatchMain
  • DiffMatchPatch.MatchBitap
  • DiffMatchPatch.matchBitapScore
  • DiffMatchPatch.MatchAlphabet
  • New
  • splice
  • DiffMatchPatch.DiffMain
  • DiffMatchPatch.DiffMainRunes
  • DiffMatchPatch.diffMainRunes
  • DiffMatchPatch.diffCompute
  • DiffMatchPatch.diffLineMode
  • DiffMatchPatch.DiffBisect
  • DiffMatchPatch.diffBisect
  • DiffMatchPatch.diffBisectSplit
  • DiffMatchPatch.DiffLinesToChars
  • DiffMatchPatch.DiffLinesToRunes
  • DiffMatchPatch.diffLinesToRunes
  • DiffMatchPatch.diffLinesToRunesMunge
  • DiffMatchPatch.DiffCharsToLines
  • DiffMatchPatch.DiffCommonPrefix
  • DiffMatchPatch.DiffCommonSuffix
  • commonPrefixLength
  • commonSuffixLength
  • DiffMatchPatch.DiffCommonOverlap
  • DiffMatchPatch.DiffHalfMatch
  • DiffMatchPatch.diffHalfMatch
  • DiffMatchPatch.diffHalfMatchI
  • DiffMatchPatch.DiffCleanupSemantic
  • diffCleanupSemanticScore
  • DiffMatchPatch.DiffCleanupSemanticLossless
  • DiffMatchPatch.DiffCleanupEfficiency
  • DiffMatchPatch.DiffCleanupMerge
  • DiffMatchPatch.DiffXIndex
  • DiffMatchPatch.DiffPrettyHtml
  • DiffMatchPatch.DiffPrettyText
  • DiffMatchPatch.DiffText1
  • DiffMatchPatch.DiffText2
  • DiffMatchPatch.DiffLevenshtein
  • DiffMatchPatch.DiffToDelta
  • DiffMatchPatch.DiffFromDelta

Type for Diff.Text

The current type for Text is string:

type Diff struct {
    Type Operation
    Text string
}

When I call DiffMainRunes(), though, it looks like the Text in each diff needs to be interpreted as a []rune. Is there a reason not to define it as []rune (or to use multiple types if it is used differently in other APIs)? It was hard to interpret since Text is not printable in this case, and the documentation doesn't mention this.

Here's my code:

    a := "foo\nbar\nbaz"
    b := "foo\nbaz\nfooz\nbarrington"
    dmp := diffmatchpatch.New()
    r1, r2, f := dmp.DiffLinesToRunes(a, b)
    fmt.Println(f)
    fmt.Println(r1)
    fmt.Println(r2)
    s := dmp.DiffMainRunes(r1, r2, false)
    for _, d := range s {
        fmt.Println("d.Type:", d.Type)
        // Printing d.Text here without converting it produces an empty string
        fmt.Println("d.Text:", []rune(d.Text))
    }

Panic in PatchMake

Hitting a slice bounds out of range panic

panic: runtime error: slice bounds out of range

goroutine 47882 [running]:
panic(0xea1500, 0xc820034020)
    /usr/local/go1.6.3.src/src/runtime/panic.go:481 +0x3e6
..gojsondiff/vendor/github.com/sergi/go-diff/diffmatchpatch.(*DiffMatchPatch).patchMake2(0xc8215e5c30, 0xc820f708c0, 0x1e, 0xc821594240, 0xa, 0x18, 0x0, 0x0, 0x0)
    ..gojsondiff/vendor/github.com/sergi/go-diff/diffmatchpatch/dmp.go:1810 +0x647
github.com/apcera/cntm-deps/gojsondiff/vendor/github.com/sergi/go-diff/diffmatchpatch.(*DiffMatchPatch).PatchMake(0xc8215e5c30, 0xc8215e5bb8, 0x2, 0x2, 0x0, 0x0, 0x0)
    ..gojsondiff/vendor/github.com/sergi/go-diff/diffmatchpatch/dmp.go:1768 +0x486
.
.

Able to reproduce this by feeding (left=2016-09-01T03:07:14.807830741Z, right=2016-09-01T03:07:15.154800781Z) to PatchMake.

The following diff is being applied to the left string: [{0 2016-09-01T03:07:1} {1 5.15} *{0 .154} {-1 .} {0 80} {1 0} {0 78} {-1 3074} {0 1Z}]

whereas the diff being applied should have been [{0 2016-09-01T03:07:1} {1 5.15} *{0 4} {-1 .} {0 80} {1 0} {0 78} {-1 3074} {0 1Z}]

Thank you for your time.

Go 1.15: conversion from int to string yields a string of one rune, not a string of digits

Go 1.15 rc 1 on Fedora Rawhide

Testing    in: /builddir/build/BUILD/go-diff-1.1.0/_build/src
         PATH: /builddir/build/BUILD/go-diff-1.1.0/_build/bin:/usr/bin:/bin:/usr/sbin:/sbin:/usr/local/sbin
       GOPATH: /builddir/build/BUILD/go-diff-1.1.0/_build:/usr/share/gocode
  GO111MODULE: off
      command: go test -buildmode pie -compiler gc -ldflags "-X github.com/sergi/go-diff/version=1.1.0 -extldflags '-Wl,-z,relro -Wl,--as-needed  -Wl,-z,now -specs=/usr/lib/rpm/redhat/redhat-hardened-ld '"
      testing: github.com/sergi/go-diff
github.com/sergi/go-diff/diffmatchpatch
# github.com/sergi/go-diff/diffmatchpatch
./patch.go:327:18: conversion from int to string yields a string of one rune, not a string of digits (did you mean fmt.Sprint(x)?)
FAIL	github.com/sergi/go-diff/diffmatchpatch [build failed]

See golang/go#32479
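The vet check behind this failure flags string(x) for an integer x because the result is the UTF-8 encoding of the code point x, not its decimal representation. Which of the two the code at patch.go:327 intended is not stated in this report, so the sketch below simply shows both spellings that the check accepts. The helper names asRune and asDigits are hypothetical.

```go
package main

import (
	"fmt"
	"strconv"
)

// asRune makes the "one rune" intent explicit, which satisfies the vet check.
func asRune(x int) string {
	return string(rune(x))
}

// asDigits returns the decimal representation of x.
func asDigits(x int) string {
	return strconv.Itoa(x)
}

func main() {
	x := 65
	fmt.Printf("%q %q\n", asRune(x), asDigits(x))
}
```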

Check for the exact error class in all tests

Checks like the following must be refactored

_, err = dmp.DiffFromDelta("", "+%c3%xy")
if err == nil {
	assert.Fail(t, "expected Invalid URL escape.")
}

The problem is that we only check that the error is not nil, but we do not check for the error class. This means that any error lets the test pass even though it is possible that it was the wrong error.

One way to test for the error class is to use assert.Contains to check whether part of the error string can be found. Another is to add special error types, and yet another is to use one of the many error-wrapping packages.

New line characters removed after applying patch

Code used:

package main

import (
	"fmt"

	"github.com/sergi/go-diff/diffmatchpatch"
)

func main() {

	var oldtext string = `foo
bar
`

	var patchtxt string = `@@ -1,2 +1,2 @@
-foo
+foobaz
 bar
`
	dmp := diffmatchpatch.New()
	patch, _ := dmp.PatchFromText(patchtxt)

	newtext, _ := dmp.PatchApply(patch, oldtext)
	fmt.Println("new text:", newtext)
}

Output:

new text: foobazbar

Expected Output:

new text: foobaz
bar

Is there a way to apply the patch so that the newline (\n) characters are preserved?

Change package name to "diff"

It is confusing and non-idiomatic to have the package name different from the last part of the import path. People will import "github.com/[whoever]/go-diff/diff" and will then try to use the package as "diff", but the name is secretly "diffmatchpatch". And Go package names are usually shorter.

lint.sh is broken on macOS due to cat -n behavior

I've been following the contributors' guide and couldn't get past the make lint command, as it gave an error on unmodified source code. One of the issues is with testing the output of tools such as golint, where the output is expected to be empty. However, echo -n "$OUT" printed -n instead of nothing, which was then treated as an error condition.

I've tested a substitute command, echo "$OUT\c", which does the same thing and works on both macOS and Linux. I will send a PR.

Refactor go-diff

Rework the repository code without breaking the API

  • Rename and move files around
  • Convert the existing tests to solely table-driven tests
  • Split the code to be more maintainable
  • Update the README to be more user and contributor friendly
  • Try to make the repository more contributor attractive
  • When done, ask for a review and ping sergi that I am done here

Non-deterministic Behavior when Run in Multiple Goroutines

I'm seeing non-deterministic behavior when I run dmp in multiple goroutines. Basically, the "diffs" generated by DiffMain() should be identical no matter how many goroutines are run, but they differ. I'm going to try my best to find the cause, but you might have a deeper understanding of what's going on. :-)

Here is the code (also attached as a file):

package main

import (
        "fmt"
        "os"
        "sync"
        "sync/atomic"

        "github.com/sergi/go-diff/diffmatchpatch"
)

const (
        expect = "[{1 licensed } {0 under the apache license, version 2.0 (the} {-1  #} {0 'license'); you may not use this file except in compliance } {-1 # } {0 with the license. you may obtain a copy of the license at } {-1 # # } {0 http://www.apache.org/licenses/license-2.0 } {-1 # # } {0 unless required by applicable law or agreed to in writing, } {-1 # } {0 software distributed under the license is distributed on an} {-1  #} {0 'as is'basis, without warranties or conditions of any} {-1  #} {0  kind, either express or implied. see the license for the } {-1 # } {0 specific language governing permissions and limitations} {-1  #} {0  under the license.}]"
        
        unknown = "under the apache license, version 2.0 (the #'license'); you may not use this file except in compliance # with the license. you may obtain a copy of the license at # # http://www.apache.org/licenses/license-2.0 # # unless required by applicable law or agreed to in writing, # software distributed under the license is distributed on an #'as is'basis, without warranties or conditions of any # kind, either express or implied. see the license for the # specific language governing permissions and limitations # under the license."
        
        known = "licensed under the apache license, version 2.0 (the'license'); you may not use this file except in compliance with the license. you may obtain a copy of the license at http://www.apache.org/licenses/license-2.0 unless required by applicable law or agreed to in writing, software distributed under the license is distributed on an'as is'basis, without warranties or conditions of any kind, either express or implied. see the license for the specific language governing permissions and limitations under the license."
)

var dmp = diffmatchpatch.New()

const num = 50

func main() {
        var matched, missed int32
        var wg sync.WaitGroup
        wg.Add(num)
        for i := 0; i < num; i++ {
                go func(i int) {
                        defer wg.Done()
                        diffs := dmp.DiffMain(unknown, known, false)
                        s := fmt.Sprintf("%v", diffs)
                        if s != expect {
                                fmt.Fprintf(os.Stderr, "MISMATCH(%d):\n%s\n", i, s)
                                atomic.AddInt32(&missed, 1)
                        } else {
                                atomic.AddInt32(&matched, 1)
                        }
                }(i)
        }
        wg.Wait()
        fmt.Fprintf(os.Stderr, "NUMBER MATCHING: %d\n", matched)
        fmt.Fprintf(os.Stderr, "NUMBER MISMATCHING: %d\n", missed)
}

d.go.txt

Add a default DiffMatchPatch object

Most people do not need their own instance of DiffMatchPatch since they do not change the default values. So let's make all the exported methods available by adding a default DiffMatchPatch instance and exporting functions using that instance.

@sergi What do you think?

What is the checklines parameter?

The docs for DiffMain and DiffMainRunes don't explain what the checklines parameter is used for.

Digging through the source, it looks like this affects how the diff is calculated if the input is large enough. But I'm unclear on the pros and cons of choosing it.

Happy to make a PR to update the documentation if someone can advise on its use.

Is it possible to prefer single line changes instead of multi-line?

When diffing JSON, we have some json like:

{
    "del": "^2.2.0",
    "es6-symbol": "^3.1.1",
    "eslint": "^4.11.0",
    "eslint-config-enough": "0.2.5",
}

If we compare to

{
    "del": "^2.2.0",
    "eslint": "^4.11.0",
    "eslint-config-enough": "0.2.5",
}

We end up with:

[screenshot of the resulting diff]

It would be nice to be able to prefer full-line or full word changes instead of single-character changes.

If you change the word "hello" to "goodbye", you see the diff as:

[{Delete hell} {Insert g} {Equal o} {Insert odbye}]

Which is much harder for a human to read than if the lib had the ability to prefer word/line boundaries and showed:
[{Delete hello} {Insert goodbye}]

DiffPrettyText in windows cmd prints garbage

I have tested your library and it works very well. However, DiffPrettyText works well in a color console but produces garbage in the Windows command line.

See screenshots.
[two screenshots]

diffutils replacement

Hi there,
thank you for an awesome project.
Is there a recommended set of functions to replace the command-line tools diff and diffstat?
Something that gives output like this:

diff -u test1 test2 | diffstat -s
1 file changed, 3 insertions(+), 3 deletions(-)

When I play with dmp.DiffMain(test1, test2, true) and pass entire files as strings, I get output very different from what I expect:

[{-1 very} {1 magic,} {0  } {-1 awesome} {1 in} {0  } {-1 tes} {1 fac} {0 t
with} {1  I} {0
} {-1 o} {1 be} {0 l} {-1 d and n} {1 iev} {0 e} {-1 w data} {0
}]

The test files look like this:
test1:

very awesome test
with
old and new data

test2:

magic, in fact
with I
believe

Thanks in advance!

Why is the body of the patch escaped with %xx notation?

Is this part of the GNU diff specification?
I want to get simple diffs like the one below.

@@ -24,21 +24,22 @@
 ve.

-[Not] eat to eat.
+[Not] live to eat.

But in fact, I get the following.

@@ -24,21 +24,22 @@
 ve.%0A
-%5BNot%5D eat to eat.
+%5BNot%5D live to eat.

Wrong patch is generated with DiffTimeout = 0

I think the patch generated by the following code is wrong:

package main

import (
    "io"
    "os"

    "github.com/sergi/go-diff/diffmatchpatch"
)

const (
    text1 = "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Vivamus ut risus et enim consectetur convallis a non ipsum. Sed nec nibh cursus, interdum libero vel."
    text2 = "Lorem a ipsum dolor sit amet, consectetur adipiscing elit. Vivamus ut risus et enim consectetur convallis a non ipsum. Sed nec nibh cursus, interdum liberovel."
)

func main() {
    dmp := diffmatchpatch.New()
    dmp.DiffTimeout = 0
    // dmp.DiffTimeout = time.Hour  // it works
    diffs := dmp.DiffMain(text1, text2, true)
    diffs = dmp.DiffCleanupSemantic(diffs)
    patches := dmp.PatchMake(text1, diffs)
    io.WriteString(os.Stdout, dmp.PatchToText(patches))
}

It generates:

@@ -1,17 +1,18 @@
 Lorem
+a
 ro
-
 um dolor

However, the correct patch would be:

@@ -1,14 +1,16 @@
 Lorem
+a
 ipsum do
@@ -148,13 +148,12 @@
 m libero
-
 vel.

which is generated by setting dmp.DiffTimeout = time.Hour. Note that the C++ implementation (with timeout=0) also generates the latter patch.

Rethink the repository title

The current repository title just says "Port of Google's diff-match-patch library to Go" which does not say anything about the functionality of the repository, if you do not know the original. I therefore propose changing the text to something like "Diff, match and patch text in Go".

Exporting Patch fields

Is it possible to expose the Patch struct fields? It would be useful to have the start1 and start2 fields to feed to Match functions.

What is expected behavior when input is not valid utf8?

I tried looking at the docs for this package, but the DiffMain method simply says:

DiffMain finds the differences between two texts.

So I'm not sure how it's supposed to handle input that contains invalid utf8 sequences.

Here's how it handles it right now:

package main

import (
    "fmt"
    "unicode/utf8"

    "github.com/sergi/go-diff/diffmatchpatch"
)

func main() {
    var inputs = []string{
        "a1234567890z",
        "Hello 世界",
        "a\xe0\xe5\xf0\xe9\xe1\xf8\xf1\xe9\xe8\xe4Z",
    }

    for _, in := range inputs {
        fmt.Printf("input: %q\n(length %v bytes)\nutf8.Valid: %v\n", in, len(in), utf8.ValidString(in))

        dmp := diffmatchpatch.New()
        diffs := dmp.DiffMain(in, "", true)

        fmt.Printf("diff text: %q\n(length %v bytes)\n\n", diffs[0].Text, len(diffs[0].Text))
    }
}

Output:

input: "a1234567890z"
(length 12 bytes)
utf8.Valid: true
diff text: "a1234567890z"
(length 12 bytes)

input: "Hello 世界"
(length 12 bytes)
utf8.Valid: true
diff text: "Hello 世界"
(length 12 bytes)

input: "a\xe0\xe5\xf0\xe9\xe1\xf8\xf1\xe9\xe8\xe4Z"
(length 12 bytes)
utf8.Valid: false
diff text: "a����������Z"
(length 32 bytes)

In the case where input is not valid utf8, the length of output, in bytes, is not the same as input (12 bytes vs. 32 bytes).

Is that expected behavior?

If so, is there a way I can use diffmatchpatch in such a way that it gives me a diff on a byte-level, meaning the length of output, in bytes, should match that of input (aside from pre-processing the input to not contain invalid utf8 sequences)?
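The size difference is consistent with each invalid byte being replaced by the Unicode replacement character U+FFFD, which occupies three bytes in UTF-8: 1 + 10×3 + 1 = 32. The sketch below reproduces the same byte counts with the standard library alone, assuming (not confirmed here) that the library converts its input through []rune. The helper name sanitize is hypothetical.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// sanitize round-trips s through []rune. Each byte that does not start a
// valid UTF-8 sequence decodes to U+FFFD, which re-encodes as three bytes.
func sanitize(s string) string {
	return string([]rune(s))
}

func main() {
	in := "a\xe0\xe5\xf0\xe9\xe1\xf8\xf1\xe9\xe8\xe4Z"
	out := sanitize(in)
	fmt.Println(len(in), utf8.ValidString(in))
	fmt.Println(len(out), utf8.ValidString(out))
}
```

Valid UTF-8 input passes through this conversion unchanged, so only inputs with invalid sequences change length.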

Index out of range panic in DiffCharsToLines on large JSON diff

I've encountered this issue while using https://github.com/src-d/go-git, but the bug is easily reproducible with the code snippet below and the JSON file in attachment.

package main

import (
	"fmt"
	"io/ioutil"
	"os"

	"github.com/sergi/go-diff/diffmatchpatch"
)

func main() {
	f, err := os.Open("data.txt")
	checkErr(err) // check the error before deferring Close
	defer f.Close()
	data, err := ioutil.ReadAll(f)
	checkErr(err)

	// from https://github.com/src-d/go-git/blob/v4.0.0/utils/diff/diff.go#L17
	dmp := diffmatchpatch.New()
	wSrc, wDst, warray := dmp.DiffLinesToChars(string(data), "")
	diffs := dmp.DiffMain(wSrc, wDst, false)
	diffs = dmp.DiffCharsToLines(diffs, warray)
	fmt.Println(diffs)
}

func checkErr(err error) {
	if err != nil {
		panic(err)
	}
}

Output:

$ go run main.go
panic: runtime error: index out of range

goroutine 1 [running]:
github.com/sergi/go-diff/diffmatchpatch.(*DiffMatchPatch).DiffCharsToLines(0xc420044ee8, 0xc420078390, 0x1, 0x2, 0xc4202ce000, 0xd802, 0xec00, 0x1, 0x2, 0xc4202ce000)
	/Users/krylovsk/src/github.com/sergi/go-diff/diffmatchpatch/diff.go:414 +0x394
main.main()
	/tmp/go-diff-debug/main.go:24 +0x29e
exit status 2

data.txt

Converting between string and []rune takes a long time

Consider the program below. The program runs slowly. Part of it is due to encoding and decoding between strings and runes. Roughly 1860ms out of 4050ms is spent doing this:

(pprof) top
4050ms of 9270ms total (43.69%)
Dropped 87 nodes (cum <= 46.35ms)
Showing top 10 nodes out of 138 (cum >= 580ms)
      flat  flat%   sum%        cum   cum%
     630ms  6.80%  6.80%      630ms  6.80%  runtime.encoderune
     600ms  6.47% 13.27%     2250ms 24.27%  github.com/sergi/go-diff/diffmatchpatch.(*DiffMatchPatch).diffBisect
     560ms  6.04% 19.31%     1230ms 13.27%  runtime.slicerunetostring
     400ms  4.31% 23.62%      500ms  5.39%  runtime.semrelease
     360ms  3.88% 27.51%     1120ms 12.08%  runtime.pcvalue
     310ms  3.34% 30.85%      310ms  3.34%  runtime.readvarint
     310ms  3.34% 34.20%      620ms  6.69%  runtime.step
     300ms  3.24% 37.43%      300ms  3.24%  github.com/sergi/go-diff/diffmatchpatch.runesEqual
     300ms  3.24% 40.67%      410ms  4.42%  runtime.lock
     280ms  3.02% 43.69%      580ms  6.26%  github.com/sergi/go-diff/diffmatchpatch.runesIndex

It should be possible to work on one representation during the algorithm to avoid this overhead.

(Note: This has non-deterministic behavior that was reported in #75.)

package main

import (
        "flag"
        "fmt"
        "log"
        "os"
        "runtime/pprof"
        "sync"
        "sync/atomic"

        "github.com/sergi/go-diff/diffmatchpatch"
)

var (
        cpuprofile = flag.String("cpuprofile", "", "write cpu profile to file")
        dmp        = diffmatchpatch.New()
)

const (
        num    = 50000
        expect = "[{1 licensed } {0 under the apache license, version 2.0 (the} {-1  #} {0 'license'); you may not use this file except in compliance } {-1 # } {0 with the license. you may obtain a copy of the license at } {-1 # # } {0 http://www.apache.org/licenses/license-2.0 } {-1 # # } {0 unless required by applicable law or agreed to in writing, } {-1 # } {0 software distributed under the license is distributed on an} {-1  #} {0 'as is'basis, without warranties or conditions of any} {-1  #} {0  kind, either express or implied. see the license for the } {-1 # } {0 specific language governing permissions and limitations} {-1  #} {0  under the license.}]"

        unknown = "under the apache license, version 2.0 (the #'license'); you may not use this file except in compliance # with the license. you may obtain a copy of the license at # # http://www.apache.org/licenses/license-2.0 # # unless required by applicable law or agreed to in writing, # software distributed under the license is distributed on an #'as is'basis, without warranties or conditions of any # kind, either express or implied. see the license for the # specific language governing permissions and limitations # under the license."

        known = "licensed under the apache license, version 2.0 (the'license'); you may not use this file except in compliance with the license. you may obtain a copy of the license at http://www.apache.org/licenses/license-2.0 unless required by applicable law or agreed to in writing, software distributed under the license is distributed on an'as is'basis, without warranties or conditions of any kind, either express or implied. see the license for the specific language governing permissions and limitations under the license."
)

func main() {
        flag.Parse()
        if *cpuprofile != "" {
                f, err := os.Create(*cpuprofile)
                if err != nil {
                        log.Fatal(err)
                }
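                // StartCPUProfile returns an error (for example if profiling is already enabled).
                if err := pprof.StartCPUProfile(f); err != nil {
                        log.Fatal(err)
                }
                defer pprof.StopCPUProfile()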
        }

        var matched, missed int32
        var wg sync.WaitGroup
        wg.Add(num)
        for i := 0; i < num; i++ {
                go func(i int) {
                        defer wg.Done()
                        diffs := dmp.DiffMain(unknown, known, false)
                        s := fmt.Sprintf("%v", diffs)
                        if s != expect {
                                //fmt.Fprintf(os.Stderr, "MISMATCH(%d):\n%s\n", i, s)
                                atomic.AddInt32(&missed, 1)
                        } else {
                                atomic.AddInt32(&matched, 1)
                        }
                }(i)
        }
        wg.Wait()
        fmt.Fprintf(os.Stderr, "NUMBER MATCHING: %d\n", matched)
        fmt.Fprintf(os.Stderr, "NUMBER MISMATCHING: %d\n", missed)
}
