pkg / diff Goto Github PK
View Code? Open in Web Editor NEWLicense: BSD 3-Clause "New" or "Revised" License
License: BSD 3-Clause "New" or "Revised" License
I used them for a project long ago, as a way to easily have portable and readable "edit scripts" to go from plaintext file A to B. You can see it in action via diff --ed
.
This is a kind of diff that's unlike the others, since it's not really meant for humans. I still think it belongs here, though.
I initially read WriteToA
as "write to A", not as "write A to ...".
Assuming #1 is done, we could rewrite this to:
type Writeable interface {
LeftWriteTo(w io.Writer, i int) (int, error)
RightWriteTo(w io.Writer, i int) (int, error)
LeftFilename() string
RightFilename() string
}
I think it's unnecessary to explicitly say filenames, even if 99% of the use cases will involve files on disk.
For example, one might want to diff streams, or objects in memory, or whatever else. These might still have very valid names, such as "before-optimisation" and "after-optimisation".
Consider this diff:
@@ -441,8 +441,26 @@ func (v *ReferenceNameIterator) Next() (string, error) {
return C.GoString(ptr), nil
}
-// Next retrieves the next reference. If the iterationis over, the
-// returned error is git.ErrIterOver
+// Slice returns a slice of all reference names.
+// It is a convenience function similar to calling Next repeatedly.
+// If Slice encounters an error, it returns all reference names up to that error.
+func (v *ReferenceNameIterator) Slice() ([]string, error) {
+ // TODO: implement directly in terms of git_reference_next_name?
+ var all []string
+ for {
+ s, err := v.Next()
+ if IsErrorCode(err, ErrIterOver) {
+ return all, nil
+ }
+ if err != nil {
+ return all, err
+ }
+ all = append(all, s)
+ }
+}
+
+// Next retrieves the next reference. If the iteration is over, the
+// returned error is git.ErrIterOver.
func (v *ReferenceIterator) Next() (*Reference, error) {
var ptr *C.git_reference
This diff would be greatly improved if it looked like this:
@@ -441,8 +441,26 @@ func (v *ReferenceNameIterator) Next() (string, error) {
return C.GoString(ptr), nil
}
+// Slice returns a slice of all reference names.
+// It is a convenience function similar to calling Next repeatedly.
+// If Slice encounters an error, it returns all reference names up to that error.
+func (v *ReferenceNameIterator) Slice() ([]string, error) {
+ // TODO: implement directly in terms of git_reference_next_name?
+ var all []string
+ for {
+ s, err := v.Next()
+ if IsErrorCode(err, ErrIterOver) {
+ return all, nil
+ }
+ if err != nil {
+ return all, err
+ }
+ all = append(all, s)
+ }
+}
+
-// Next retrieves the next reference. If the iterationis over, the
-// returned error is git.ErrIterOver
+// Next retrieves the next reference. If the iteration is over, the
+// returned error is git.ErrIterOver.
func (v *ReferenceIterator) Next() (*Reference, error) {
var ptr *C.git_reference
Not only it is clearer, it is likely to be far more helpful if you want to stage/patch part but not all of such a change.
Every diff algorithm generates the original diff (at least among diff algorithms implemented in git).
We could add an optional post-processing step that attempts to interleave additions and deletions based on the similarity of the content.
The README says that we're waiting for generics. This issue can serve as a place to discuss that, as the generic proposals roll in.
Unfortunately, the latest generic proposal (summer 2019) doesn't play nicely with the diff package. Quoting myself from some private correspondence with Ian:
The Myer's diff implementation requires input of the form
(a, b []T)
, where some elements of a will be compared for equality with the elements of b. The most common T are string, []byte, and []rune, but I do also want it to be more fully generic. What should the contract for T be?
It can't just be contracts.comparable, because, among other reasons, []byte doesn't support ==.
It's a bit awkward for it to be
T Equal(T) bool
, because then you need to use a named type for strings just to provide an implementation of Equals, which is the sort of "you've just moved around the boilerplate" problem you mention with interfaces. (This is also reminiscent of the comment on the main thread about intermingling methods and basic types, and the consequent need for type switches that work on contracts.) And you also have to convert from[]string
to[]namedString
(or[][]byte
to[]namedByteSlice
) just to do the diff, which is annoying and O(n) instead of O(1).
...
After more pondering, the best solution I've come up with using the current proposal is to accept an equality function and use generic adapters for those. Here's an example, typed straight into email (so use a lenient parser). It uses SameSlice rather than Diff, because it exhibits the same challenges and is much simpler to implement and understand.
func (type T) SameSlice(a, b []T, eq func(T, T) bool) bool {
if len(a) != len(b) {
return false
}
for i := range a {
if !eq(a[i], b[i]) {
return false
}
}
return true
}
func (type T comparable) builtinEqual(x, y T) bool {
return x == y
}
Call sites look like:
a := []string{ ... some strings ... }
b := []string{ ... more strings ... }
same := SameSlice(a, b, builtinEqual(string))
Or for
[][]byte
arguments:
same := SameSlice(a, b, bytes.Equal)
And you could write a generic adapter to go from types with an Equal method to an eq function.
So at least for the moment, I'm going to press forward with a non-generic version. We can re-evaluate if/when the next generics proposal draft arrives.
$ go version
go version devel +dec3d00b28 Thu Mar 25 04:21:29 2021 +0000 linux/amd64
$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/home/myitcv/.cache/go-build"
GOENV="/home/myitcv/.config/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GOMODCACHE="/home/myitcv/gostuff/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/home/myitcv/gostuff"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/home/myitcv/gos"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/home/myitcv/gos/pkg/tool/linux_amd64"
GOVCS=""
GOVERSION="devel +dec3d00b28 Thu Mar 25 04:21:29 2021 +0000"
GCCGO="gccgo"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD="/tmp/tmp.uHtLe5z1Zd/go.mod"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build267893791=/tmp/go-build -gno-record-gcc-switches"
To reproduce:
cd $(mktemp -d)
curl -sL https://github.com/pkg/diff/files/6221280/repro.zip -o repro.zip
unzip repro.zip
go run main.go
Above, repro.zip
is a reference to: repro.zip
I understand that it's just used in a test, and that if a main package imports pkg/diff, it won't end up depending on sergi/go-diff at a package level.
However, it's still undesirable for pkg/diff to pull in such relatively heavy dependencies. Beyond go-diff itself, we also end up pulling testify and even a yaml library (!!!).
It's true that this will become less of a concern with lazy module loading. I don't think that's a satisfactory answer though, because it won't ship until at least 1.17 in six months, and people will keep using go 1.16
or earlier in their go.mod
files for at least a year or two.
So, I think we should remove it. It seems to only be used in a single test, to show a diff when the unified diff tests fail. I get why we don't want to use our own code; if the diffs are broken, showing a broken diff in the output isn't good.
I see two reasonable options:
Use the host's diff
tool in that case, like what gofmt -d
does.
Just show the entire output as-is. The tests aren't large, and failures should be rare.
All of its other methods don't use pointer receivers, and this particular one never writes to any field, so it seems unnecessary to make this inconsistent.
I think this will be a common need for user-facing tools; people generally expect diffs to be readable without having to look at the +
or -
prefixes of the unified diff.
I think we can begin with a TerminalColor
that always uses the red/green colors. If anyone wants to control when this is used, e.g. via isatty
, they can do that in the calling code. This way, if anyone requests other color formats such as HTML, we aren't using the confusing Color
option name.
I've written this on my own code that uses pkg/diff
, so I'm happy to send a PR if this seems like a good idea.
As I was reading the TODO: better name
on DiffWriteable
, I thought to myself - is there a good reason to separate the interfaces?
Unifying them would fix the naming issue, and make the simple use of the API simpler.
On the other hand, there's something I don't like about a "diffable" interface concerning io.Writer
.
I was just thinking that most of the types one might want to diff can already be compared for equality, as per the language spec.
So, we could add func Comparable(a, b interface{}) DiffWriteable
. It would ensure that both params are of the same type, that both are slice types, and that both element types are comparable.
For example, this would be useful with integers, arrays, custom struct types, et cetera.
I'd still keep Strings
, because for the foreseeable future, avoiding interface parameters will make the code faster and easier.
Potentially, we could also use the Equal
method as defined in https://github.com/google/go-cmp.
The secondary consideration for cutting a v1.0 of this package is deciding about output compatibility.
Given exactly the same inputs, do we guarantee the same outputs over time?
Upsides to output compatibility:
Downsides to output compatibility:
I'm very torn. Opinions, @mvdan? (Or anyone else?)
Should we do terminal detection automatically? See e.g. tailscale/depaware#11.
It would add significant dependencies, which is unfortunate. Perhaps we should instead provide easily copy/paste-able code that does it? Or maybe with lazy module loading in 1.17 it is enough to put terminal detection in its own package?
Are the names A
and B
common in diff algorithms or APIs? I would personally find Left
and Right
more intuitive. Diffs are very commonly displayed like this - some use <
and >
to display "removed" and "added", lots of code review UIs do side-by-side reviews, et cetera.
The names are also short enough to be spelled out in parameter names, avoiding the confusion of r
with io.Reader
.
need to check what impact this has on go modules
We have sort.Interface
in the standard library. Why go for diff.Diffable
instead of diff.Interface
? The former seems to stutter a bit.
On the other hand, we already have two interfaces here, while package sort
only has the one.
I originally wrote this package because I wanted access to the guts of a diff, not just a printed representation of it. I hid the API for it for initial release.
I now find myself wanting access again. For the moment, I've locally exported some stuff as-is. See fae18f5 for details.
Question: Should we expose these nuts and bolts as-is? Should we provide accessors to (say) walk an edit script and emit a series of insertions, deletions, and equalities? (WriteUnified could be implemented using such a walk function.)
Opinions, @mvdan ?
That is, func Files(left, right string) DiffWriteable
. It would read the contents of the files up-front and diff the lines as []string
or [][]byte
, while including the original filenames. It would also do things that are common when one diffs plaintext files, such as treating windows line endings like unix line endings, or ignoring a trailing newline unless it appears/disappears.
In the rare case that someone needs this to support streaming, they can just easily implement the interface themselves.
I have been thinking about making a push to get this to v1.0. The main problem is the API. The current API is unwieldy: it is a blend of high level helpers and low level data structures, and doing anything useful with it requires a half dozen lines.
I propose to break this up into multiple packages.
The top level package (github.com/pkg/diff) will contain at most a few functions that do the most commonly requested operations, such as read two files and emit a unified diff. It may also contain a bunch of options that can be passed to those functions. The implementation will be mostly glue.
The subpackages will contain lower level data structures, algorithms, and helpers. These can be used by people who want fine-grained control over everything or who want to do something unusual (like me).
Here's a very rough, incomplete sketch of packages and exported funcs/types/vars, to give a flavor:
Files
Readers
Script
Range
Op
Diff
Pair
Options
Color
Names
WriterToA etc
Write
Write
Regexp
LSP
GNU
Opinions on this general direction, @mvdan?
A declarative, efficient, and flexible JavaScript library for building user interfaces.
๐ Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. ๐๐๐
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google โค๏ธ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.