kpdecker / jsdiff

A javascript text differencing implementation.

License: BSD 3-Clause "New" or "Revised" License

jsdiff's Introduction

jsdiff

A JavaScript text differencing implementation. Try it out in the online demo.

Based on the algorithm proposed in "An O(ND) Difference Algorithm and its Variations" (Myers, 1986).

Installation

npm install diff --save

Usage

Broadly, jsdiff's diff functions all take an old text and a new text and perform three steps:

1. Split both texts into arrays of "tokens". What constitutes a token varies; in diffChars, each character is a token, while in diffLines, each line is a token.
2. Find the smallest set of single-token insertions and deletions needed to transform the first array of tokens into the second.

   This step depends upon having some notion of a token from the old array being "equal" to one from the new array, and this notion of equality affects the results. Usually two tokens are equal if === considers them equal, but some of the diff functions use an alternative notion of equality or have options to configure it. For instance, by default diffChars("Foo", "FOOD") will require two deletions (o, o) and three insertions (O, O, D), but diffChars("Foo", "FOOD", {ignoreCase: true}) will require just one insertion (of a D), since ignoreCase causes o and O to be considered equal.
3. Return an array representing the transformation computed in the previous step as a series of change objects. The array is ordered from the start of the input to the end, and each change object represents inserting one or more tokens, deleting one or more tokens, or keeping one or more tokens.
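The three steps can be sketched in a few dozen lines of standalone JavaScript. This is only an illustration under stated assumptions: it is not jsdiff's implementation, it swaps the Myers algorithm for a simple longest-common-subsequence table, and the names sketchDiffChars, editScript and toChangeObjects are hypothetical.

```javascript
// Illustrative sketch only, NOT jsdiff's code. A plain LCS dynamic program
// stands in for the Myers O(ND) algorithm because it is shorter to read.

// Step 1: tokenize. Each Unicode code point becomes one token.
function tokenizeChars(text) {
  return Array.from(text);
}

// Step 2: find a minimal list of single-token keeps, removals and additions.
function editScript(oldTokens, newTokens, equals) {
  const n = oldTokens.length;
  const m = newTokens.length;
  // lcs[i][j] = length of the LCS of oldTokens[i..] and newTokens[j..]
  const lcs = Array.from({ length: n + 1 }, () => new Array(m + 1).fill(0));
  for (let i = n - 1; i >= 0; i--) {
    for (let j = m - 1; j >= 0; j--) {
      lcs[i][j] = equals(oldTokens[i], newTokens[j])
        ? lcs[i + 1][j + 1] + 1
        : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
    }
  }
  const ops = [];
  let i = 0;
  let j = 0;
  while (i < n || j < m) {
    if (i < n && j < m && equals(oldTokens[i], newTokens[j])) {
      // Common tokens carry the value from the new text.
      ops.push({ type: 'keep', token: newTokens[j] });
      i++;
      j++;
    } else if (i < n && (j === m || lcs[i + 1][j] >= lcs[i][j + 1])) {
      ops.push({ type: 'remove', token: oldTokens[i] });
      i++;
    } else {
      ops.push({ type: 'add', token: newTokens[j] });
      j++;
    }
  }
  return ops;
}

// Step 3: merge runs of the same kind into change objects.
function toChangeObjects(ops) {
  const changes = [];
  for (const op of ops) {
    const added = op.type === 'add';
    const removed = op.type === 'remove';
    const last = changes[changes.length - 1];
    if (last && last.added === added && last.removed === removed) {
      last.value += op.token;
      last.count++;
    } else {
      changes.push({ value: op.token, added, removed, count: 1 });
    }
  }
  return changes;
}

function sketchDiffChars(oldStr, newStr, { ignoreCase = false } = {}) {
  const equals = ignoreCase
    ? (a, b) => a.toLowerCase() === b.toLowerCase()
    : (a, b) => a === b;
  const ops = editScript(tokenizeChars(oldStr), tokenizeChars(newStr), equals);
  return toChangeObjects(ops);
}

console.log(sketchDiffChars('Foo', 'FOOD'));
console.log(sketchDiffChars('Foo', 'FOOD', { ignoreCase: true }));
```

With the default equality the sketch removes two tokens and adds three for this input; with ignoreCase it only adds the final D, matching the description above.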

API

Diff.diffChars(oldStr, newStr[, options]) - diffs two blocks of text, treating each character as a token.

("Characters" here means Unicode code points - the elements you get when you loop over a string with a for ... of ... loop.)

Returns a list of change objects.

Options
- ignoreCase: If true, the uppercase and lowercase forms of a character are considered equal. Defaults to false.
Diff.diffWords(oldStr, newStr[, options]) - diffs two blocks of text, treating each word and each punctuation mark as a token. Whitespace is ignored when computing the diff (but preserved as far as possible in the final change objects).

Returns a list of change objects.

Options
- ignoreCase: Same as in diffChars. Defaults to false.
Diff.diffWordsWithSpace(oldStr, newStr[, options]) - diffs two blocks of text, treating each word, punctuation mark, newline, or run of (non-newline) whitespace as a token.
Diff.diffLines(oldStr, newStr[, options]) - diffs two blocks of text, treating each line as a token.

Options
- ignoreWhitespace: true to ignore leading and trailing whitespace characters when checking if two lines are equal. Defaults to false.
- ignoreNewlineAtEof: true to ignore a missing newline character at the end of the last line when comparing it to other lines. (By default, the line 'b\n' in text 'a\nb\nc' is not considered equal to the line 'b' in text 'a\nb'; this option makes them be considered equal.) Ignored if ignoreWhitespace or newlineIsToken are also true.
- stripTrailingCr: true to remove all trailing CR (\r) characters before performing the diff. Defaults to false. This helps to get a useful diff when diffing UNIX text files against Windows text files.
- newlineIsToken: true to treat the newline character at the end of each line as its own token. This allows for changes to the newline structure to occur independently of the line content and to be treated as such. In general this is the more human friendly form of diffLines; the default behavior with this option turned off is better suited for patches and other computer friendly output. Defaults to false.
Note that while using ignoreWhitespace in combination with newlineIsToken is not an error, results may not be as expected. With ignoreWhitespace: true and newlineIsToken: false, changing a completely empty line to contain some spaces is treated as a non-change, but with ignoreWhitespace: true and newlineIsToken: true, it is treated as an insertion. This is because the content of a completely blank line is not a token at all in newlineIsToken mode.

Returns a list of change objects.
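To make the effect of newlineIsToken on tokenization more concrete, here is a small standalone sketch. It is an assumption-laden illustration rather than jsdiff's own tokenizer, and sketchTokenizeLines is a hypothetical name.

```javascript
// Hypothetical helper, not part of jsdiff: splits text into line tokens.
function sketchTokenizeLines(text, { newlineIsToken = false } = {}) {
  // The capturing group keeps the terminators; they land at odd indexes.
  const pieces = text.split(/(\r\n|\n)/);
  const tokens = [];
  for (let i = 0; i < pieces.length; i++) {
    const isTerminator = i % 2 === 1;
    if (isTerminator && !newlineIsToken) {
      // Default mode: glue the newline onto the line it terminates.
      tokens[tokens.length - 1] += pieces[i];
    } else {
      tokens.push(pieces[i]);
    }
  }
  // Empty strings are dropped, so in newlineIsToken mode the content of a
  // completely blank line produces no token at all.
  return tokens.filter((token) => token !== '');
}

console.log(sketchTokenizeLines('a\n\nb'));
// → [ 'a\n', '\n', 'b' ]
console.log(sketchTokenizeLines('a\n\nb', { newlineIsToken: true }));
// → [ 'a', '\n', '\n', 'b' ]
```

In the default mode the last line of 'a\nb' is the token 'b', while the same line inside 'a\nb\nc' is 'b\n'; that is the mismatch ignoreNewlineAtEof is described as smoothing over.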
Diff.diffSentences(oldStr, newStr[, options]) - diffs two blocks of text, treating each sentence as a token.

Returns a list of change objects.
Diff.diffCss(oldStr, newStr[, options]) - diffs two blocks of text, comparing CSS tokens.

Returns a list of change objects.
Diff.diffJson(oldObj, newObj[, options]) - diffs two JSON-serializable objects by first serializing them to prettily-formatted JSON and then treating each line of the JSON as a token. Object properties are ordered alphabetically in the serialized JSON, so the order of properties in the objects being compared doesn't affect the result.

Returns a list of change objects.

Options
- stringifyReplacer: A custom replacer function. Operates similarly to the replacer parameter to JSON.stringify(), but must be a function.
- undefinedReplacement: A value to replace undefined with. Ignored if a stringifyReplacer is provided.
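The reason property order does not matter can be illustrated with a minimal sketch of key-sorted serialization. The helpers sortKeysDeep and sketchSerialize are hypothetical, and the library's real canonicalization may well handle more cases (for example the replacer options above) than this does.

```javascript
// Hypothetical helpers, not jsdiff's API: serialize with sorted keys so that
// two objects differing only in property order produce identical text.
function sortKeysDeep(value) {
  if (Array.isArray(value)) {
    return value.map(sortKeysDeep);
  }
  if (value !== null && typeof value === 'object') {
    const sorted = {};
    for (const key of Object.keys(value).sort()) {
      sorted[key] = sortKeysDeep(value[key]);
    }
    return sorted;
  }
  return value;
}

function sketchSerialize(obj) {
  // Two-space indentation puts each property on its own line,
  // which is what makes a line-based diff of the result useful.
  return JSON.stringify(sortKeysDeep(obj), null, 2);
}

const left = sketchSerialize({ b: 1, a: { d: 2, c: 3 } });
const right = sketchSerialize({ a: { c: 3, d: 2 }, b: 1 });
console.log(left === right); // → true
```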
Diff.diffArrays(oldArr, newArr[, options]) - diffs two arrays of tokens, comparing each item for strict equality (===).

Options
- comparator: function(left, right) for custom equality checks
Returns a list of change objects.
Diff.createTwoFilesPatch(oldFileName, newFileName, oldStr, newStr[, oldHeader[, newHeader[, options]]]) - creates a unified diff patch by first computing a diff with diffLines and then serializing it to unified diff format.

Parameters:
- oldFileName : String to be output in the filename section of the patch for the removals
- newFileName : String to be output in the filename section of the patch for the additions
- oldStr : Original string value
- newStr : New string value
- oldHeader : Optional additional information to include in the old file header. Default: undefined.
- newHeader : Optional additional information to include in the new file header. Default: undefined.
- options : An object with options.
  - context describes how many lines of context should be included. You can set this to Number.MAX_SAFE_INTEGER or Infinity to include the entire file content in one hunk.
  - ignoreWhitespace: Same as in diffLines. Defaults to false.
  - stripTrailingCr: Same as in diffLines. Defaults to false.
  - newlineIsToken: Same as in diffLines. Defaults to false.
Diff.createPatch(fileName, oldStr, newStr[, oldHeader[, newHeader[, options]]]) - creates a unified diff patch.

Just like Diff.createTwoFilesPatch, but with oldFileName being equal to newFileName.
Diff.formatPatch(patch) - creates a unified diff patch.

patch may be either a single structured patch object (as returned by structuredPatch) or an array of them (as returned by parsePatch).
Diff.structuredPatch(oldFileName, newFileName, oldStr, newStr[, oldHeader[, newHeader[, options]]]) - returns an object with an array of hunk objects.

This method is similar to createTwoFilesPatch, but returns a data structure suitable for further processing. Parameters are the same as createTwoFilesPatch. The data structure returned may look like this:
```
{
  oldFileName: 'oldfile', newFileName: 'newfile',
  oldHeader: 'header1', newHeader: 'header2',
  hunks: [{
    oldStart: 1, oldLines: 3, newStart: 1, newLines: 3,
    lines: [' line2', ' line3', '-line4', '+line5', '\\ No newline at end of file'],
  }]
}
```
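As a rough illustration of how the fields of a hunk relate to each other, the sketch below recounts oldLines and newLines from the lines array and builds a unified-diff hunk header. summarizeHunk is a hypothetical helper, not part of jsdiff, and it ignores the special header forms that real unified diffs use for empty ranges.

```javascript
// Hypothetical helper, not part of jsdiff's API.
function summarizeHunk(hunk) {
  let oldLines = 0;
  let newLines = 0;
  for (const line of hunk.lines) {
    const marker = line[0];
    if (marker === ' ') {
      // Context lines exist in both the old and the new text.
      oldLines++;
      newLines++;
    } else if (marker === '-') {
      oldLines++;
    } else if (marker === '+') {
      newLines++;
    }
    // Lines starting with '\' ("\ No newline at end of file") count toward neither side.
  }
  const header = `@@ -${hunk.oldStart},${oldLines} +${hunk.newStart},${newLines} @@`;
  return { oldLines, newLines, header };
}

const sampleHunk = {
  oldStart: 1, oldLines: 3, newStart: 1, newLines: 3,
  lines: [' line2', ' line3', '-line4', '+line5', '\\ No newline at end of file'],
};
console.log(summarizeHunk(sampleHunk).header); // → @@ -1,3 +1,3 @@
```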
Diff.applyPatch(source, patch[, options]) - attempts to apply a unified diff patch.

If the patch was applied successfully, returns a string containing the patched text. If the patch could not be applied (because some hunks in the patch couldn't be fitted to the text in source), returns false.

patch may be a string diff or the output from the parsePatch or structuredPatch methods.

The optional options object may have the following keys:
- fuzzFactor: Number of lines that are allowed to differ before rejecting a patch. Defaults to 0.
- autoConvertLineEndings: If true, and if the file to be patched consistently uses different line endings to the patch (i.e. either the file always uses Unix line endings while the patch uses Windows ones, or vice versa), then applyPatch will behave as if the line endings in the patch were the same as those in the source file. (If false, the patch will usually fail to apply in such circumstances since lines deleted in the patch won't be considered to match those in the source file.) Defaults to true.
- compareLine(lineNumber, line, operation, patchContent): Callback used to compare two given lines to determine if they should be considered equal when patching. Defaults to strict equality but may be overridden to provide fuzzier comparison. Should return false if the lines should be rejected.
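The idea of fitting a hunk to the source, and returning false when it does not fit, can be sketched as follows. This is a deliberately strict toy, assuming a single hunk in the structured format shown earlier; sketchApplyHunk is a hypothetical name, and the sketch ignores fuzzFactor, offsets, line-ending conversion and end-of-file newline handling.

```javascript
// Hypothetical helper, not jsdiff's applyPatch: applies one hunk at exactly
// the position it names and returns false if any old line does not match.
function sketchApplyHunk(source, hunk) {
  const lines = source.split('\n');
  const out = lines.slice(0, hunk.oldStart - 1); // untouched lines before the hunk
  let pos = hunk.oldStart - 1; // index of the next source line to consume
  for (const line of hunk.lines) {
    const marker = line[0];
    const content = line.slice(1);
    if (marker === '\\') {
      continue; // "\ No newline at end of file" is ignored in this sketch
    }
    if (marker === ' ' || marker === '-') {
      if (lines[pos] !== content) {
        return false; // context or removed line does not fit the source
      }
      pos++;
      if (marker === ' ') {
        out.push(content);
      }
    } else if (marker === '+') {
      out.push(content);
    }
  }
  return out.concat(lines.slice(pos)).join('\n');
}

const demoHunk = {
  oldStart: 1, oldLines: 3, newStart: 1, newLines: 3,
  lines: [' line2', ' line3', '-line4', '+line5', '\\ No newline at end of file'],
};
console.log(sketchApplyHunk('line2\nline3\nline4', demoHunk)); // prints the three patched lines
console.log(sketchApplyHunk('line2\nlineX\nline4', demoHunk)); // → false
```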
Diff.applyPatches(patch, options) - applies one or more patches.

patch may be either an array of structured patch objects, or a string representing a patch in unified diff format (which may patch one or more files).

This method will iterate over the contents of the patch and apply to data provided through callbacks. The general flow for each patch index is:
- options.loadFile(index, callback) is called. The caller should then load the contents of the file and then pass that to the callback(err, data) callback. Passing an err will terminate further patch execution.
- options.patched(index, content, callback) is called once the patch has been applied. content will be the return value from applyPatch. When it's ready, the caller should call callback(err) callback. Passing an err will terminate further patch execution.
Once all patches have been applied or an error occurs, the options.complete(err) callback is made.
Diff.parsePatch(diffStr) - Parses a patch into structured data

Returns a JSON object representation of a patch, suitable for use with the applyPatch method. This parses to the same structure returned by Diff.structuredPatch.
Diff.reversePatch(patch) - Returns a new structured patch which when applied will undo the original patch.

patch may be either a single structured patch object (as returned by structuredPatch) or an array of them (as returned by parsePatch).
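Conceptually, reversing a structured patch means swapping the old and new sides and flipping each + and - marker. The sketch below shows that idea on the structure documented for structuredPatch; sketchReverse is a hypothetical helper, and it makes no attempt to handle "\ No newline at end of file" markers correctly.

```javascript
// Hypothetical helper, not jsdiff's reversePatch.
function sketchReverse(patch) {
  return {
    oldFileName: patch.newFileName,
    newFileName: patch.oldFileName,
    oldHeader: patch.newHeader,
    newHeader: patch.oldHeader,
    hunks: patch.hunks.map((hunk) => ({
      oldStart: hunk.newStart,
      oldLines: hunk.newLines,
      newStart: hunk.oldStart,
      newLines: hunk.oldLines,
      lines: hunk.lines.map((line) => {
        if (line[0] === '+') return '-' + line.slice(1); // additions become removals
        if (line[0] === '-') return '+' + line.slice(1); // removals become additions
        return line; // context and '\' lines are left alone
      }),
    })),
  };
}

const examplePatch = {
  oldFileName: 'oldfile', newFileName: 'newfile',
  oldHeader: 'header1', newHeader: 'header2',
  hunks: [{
    oldStart: 1, oldLines: 3, newStart: 1, newLines: 3,
    lines: [' line2', ' line3', '-line4', '+line5'],
  }],
};
console.log(sketchReverse(examplePatch).hunks[0].lines);
// → [ ' line2', ' line3', '+line4', '-line5' ]
```

Reversing twice gives back the original structure, which is a convenient sanity check for this kind of helper.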
Diff.convertChangesToXML(changes) - converts a list of change objects to a serialized XML format
Diff.convertChangesToDMP(changes) - converts a list of change objects to the format returned by Google's diff-match-patch library

Universal `options`

Certain options can be provided in the options object of any method that calculates a diff (including diffChars, diffLines etc. as well as structuredPatch, createPatch, and createTwoFilesPatch):

- callback: if provided, the diff will be computed in async mode to avoid blocking the event loop while the diff is calculated. The value of the callback option should be a function and will be passed the computed diff or patch as its first argument.

  (Note that if the ONLY option you want to provide is a callback, you can pass the callback function directly as the options parameter instead of passing an object with a callback property.)
- maxEditLength: a number specifying the maximum edit distance to consider between the old and new texts. You can use this to limit the computational cost of diffing large, very different texts by giving up early if the cost will be huge. This option can be passed either to diffing functions (diffLines, diffChars, etc) or to patch-creation functions (structuredPatch, createPatch, etc), all of which will indicate that the max edit length was reached by returning undefined instead of whatever they'd normally return.
- timeout: a number of milliseconds after which the diffing algorithm will abort and return undefined. Supported by the same functions as maxEditLength.
- oneChangePerToken: if true, the array of change objects returned will contain one change object per token (e.g. one per line if calling diffLines), instead of runs of consecutive tokens that are all added / all removed / all conserved being combined into a single change object.

Defining custom diffing behaviors

If you need behavior a little different to what any of the text diffing functions above offer, you can roll your own by customizing both the tokenization behavior used and the notion of equality used to determine if two tokens are equal.

The simplest way to customize tokenization behavior is to simply tokenize the texts you want to diff yourself, with your own code, then pass the arrays of tokens to diffArrays. For instance, if you wanted a semantically-aware diff of some code, you could try tokenizing it using a parser specific to the programming language the code is in, then passing the arrays of tokens to diffArrays.

To customize the notion of token equality used, use the comparator option to diffArrays.

For even more customisation of the diffing behavior, you can create a new Diff.Diff() object, overwrite its castInput, tokenize, removeEmpty, equals, join, and postProcess properties with your own functions, then call its diff(oldString, newString[, options]) method. The methods you can overwrite are used as follows:

- castInput(value, options): used to transform the oldString and newString before any other steps in the diffing algorithm happen. For instance, diffJson uses castInput to serialize the objects being diffed to JSON. Defaults to a no-op.
- tokenize(value, options): used to convert each of oldString and newString (after they've gone through castInput) to an array of tokens. Defaults to returning value.split('') (returning an array of individual characters).
- removeEmpty(array): called on the arrays of tokens returned by tokenize and can be used to modify them. Defaults to stripping out falsey tokens, such as empty strings. diffArrays overrides this to simply return the array, which means that falsey values like empty strings can be handled like any other token by diffArrays.
- equals(left, right, options): called to determine if two tokens (one from the old string, one from the new string) should be considered equal. Defaults to comparing them with ===.
- join(tokens): gets called with an array of consecutive tokens that have either all been added, all been removed, or are all common. Needs to join them into a single value that can be used as the value property of the change object for these tokens. Defaults to simply returning tokens.join('').
- postProcess(changeObjects): gets called at the end of the algorithm with the change objects produced, and can do final cleanups on them. Defaults to simply returning changeObjects unchanged.

Change Objects

Many of the methods above return change objects. These objects consist of the following fields:

- value: The concatenated content of all the tokens represented by this change object - i.e. generally the text that is either added, deleted, or common, as a single string. In cases where tokens are considered common but are non-identical (e.g. because an option like ignoreCase or a custom comparator was used), the value from the new string will be provided here.
- added: true if the value was inserted into the new string, otherwise false
- removed: true if the value was removed from the old string, otherwise false
- count: How many tokens (e.g. chars for diffChars, lines for diffLines) the value in the change object consists of

(Change objects where added and removed are both false represent content that is common to the old and new strings.)
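One way to see how these fields fit together is to rebuild both texts from a list of change objects. The sketch below is a standalone illustration: reconstructTexts is a hypothetical helper, and the sampleChanges array is written by hand in the documented shape rather than produced by the library.

```javascript
// Hypothetical helper, not part of jsdiff's API.
function reconstructTexts(changes) {
  let oldText = '';
  let newText = '';
  for (const change of changes) {
    // Removed and common values belong to the old text. For common tokens the
    // value comes from the new string, so with options such as ignoreCase the
    // rebuilt old text may differ from the real one in those spans.
    if (!change.added) oldText += change.value;
    // Added and common values belong to the new text.
    if (!change.removed) newText += change.value;
  }
  return { oldText, newText };
}

// Hand-written change objects describing 'beep boop' -> 'beep boob blah'.
const sampleChanges = [
  { value: 'beep boo', added: false, removed: false, count: 8 },
  { value: 'p', added: false, removed: true, count: 1 },
  { value: 'b blah', added: true, removed: false, count: 6 },
];

console.log(reconstructTexts(sampleChanges));
// → { oldText: 'beep boop', newText: 'beep boob blah' }
```

Testing with !change.added rather than change.added === false keeps the helper tolerant of change objects where a flag is undefined instead of false.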

Examples

Basic example in Node

require('colors');
const Diff = require('diff');

const one = 'beep boop';
const other = 'beep boob blah';

const diff = Diff.diffChars(one, other);

diff.forEach((part) => {
  // green for additions, red for deletions
  let text = part.added ? part.value.bgGreen :
             part.removed ? part.value.bgRed :
                            part.value;
  process.stderr.write(text);
});

console.log();

Running the above program should print the diffed text, with additions on a green background and deletions on a red background.

Basic example in a web page

<pre id="display"></pre>
<script src="diff.js"></script>
<script>
const one = 'beep boop',
    other = 'beep boob blah';

const diff = Diff.diffChars(one, other),
    display = document.getElementById('display'),
    fragment = document.createDocumentFragment();

diff.forEach((part) => {
  // green for additions, red for deletions
  // grey for common parts
  const color = part.added ? 'green' :
    part.removed ? 'red' : 'grey';
  const span = document.createElement('span');
  span.style.color = color;
  span.appendChild(document
    .createTextNode(part.value));
  fragment.appendChild(span);
});

display.appendChild(fragment);
</script>

Open the above .html file in a browser and you should see the diffed text, with additions in green, deletions in red, and common parts in grey.

Example of generating a patch from Node

The code below is roughly equivalent to the Unix command diff -u file1.txt file2.txt > mydiff.patch:

const Diff = require('diff');
const fs = require('fs');
const file1Contents = fs.readFileSync("file1.txt").toString();
const file2Contents = fs.readFileSync("file2.txt").toString();
const patch = Diff.createTwoFilesPatch("file1.txt", "file2.txt", file1Contents, file2Contents);
fs.writeFileSync("mydiff.patch", patch);

Examples of parsing and applying a patch from Node

Applying a patch to a specified file

The code below is roughly equivalent to the Unix command patch file1.txt mydiff.patch:

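const Diff = require('diff');
const fs = require('fs');
const file1Contents = fs.readFileSync("file1.txt").toString();
const patch = fs.readFileSync("mydiff.patch").toString();
const patchedFile = Diff.applyPatch(file1Contents, patch);
// applyPatch returns false when the patch could not be applied
if (patchedFile === false) throw new Error("Failed to apply patch to file1.txt");
fs.writeFileSync("file1.txt", patchedFile);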

Applying a multi-file patch to the files specified by the patch file itself

The code below is roughly equivalent to the Unix command patch < mydiff.patch:

const Diff = require('diff');
const fs = require('fs');
const patch = fs.readFileSync("mydiff.patch").toString();
Diff.applyPatches(patch, {
    loadFile: (patch, callback) => {
        let fileContents;
        try {
            fileContents = fs.readFileSync(patch.oldFileName).toString();
        } catch (e) {
            callback(`No such file: ${patch.oldFileName}`);
            return;
        }
        callback(undefined, fileContents);
    },
    patched: (patch, patchedContent, callback) => {
        if (patchedContent === false) {
            callback(`Failed to apply patch to ${patch.oldFileName}`);
            return;
        }
        fs.writeFileSync(patch.oldFileName, patchedContent);
        callback();
    },
    complete: (err) => {
        if (err) {
            console.log("Failed with error:", err);
        }
    }
});

Compatibility

jsdiff supports all ES3 environments with some known issues on IE8 and below. Under these browsers some diff algorithms such as word diff and others may fail due to lack of support for capturing groups in the split operation.

License

See LICENSE.

Deviations from the published Myers diff algorithm

jsdiff deviates from the published algorithm in a couple of ways that don't affect results but do affect performance:

- jsdiff keeps track of the diff for each diagonal using a linked list of change objects for each diagonal, rather than the historical array of furthest-reaching D-paths on each diagonal contemplated on page 8 of Myers's paper.
- jsdiff skips considering diagonals where the furthest-reaching D-path would go off the edge of the edit graph. This dramatically reduces the time cost (from quadratic to linear) in cases where the new text just appends or truncates content at the end of the old text.

jsdiff's Issues

git diffs with context

Could you please add support for git diffs with context?

git diff -p source_branch changed_branch index.html > sample.patch
will produce the following diff

diff --git a/index.html b/index.html
index d6485cc..71d2d71 100755
--- a/index.html
+++ b/index.html
@@ -3,14 +3,32 @@
   <head>
    <meta charset="utf-8">
    <meta name="google" content="notranslate" />
-   <title>Seesu</title>
+   <title>Seesu online</title>
+   <meta http-equiv="X-UA-Compatible" content="IE=edge"/>
    <link rel="shortcut icon" id="dynamic-favicon"  href="icons/icon16.png"/>
    <link rel="stylesheet" type="text/css" media="screen" href="dist/combined.css">
    <meta name="keywords"  content="seesu, seesu for iphone, seesu for android, seesu for mobile, seesu online, last.fm, vk.com music, vkontakte.ru and last.fm" >
    <meta name="description" content="Seesu is amazing web application for searching and listening music. Hot last.fm acceleration"  >
    <meta name="apple-mobile-web-app-capable" content="yes">
    <meta name="viewport"  content="minimum-scale=1.0, width=device-width, maximum-scale=0.6667, user-scalable=no" />
+   
+
+   <link rel="image_src" href="http://seesu.me/i/page-poster.png"/>
+   <meta property="og:title" content="Seesu Music (online version)" />
+   <meta property="og:description" content="Seesu is a small mashup application which combines last.fm catalog, vk.com mp3 and soundcloud mp3 libraries to lets you listen to music and meet new people." />
+   <meta property="og:image" content="http://seesu.me/i/page-poster.png" />
    <script data-main="loader" src="js-sep/require-2.1.19.min.js"></script>
+   <script type="text/javascript">
+   /*<![CDATA[*/
+   var openstat = { counter: 2101409, next: openstat };
+   (function(){
+       var opst = document.createElement('script');
+           opst.setAttribute('async', 'true');
+           opst.src = '//openstat.net/cnt.js';
+       var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(opst, s);
+   })();
+   /* ]]>*/
+   </script>

 <!--

Currently jsdiff can't parse the header and context lines.

JsDiff newline trouble

Hello,
thank you for a good library :). It's very useful. I'm using it in my project.

I'm having trouble applying a JsDiff patch on Windows and Linux.

When the server runs on Linux:

Index: cloudcmd.js
===================================================================
--- cloudcmd.js
+++ cloudcmd.js
@@ -1,5 +1,5 @@
+(function(){a
-(function(){
     'use strict';

     var DIR         = __dirname     + '/',
         main        = require(DIR   + 'lib/server/main'),

When the server runs on Windows:

Index: cloudcmd.js
===================================================================
--- cloudcmd.js
+++ cloudcmd.js
@@ -1,5 +1,5 @@
+(function(){a
-(function(){
\ No newline at end of file

     'use strict';

     var DIR         = __dirname     + '/',

As you can see, on Windows there is a "\ No newline at end of file" line. I think that's the reason why the patch does not apply on Windows.
For editing text files I'm using Ace.

Can you please help me figure out what is going on?

Better Algorithm

Sorry if I'm wrong (I'm a bit new to these things), but isn't "An O(NP) Sequence Comparison Algorithm" a better algorithm than "An O(ND) Difference Algorithm and its Variations"?

(Sorry for my English)

applyPatches doesn't use offsets

(Related to #82)

While processing https://raw.githubusercontent.com/GregorR/musl-cross/master/patches/gcc-4.7.3-musl.diff with my pull request, some indexes could not be applied: the patched() callback receives content equal to false. After processing the same file with patch --verbose -Np1, it shows that these entries are applied correctly, but with an offset:

Hmm...  Looks like a unified diff to me...
The text leading up to this was:
--------------------------
|# HG changeset patch
|# Parent f50bb54f331f73405131a30b4f353cfda1c70304
|Use the generic implementation of libstdc++ primitives when we're on musl, not the glibc one.
|
|diff -r f50bb54f331f libstdc++-v3/configure.host
|--- a/libstdc++-v3/configure.host  Fri Mar 29 16:38:52 2013 -0400
|+++ b/libstdc++-v3/configure.host  Fri Mar 29 16:41:10 2013 -0400
--------------------------
patching file libstdc++-v3/configure.host
Using Plan A...
Hunk #1 succeeded at 243 (offset -21 lines).
Hunk #2 succeeded at 258 (offset -21 lines).

It seems applyPatches() should try to find the place where the context fits and apply that offset to the patched content, shouldn't it?

P.S.: I don't know if it's related, but at the end of processing the patch file I get one call to loadFile() with undefined as the index. Ignoring it doesn't cause problems, so maybe it's an extra call that shouldn't be there...

The demo is broken with diff lines

After writing two simple one-word lines in both editors and then removing them, no matter what I do to bring the two editors back in sync, the diff seems to keep some bad cached state where it shouldn't.

The diff should be calculated every time any of the editors change.

Word diff "ignores" newline and extra spaces

Not sure if this is a feature or part of the design, but whereas a line diff will treat a newline character (\n at least) as a potential difference, word diff will not. Indeed, it's not a word, and I can see the inherent abstract consistency of this treatment. But it might be worth providing control over which characters delimit words and which don't, for word diff. In any case, please don't break the API; the default should behave as it does now.

Having said that, the problem is more severe than new line handling, e.g. any non-word delimiter such as a space character, is just swallowed during the word diff process, and does not show as a difference in the word diff result. I think this aspect should be reconsidered for improvement (while keeping default behavior as is).

Long diff

If there are very many differences between two texts, this loop runs so slowly that the operating system thinks it has hung and eventually stops it altogether.

One option would be to show only part of the diffs rather than all of them.
For example, an option could set the maximum number of diffs.

Error thrown when applying a patch to an empty string

Here is the demo.
http://plnkr.co/edit/VSwrPtUuGSijx2JQBZNu?p=info

Missing count

diff.diffLines('var a = 0;', 'var a = 0;\nvar b = 1;\nvar c = 3') returns

[{
  added: undefined,
  count: 1,
  removed: true,
  value: "var a = 0;",
},
{
  added: true,
  count: 3,
  removed: undefined,
  value: "var a = 0;↵var b = 1;↵var c = 3",
}]

diff.diffLines('', 'var a = 0;\nvar b = 1;') returns

[{
  added: true,
  value: "var a = 0;↵var b = 1;",
}]

what happened to count?

read and apply patches

I see that patches can be written out. Being able to read the patches and apply them (should be separate in case patches get serialized to JSON instead) would be sweet.

html support

Is there any way for this to work with html, by maintaining the original markup or ignoring html tags, so that only the text is updated, but the html remains intact?

Suggestion: expose fbDiff

Hi, thanks for the library. One suggestion is to expose fbDiff in module.exports. E.g.,

return {
  fbDiff: fbDiff,
  removeEmpty: removeEmpty,
  diffChars: function(oldStr, newStr) { return CharDiff.diff(oldStr, newStr); },
  ...

That way, we can define our own tokenizers. E.g., (in coffeescript)

JSdiff = require 'diff'
PatDiff = new JSdiff.fbDiff(true);
PatDiff.tokenize = (value) -> JSdiff.removeEmpty(value.split(/(\s+)/));
...
PatDiff.diff @body, @body.replace(pattern, replacement)

Add karma and sauce tests

We should have in browser testing to ensure compatibility. We should also make a compatibility statement in the docs.

Fuzz factor for applyPatch

Oh my. If I'm reading this code correctly, it looks like applyPatch currently ignores context lines altogether. Oh, no. Oh, deary dear.

From the original patch man page:

-F  max-fuzz, --fuzz max-fuzz
     Sets the maximum fuzz factor.  This option only applies to con-
     text diffs, and causes patch to ignore up to that many lines in
     looking for places to install a hunk.  Note that a larger fuzz
     factor increases the odds of a faulty patch.  The default fuzz
     factor is 2, and it may not be set to more than the number of
     lines of context in the context diff, ordinarily 3.

The applyPatch function should have some smarts over determining where to apply the patch (or if the patch should even apply at all). I'd actually kind of like a fuzzFactor that can be specified in units other than lines, too (so, for example, it's OK to match when 80% of the line is the same, or when there are only two word differences, or no instances where lines are added/removed or a word is replaced with multiple words or vice versa, or something like that).

Provide example documentation

Need to provide some examples of the API usage in both CommonJS and browser land.

Cannot read property 'oldlines' of undefined

I'm getting this error when using jsdiff via grunt-patch. I've posted the issue with details on the grunt-patch repo (nettantra/grunt-patch#3), but the error appears to be coming from jsdiff, so I thought I'd post it here as well. Any ideas what may be causing this issue?

diffWords treats \n at the end as significant whitespace

we have the following behavior:

coffee> diff = require 'diff'

# works as expected
coffee> diff.diffWords("hase igel fuchs", "hase igel fuchs")
[ { value: 'hase igel fuchs' } ]

# newline at righthand results in a change
coffee> diff.diffWords("hase igel fuchs", "hase igel fuchs\n")
[ { count: 5, value: 'hase igel fuchs' },
  { count: 1, added: true, removed: undefined, value: '\n' } ]

# newline at lefthand results in a change
coffee> diff.diffWords("hase igel fuchs\n", "hase igel fuchs")
[ { count: 5, value: 'hase igel fuchs' },
  { count: 1, added: undefined, removed: true, value: '\n' } ]

# newline in the middle words also as expected
coffee> diff.diffWords("hase igel fuchs", "hase igel\nfuchs")
[ { value: 'hase igel\nfuchs' } ]

If we read the documentation correctly, all whitespace should be ignored when using diffWords.

Any thoughts on that?

/cc @Partyschaum

Feature Request: 3-way merge

Would you like to add a 3-way merge feature?

diffLines doesn't support ignoring whitespace or leading/following spaces?

I feel like I'm missing something, because it seems like such an obvious feature to include, but there doesn't appear to be a version of diffLines that ignores whitespace and/or leading/following spaces.

Custom regex

It would be super nice to allow a custom regex for splitting the words, for example. I need that in one of my projects.

This could be useful when I want to consider words containing special characters (like . or so).

cannot require from npm module in node

The package.json file in the published npm version 2.0.1 contains the line "main": "./lib".
As the folder ./lib is not included in the npm package, the module cannot be required in node after using npm i diff.

Please change the package.json file for the npm package to "main": "./dist/diff.js" or do other smart things.

Near Endless Loop calling execEditLength

I noticed an endless hang when using mocha, filed this issue, and tracked the problem in mocha down to its use of jsdiff. Their use of createPatch here seems to cause an almost endless loop.

While it would eventually fail once maxEditLength is exceeded, this would take ages, as diagonalPath grows at the rate of editLength here.

The test has a maxEditLength of 57579, and even after about 10 minutes of processing editLength is only 4542. Sadly, it gets exponentially worse.

Update of gh-pages and examples

Hi,

I was struggling with a code example in a jsfiddle, and suddenly I realized the js file I pulled was 4 years old.

I pulled http://kpdecker.github.io/jsdiff/diff.js, which comes from gh-pages: https://github.com/kpdecker/jsdiff/tree/gh-pages

Do you plan to update this?

Automatically call removeEmpty for all tokenizer calls

Right now implementors need to manually perform something like removeEmpty. This should be done automatically (but overridable) by the logic calling tokenize.

Issue raised in #63
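
A minimal sketch of how this could work, assuming a wrapper around the tokenizer: the names `withRemoveEmpty` and `keepEmpty` are made up for illustration and are not taken from jsdiff.

```javascript
// Hypothetical sketch: wrap any tokenizer so empty tokens are removed by
// default, while still letting a caller opt out through an option.
function removeEmpty(tokens) {
  return tokens.filter(function (token) {
    return token !== '';
  });
}

function withRemoveEmpty(tokenize, options) {
  var keepEmpty = Boolean(options && options.keepEmpty); // assumed option name
  return function (value) {
    var tokens = tokenize(value);
    return keepEmpty ? tokens : removeEmpty(tokens);
  };
}

// A tokenizer that yields a trailing empty token for input ending in "\n".
function splitOnNewlines(value) {
  return value.split(/(\n)/);
}

var tokenize = withRemoveEmpty(splitOnNewlines);
console.log(tokenize('a\nb\n')); // [ 'a', '\n', 'b', '\n' ]
```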

Filenames not added to indexes

When using applyPatches() with https://raw.githubusercontent.com/GregorR/musl-cross/master/patches/gcc-4.7.3-musl.diff I get the following result in the loadFile callback:

{ hunks: 
   [ { oldStart: 264,
       oldLines: 6,
       newStart: 264,
       newLines: 13,
       lines: [Object] },
     { oldStart: 272,
       oldLines: 6,
       newStart: 279,
       newLines: 9,
       lines: [Object] },
     { oldStart: 522,
       oldLines: 7,
       newStart: 522,
       newLines: 7,
       lines: [Object] },
     { oldStart: 625,
       oldLines: 6,
       newStart: 625,
       newLines: 9,
       lines: [Object] },
     { oldStart: 33,
       oldLines: 10,
       newStart: 33,
       newLines: 12,
       lines: [Object] },
     { oldStart: 54,
       oldLines: 18,
       newStart: 56,
       newLines: 21,
       lines: [Object] },
     { oldStart: 85,
       oldLines: 21,
       newStart: 90,
       newLines: 21,
       lines: [Object] },
     { oldStart: 108,
       oldLines: 3,
       newStart: 113,
       newLines: 74,
       lines: [Object] },
     { oldStart: 30,
       oldLines: 3,
       newStart: 30,
       newLines: 7,
       lines: [Object] },
     { oldStart: 184,
       oldLines: 6,
       newStart: 184,
       newLines: 7,
       lines: [Object] },
     { oldStart: 200,
       oldLines: 6,
       newStart: 201,
       newLines: 7,
       lines: [Object] },
     { oldStart: 215,
       oldLines: 6,
       newStart: 217,
       newLines: 7,
       lines: [Object] },
     { oldStart: 28,
       oldLines: 6,
       newStart: 28,
       newLines: 8,
       lines: [Object] },
     { oldStart: 47,
       oldLines: 28,
       newStart: 47,
       newLines: 13,
       lines: [Object] },
     { oldStart: 26736,
       oldLines: 6,
       newStart: 26736,
       newLines: 9,
       lines: [Object] },
     { oldStart: 26769,
       oldLines: 6,
       newStart: 26772,
       newLines: 7,
       lines: [Object] },
     { oldStart: 26851,
       oldLines: 6,
       newStart: 26855,
       newLines: 9,
       lines: [Object] },
     { oldStart: 4719,
       oldLines: 6,
       newStart: 4719,
       newLines: 9,
       lines: [Object] },
     { oldStart: 4752,
       oldLines: 6,
       newStart: 4755,
       newLines: 7,
       lines: [Object] },
     { oldStart: 4817,
       oldLines: 6,
       newStart: 4821,
       newLines: 9,
       lines: [Object] },
     { oldStart: 19,
       oldLines: 7,
       newStart: 19,
       newLines: 8,
       lines: [Object] },
     { oldStart: 4,
       oldLines: 7,
       newStart: 4,
       newLines: 7,
       lines: [Object] },
     { oldStart: 125,
       oldLines: 6,
       newStart: 125,
       newLines: 7,
       lines: [Object] },
     { oldStart: 251,
       oldLines: 17,
       newStart: 252,
       newLines: 13,
       lines: [Object] },
     { oldStart: 295,
       oldLines: 7,
       newStart: 292,
       newLines: 7,
       lines: [Object] },
     { oldStart: 304,
       oldLines: 7,
       newStart: 301,
       newLines: 7,
       lines: [Object] },
     { oldStart: 361,
       oldLines: 7,
       newStart: 358,
       newLines: 6,
       lines: [Object] },
     { oldStart: 370,
       oldLines: 10,
       newStart: 366,
       newLines: 8,
       lines: [Object] },
     { oldStart: 407,
       oldLines: 7,
       newStart: 401,
       newLines: 7,
       lines: [Object] },
     { oldStart: 415,
       oldLines: 11,
       newStart: 409,
       newLines: 10,
       lines: [Object] },
     { oldStart: 820,
       oldLines: 10,
       newStart: 813,
       newLines: 6,
       lines: [Object] },
     { oldStart: 1132,
       oldLines: 8,
       newStart: 1121,
       newLines: 13,
       lines: [Object] },
     { oldStart: 1346,
       oldLines: 6,
       newStart: 1340,
       newLines: 7,
       lines: [Object] },
     { oldStart: 21,
       oldLines: 3,
       newStart: 21,
       newLines: 4,
       lines: [Object] },
     { oldStart: 30,
       oldLines: 3,
       newStart: 30,
       newLines: 7,
       lines: [Object] },
     { oldStart: 25,
       oldLines: 16,
       newStart: 25,
       newLines: 19,
       lines: [Object] },
     { oldStart: 101,
       oldLines: 5,
       newStart: 104,
       newLines: 6,
       lines: [Object] },
     { oldStart: 64,
       oldLines: 6,
       newStart: 64,
       newLines: 23,
       lines: [Object] },
     { oldStart: 40,
       oldLines: 7,
       newStart: 40,
       newLines: 11,
       lines: [Object] },
     { oldStart: 18,
       oldLines: 3,
       newStart: 18,
       newLines: 10,
       lines: [Object] },
     { oldStart: 2112,
       oldLines: 6,
       newStart: 2112,
       newLines: 10,
       lines: [Object] },
     { oldStart: 364,
       oldLines: 17,
       newStart: 364,
       newLines: 21,
       lines: [Object] },
     { oldStart: 18,
       oldLines: 3,
       newStart: 18,
       newLines: 4,
       lines: [Object] },
     { oldStart: 551,
       oldLines: 6,
       newStart: 551,
       newLines: 9,
       lines: [Object] },
     { oldStart: 611,
       oldLines: 7,
       newStart: 614,
       newLines: 8,
       lines: [Object] },
     { oldStart: 789,
       oldLines: 15,
       newStart: 793,
       newLines: 18,
       lines: [Object] },
     { oldStart: 923,
       oldLines: 6,
       newStart: 930,
       newLines: 7,
       lines: [Object] } ] }

As you can see, the file names in the patch are not added anywhere, and in fact the whole patch is treated as belonging to a single (anonymous) index. I think the problem is that Index: is used as the identifier of each file, while in this patch each file is separated by a diff line:

# HG changeset patch
# Parent f50bb54f331f73405131a30b4f353cfda1c70304
Use the generic implementation of libstdc++ primitives when we're on musl, not the glibc one.

diff -r f50bb54f331f libstdc++-v3/configure.host
--- a/libstdc++-v3/configure.host   Fri Mar 29 16:38:52 2013 -0400
+++ b/libstdc++-v3/configure.host   Fri Mar 29 16:41:10 2013 -0400
@@ -264,6 +264,13 @@
     os_include_dir="os/bsd/freebsd"
     ;;
   gnu* | linux* | kfreebsd*-gnu | knetbsd*-gnu)
+    # check for musl by target
+    case "${host_os}" in
+      *-musl*)
+        os_include_dir="os/generic"
+        ;;
+      *)
+
     if [ "$uclibc" = "yes" ]; then
       os_include_dir="os/uclibc"
     elif [ "$bionic" = "yes" ]; then
@@ -272,6 +279,9 @@
       os_include_dir="os/gnu-linux"
     fi
     ;;
+
+    esac
+    ;;
   hpux*)
     os_include_dir="os/hpux"
     ;;

I believe it would be as simple as accepting not only Index: but also diff as the identifier.

Also, instead of ignoring the headers when no index is found, I think it would be good to add them even though they are partial. Or, if it is done this way only to consume the patch lines, maybe the variable i could be incremented directly, without calling parseFileHeader() with an empty object.
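
To illustrate the idea of accepting both kinds of separator, here is a small standalone sketch. It is not jsdiff's parser: `splitPatchByFile` is a hypothetical helper, and the second file name in the sample is invented.

```javascript
// Hypothetical sketch, not jsdiff's parser: split a multi-file patch into
// per-file sections, treating both "Index: <file>" and "diff <args>" lines
// as the start of a new file.
function splitPatchByFile(patchText) {
  var FILE_START = /^(?:Index: |diff )/;
  var sections = [];
  var current = null;

  patchText.split('\n').forEach(function (line) {
    if (FILE_START.test(line)) {
      current = { header: line, lines: [] };
      sections.push(current);
    } else if (current) {
      current.lines.push(line);
    }
    // Lines before the first file header (comments, commit message) are skipped.
  });
  return sections;
}

var sample = [
  '# HG changeset patch',
  'diff -r f50bb54f331f libstdc++-v3/configure.host',
  '--- a/libstdc++-v3/configure.host',
  '+++ b/libstdc++-v3/configure.host',
  '@@ -264,6 +264,13 @@',
  'Index: other/file.txt', // invented second file, for illustration only
  '--- other/file.txt',
  '+++ other/file.txt'
].join('\n');

var sections = splitPatchByFile(sample);
console.log(sections.length); // 2
```

Hunk body lines always start with a space, `+`, `-` or `\`, so they cannot be mistaken for a file separator by this pattern.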

Special characters

Hi!

First of all, congrats on this awesome solution for string diffs.

I'm trying to get the diff between two words that contain special characters, which are very common in Portuguese and Spanish.

For example, the diff between jurídica and física is returning f jur í sica dica, but it should recognize them as completely different words.

If I take off the special character "í" the result is two different words: fisica and juridica.

Is this behavior expected?

Cheers!

% Match between 2 strings

It would be really useful to have a function that calculates the % match between two given strings, telling the user how similar the strings are as a percentage.
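
One possible definition, sketched on top of the change objects that the diff functions return: `percentMatch` is a hypothetical helper, and the formula (shared tokens counted against the combined length of both inputs) is only one of several reasonable choices.

```javascript
// Hypothetical helper: percentage of tokens shared by both inputs, given
// change objects shaped like { count, added, removed, value }.
function percentMatch(changes) {
  var common = 0, added = 0, removed = 0;
  changes.forEach(function (change) {
    if (change.added) added += change.count;
    else if (change.removed) removed += change.count;
    else common += change.count;
  });
  var total = 2 * common + added + removed; // combined length of both inputs
  return total === 0 ? 100 : (200 * common) / total;
}

// Hand-written change objects standing in for a character diff of
// "abcd" vs "abxd": 3 shared characters, 1 removed, 1 added.
var changes = [
  { count: 2, value: 'ab' },
  { count: 1, removed: true, value: 'c' },
  { count: 1, added: true, value: 'x' },
  { count: 1, value: 'd' }
];
console.log(percentMatch(changes)); // 75
```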

Custom tokenize for diffWords

I need to extend the regular expression for wordDiff.tokenize = wordWithSpaceDiff.tokenize = ...

Diffing using {{placeholders}}

I'd like to be able to diff using {{placeholders}}. Below is an example of what I'd like to achieve.

Template:

The direct debit of customer {{name}} at {{company}} for £{{amount}} on {{date}} has failed.

Input:

The direct debit of customer John Doe at ABC Limited for £60 on 07/10/2015 has failed.

Output:

{
  name: 'John Doe',
  company: 'ABC Limited',
  amount: '60',
  date: '07/10/2015'
}

Using the standard word-based diffing doesn't exactly work, especially in the case of amount above (the template key is preceded by '£'), and also if the template key contains any spaces. Therefore I think I need a custom implementation which tokenizes on {{ and }}, but I'm a bit stuck.

Any pointers on how to implement this would be very much appreciated.
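
One approach that avoids diffing altogether is to compile the template into a regular expression. This is only a sketch under that assumption: `extractPlaceholders` and `escapeRegExp` are hypothetical helpers, and the lazy `(.+?)` capture can pick the wrong boundary if a value happens to contain the literal text that follows its placeholder.

```javascript
// Hypothetical sketch: turn a {{placeholder}} template into a regular
// expression and use it to extract the values from a concrete string.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

function extractPlaceholders(template, input) {
  var keys = [];
  // split() with a capturing group alternates literal text and key names.
  var parts = template.split(/\{\{\s*([^{}]+?)\s*\}\}/);
  var pattern = parts.map(function (part, index) {
    if (index % 2 === 1) {
      keys.push(part);
      return '(.+?)'; // lazily capture the placeholder's value
    }
    return escapeRegExp(part);
  }).join('');

  var match = new RegExp('^' + pattern + '$').exec(input);
  if (!match) return null;

  var result = {};
  keys.forEach(function (key, index) {
    result[key] = match[index + 1];
  });
  return result;
}

var values = extractPlaceholders(
  'The direct debit of customer {{name}} at {{company}} for £{{amount}} on {{date}} has failed.',
  'The direct debit of customer John Doe at ABC Limited for £60 on 07/10/2015 has failed.'
);
console.log(values);
```

Because the literal text around each key is matched verbatim, the `£` before `{{amount}}` and keys containing spaces are both handled.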

Wrong result of JsDiff.diffLines when there is an empty line before a change

var oldStr = "Line1\n" + "\n" + "Line2\n";
var newStr = "Line1\n" + "\n" + "ChangedLine\n";
JsDiff.diffLines(oldStr, newStr);

outputs

[
  {
    "count": 1, // Should be 2
    "value": "Line1\r\n" // Should be "Line1\n\n"
  },
  {
    "count": 1,
    "added": true,
    "value": "ChangedLine\n"
  },
  {
    "count": 1,
    "removed": true,
    "value": "Line2\n"
  }
]

In fact, it seems successive \n are replaced by \r and ignored in the counts.

A new state "replace"

This is a feature request rather than an issue. A new "replace" state is required.
See https://github.com/reviewboard/reviewboard/blob/master/reviewboard/diffviewer/myersdiff.py#L123

IE8 issue with split

Hi-
In my recent project I noticed that, while the code executed in IE8, the resulting string from JsDiff.applyPatch was scrambled. I was able to work around the issue by using Steven Levithan's polyfill for split ( http://blog.stevenlevithan.com/archives/cross-browser-split ). The way the variable 'meh' in applyPatch is being used isn't supported in IE8. I wouldn't use the polyfill in all situations as it seems to be slower in some cases.

diffLines seems broken

Code fails in ie9

first off great lib. i love it.

i did notice an error when using it in ie. in the diffview you are treating tdata and rows as arrays, not realizing they are associative arrays, and when looping through them you are picking up properties like foreach and length. not sure if you meant to do that, but ie chokes on it and doesn't behave nicely. for my implementation i changed the code to:

    for (var idx = 0; idx < rows.length; idx++)
    {
        node.appendChild(rows[idx]);
    }

instead of:

           for(var idx in rows) node.appendChild(rows[idx]);

far less elegant but seems to be more acceptable to ie.

also had to change the tdata loop as well

    for (var idx=0; idx < tdata.length; idx++)
    {
        node.appendChild(tdata[idx]);
    }
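
a small standalone illustration of the difference between the two loops. the extra property here is added by hand and merely stands in for whatever a library or polyfill might attach to the array:

```javascript
// Illustration of why for...in is fragile on arrays: it enumerates every
// enumerable property, not just the numeric indexes.
var rows = ['row0', 'row1'];
rows.extra = 'not a row'; // stands in for a property added by other code

var viaForIn = [];
for (var key in rows) {
  viaForIn.push(rows[key]);
}

var viaIndex = [];
for (var idx = 0; idx < rows.length; idx++) {
  viaIndex.push(rows[idx]);
}

console.log(viaForIn); // [ 'row0', 'row1', 'not a row' ]
console.log(viaIndex); // [ 'row0', 'row1' ]
```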

again thanks for the great lib!

Consider using options object API for flag permutations

We have a lot of diff implementations that sit behind sometimes cumbersomely named methods. We should look into using an options object to merge these down and deprecate the named methods.

This should be possible without doing a breaking release.

Flip added and removed order?

I'm wondering if it would make sense to flip the order in which added vs removed change objects get returned by diffLines. This would make it easier to make a "git-like" diff without having to swap the added and removed semantics.

This is what I'm doing right now for Vaccine:

var chunks = jsdiff.diffLines(next, old).map(function(d) {
  if (d.removed) {
    return '<span class="added">' + prepend(d.value, '+') + '</span>';
  } else if (d.added) {
    return '<span class="removed">' + prepend(d.value, '-') + '</span>';
  } else {
    return prepend(d.value, ' ');
  }
});
return chunks.join('\n');
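
An alternative to swapping the arguments would be to reorder the result after the fact. The sketch below assumes change objects shaped like `{ value, added, removed }` in which an added change directly precedes its matching removed change; `removedFirst` is a hypothetical helper, not a jsdiff function.

```javascript
// Hypothetical post-processing step: whenever an added change is immediately
// followed by a removed change, swap them so removals come first, as in
// git-style output.
function removedFirst(changes) {
  var result = changes.slice();
  for (var i = 0; i + 1 < result.length; i++) {
    if (result[i].added && result[i + 1].removed) {
      var tmp = result[i];
      result[i] = result[i + 1];
      result[i + 1] = tmp;
      i++; // skip the pair that was just swapped
    }
  }
  return result;
}

// Hand-written change objects standing in for diffLines output.
var changes = [
  { value: 'same\n' },
  { value: 'new line\n', added: true },
  { value: 'old line\n', removed: true }
];
var ordered = removedFirst(changes).map(function (c) { return c.value; });
console.log(ordered); // [ 'same\n', 'old line\n', 'new line\n' ]
```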

Support multiple diff hunks

The applyPatch() method seems to support only the first diff hunk, so it cannot be used when a patch file has several hunks, or hunks for several files. The first case would be easy to handle: just continue with the next hunk. The several-files case would be harder, but how could it be done, especially when you don't know in advance which files need to be patched? Maybe with a hash object that holds the string content of the files, keyed by path?
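
For the single-file case, continuing with the next hunk could look roughly like this. It is a sketch, not jsdiff's applyPatch: `applyHunks` is hypothetical, hunks are assumed to look like `{ oldStart, lines }` with each line prefixed by a space, `+` or `-`, and "\ No newline at end of file" markers are not handled.

```javascript
// Hypothetical sketch: apply every hunk of a single-file patch in order.
function applyHunks(source, hunks) {
  var oldLines = source.split('\n');
  var output = [];
  var cursor = 0; // index of the next unread line in oldLines

  hunks.forEach(function (hunk) {
    var start = hunk.oldStart - 1; // hunk positions are 1-based
    // Copy the untouched lines between the previous hunk and this one.
    while (cursor < start) output.push(oldLines[cursor++]);

    hunk.lines.forEach(function (line) {
      var op = line.charAt(0);
      var text = line.slice(1);
      if (op === '+') {
        output.push(text);
      } else {
        // Context (' ') and removed ('-') lines must match the source.
        if (oldLines[cursor] !== text) {
          throw new Error('Hunk does not apply at line ' + (cursor + 1));
        }
        if (op === ' ') output.push(text);
        cursor++;
      }
    });
  });

  // Copy whatever follows the last hunk.
  while (cursor < oldLines.length) output.push(oldLines[cursor++]);
  return output.join('\n');
}

var patched = applyHunks('a\nb\nc\nd\ne', [
  { oldStart: 1, lines: [' a', '-b', '+B'] },
  { oldStart: 4, lines: [' d', '+x', ' e'] }
]);
console.log(JSON.stringify(patched)); // "a\nB\nc\nd\nx\ne"
```

For several files, the same function could be called once per entry of a path-to-content hash, as suggested above.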

idea for usage

@kpdecker, I'm thinking about creating an open source project/website that offers a simple free service: diffs for gists (gistdiff.com). In a nutshell, visitors would be able to add two gist urls/IDs (maybe more) to see a diff for those gists. Then maybe get an embed code to embed the diff in a blog post or tutorial.

My motivation is that I need this! There are so many times when I've written documentation or replied to an issue and wanted to show the difference between two code implementations and why one is advantageous over the other. Plus, we could use something like this on the new lesscss.org website that I'm working on.

Any interest in collaborating on this?

Html support. Help! :)

Need "diffWordsHtml". For example:
old: test javascript
new: test production javascript

Let it work as diffWords but with html tags

invalid-meta jsdiff is missing "main" entry in bower.json

Please add bower.json to latest tag (better in new).

Create component repository for bower

Since we are now building the final release artifact, we need to push each build to a bower component repo.

createPatch -> applyPatch failed.

var diff = require('./diff');

var oldtext = 'AAAAAAAAAAAAAAAA\n\n';
var newtext = 'AAAAAAAAAAAAAAAA\nBBBBBB' + String.fromCharCode(8232) + '\nCCCCCCCCCCCCCCCCCC\n\n';


var diffed = diff.createPatch('test', oldtext, newtext);
console.log(diff.applyPatch(oldtext, diffed) !== newtext ? 'failed' : 'success');

This code should print success, but it prints failed.

Should be able to diff against a 0 length string

This would be useful in making this general purpose, so it can diff against things that didn't exist. I will make a PR now.

Import tests

Need to import the tests that are part of the Firediff test suite

JsDiff.diffChars hangs forever

I was running JsDiff.diffChars on wikipedia revisions and I noticed that once in a while diffChars seems to hang forever. Here is one example where this is the case:
https://gist.github.com/SuperAmin/b3f9f4ef2a22dc1fecb0

Slow to execute over diffs with a large number of changes

In some cases JsDiff.diffWords ate more than 500 MB of RAM and I had to kill the process it was running in.

I found this in a Node.js environment, but it also reproduces in the browser.

Data for reproducing this bug is here: https://gist.github.com/termi/9875658

Is applyPatches() exposed in the API?

I'm trying to use JsDiff.applyPatches() on v2.1.0 of your npm package, but I get the error:

TypeError: Object #<Object> has no method 'applyPatches'
    // ...

A quick look at index.js indicates that only applyPatch is exposed, as I don't see applyPatches there:

import {applyPatch} from './patch/apply';

Feature request: intelligent diff

Hi,

Thanks for the great library! I have one problem with it, though. With larger changes, this library just falls to pieces. For example, running this:

var start = 'This is some test copy which is designed to demo stuff as text is deleted and modifyd. Make a suggestion here.',
    end = 'Hello, please do not copy this text';

JsDiff.diffWords(start, end);

Produces this:

In the majority of cases, I want either word diffs or character diffs, but as larger changes are made (which isn't as common), I find it unpleasant. A character replacement here is even worse.

Would it be possible to add an "intelligent" diff mode which will detect the density of diffs in a word or sentence and run, say, a sentence diff on that sentence instead?

Thanks a lot!
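
A very rough sketch of the density idea, operating on the change objects after the diff has run: `collapseIfDense` and its threshold are hypothetical, density is measured in characters, and the change objects are hand-written rather than real diffWords output.

```javascript
// Hypothetical heuristic: if most of the text changed, replace the
// fine-grained diff with one removed block followed by one added block.
function collapseIfDense(changes, threshold) {
  var changed = 0, total = 0;
  changes.forEach(function (change) {
    total += change.value.length;
    if (change.added || change.removed) changed += change.value.length;
  });
  if (total === 0 || changed / total < threshold) return changes;

  var oldText = '', newText = '';
  changes.forEach(function (change) {
    if (!change.added) oldText += change.value;   // kept or removed
    if (!change.removed) newText += change.value; // kept or added
  });
  return [
    { removed: true, value: oldText },
    { added: true, value: newText }
  ].filter(function (change) {
    return change.value !== '';
  });
}

var changes = [
  { removed: true, value: 'This is some' },
  { added: true, value: 'Hello, please do not' },
  { value: ' copy ' },
  { removed: true, value: 'which' },
  { added: true, value: 'this' },
  { value: ' text' }
];
var collapsed = collapseIfDense(changes, 0.5);
console.log(collapsed.length); // 2
```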

word tokenizer works only for 7 bit ascii

the word tokenizer for WordDiff and WordWithSpaceDiff uses \b in its regular expression. that considers word characters as [a-zA-Z0-9_], which fails on anything beyond 7 bit.

f.e. the german phrase "wir üben" splits to:

'wir üben'.split(/\b/);
-> ["wir", " ü", "ben"]

replacing the tokenizer with value.split(/(\s+)/) is sufficient in my use-case, but i don't have newlines in my text. some further testing needed, i think.
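
the two splits side by side, plus a quick check of what happens with a newline (plain string methods only, no jsdiff involved):

```javascript
// \b only knows ASCII word characters, so it splits inside "üben".
var byBoundary = 'wir üben'.split(/\b/);

// Splitting on whitespace (kept via the capturing group) leaves words intact.
var byWhitespace = 'wir üben'.split(/(\s+)/);

console.log(byBoundary);   // [ 'wir', ' ü', 'ben' ]
console.log(byWhitespace); // [ 'wir', ' ', 'üben' ]

// Newlines are whitespace too, so they end up inside separator tokens.
var withNewline = 'wir\nüben heute'.split(/(\s+)/);
console.log(withNewline);  // [ 'wir', '\n', 'üben', ' ', 'heute' ]
```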

further reading:
http://stackoverflow.com/questions/10590098/javascript-regexp-word-boundaries-unicode-characters/10590620#10590620

diff headers give error

When patching with https://github.com/maximeh/buildroot/blob/0400322f2d4e28faa98e31815921f54106aeb2e6/package/openssl/openssl-004-musl-termios.patch, it gives the following error:

/home/piranna/Proyectos/NodeOS/node_modules/diff/lib/patch/parse.js:44
        throw new Error('Unknown line ' + (i + 1) + ' ' + JSON.stringify(diffs
              ^
Error: Unknown line 1 "http://rt.openssl.org/Ticket/Display.html?id=3123"
    at parseIndex (/home/piranna/Proyectos/NodeOS/node_modules/diff/lib/patch/parse.js:44:15)
    at Object.parsePatch (/home/piranna/Proyectos/NodeOS/node_modules/diff/lib/patch/parse.js:120:5)
    at applyPatch (/home/piranna/Proyectos/NodeOS/node_modules/diff/lib/patch/apply.js:13:22)
    at /home/piranna/Proyectos/NodeOS/node_modules/nodeos-barebones/scripts/preinstall:56:30
    at fs.js:334:14
    at FSReqWrap.oncomplete (fs.js:95:15)

It seems that this broke after support for multiple indexes was added, and now the leading comment lines of the patch are no longer ignored.