Comments (2)

jeffkaufman commented on July 18, 2024

Duplicate of #57; both of these are difflib issues.

lurch commented on July 18, 2024

I've been having a poke around this afternoon. As #57 points out, icdiff calls difflib._mdiff, which in turn calls difflib.ndiff, so that's what I'm using here. Python 2 and Python 3 both show the following behaviour (which I can't explain, as I haven't delved too deeply into the internals of difflib's algorithm!):

>>> import difflib
>>> list(difflib.ndiff(['aaa'], ['aaa']))
['  aaa']
>>> list(difflib.ndiff(['aaa'], ['baa']))
['- aaa', '+ baa']
>>> list(difflib.ndiff(['aaa'], ['aba']))
['- aaa', '+ aba']
>>> list(difflib.ndiff(['aaa'], ['aab']))
['- aaa', '+ aab']
>>> list(difflib.ndiff(['aaaa'], ['aaaa']))
['  aaaa']
>>> list(difflib.ndiff(['aaaa'], ['baaa']))
['- aaaa', '?    -\n', '+ baaa', '? +\n']
>>> list(difflib.ndiff(['aaaa'], ['abaa']))
['- aaaa', '+ abaa']
>>> list(difflib.ndiff(['aaaa'], ['aaba']))
['- aaaa', '?    -\n', '+ aaba', '?   +\n']
>>> list(difflib.ndiff(['aaaa'], ['aaab']))
['- aaaa', '?    ^\n', '+ aaab', '?    ^\n']
>>> list(difflib.ndiff(['aaaaa'], ['aaaaa']))
['  aaaaa']
>>> list(difflib.ndiff(['aaaaa'], ['baaaa']))
['- aaaaa', '?     -\n', '+ baaaa', '? +\n']
>>> list(difflib.ndiff(['aaaaa'], ['abaaa']))
['- aaaaa', '+ abaaa']
>>> list(difflib.ndiff(['aaaaa'], ['aabaa']))
['- aaaaa', '?     -\n', '+ aabaa', '?   +\n']
>>> list(difflib.ndiff(['aaaaa'], ['aaaba']))
['- aaaaa', '?     -\n', '+ aaaba', '?    +\n']
>>> list(difflib.ndiff(['aaaaa'], ['aaaab']))
['- aaaaa', '?     ^\n', '+ aaaab', '?     ^\n']
>>> list(difflib.ndiff(['aaaaaa'], ['abaaaa']))
['- aaaaaa', '+ abaaaa']
>>> list(difflib.ndiff(['aaaaaaa'], ['abaaaaaaa']))
['- aaaaaaa', '+ abaaaaaaa', '? ++\n']

So it seems difflib can sometimes find the minimal diff (the outputs above that include ? lines), but whether it does seems to depend on the lengths of the matching and non-matching parts of the string, and also on the position of the non-match?

https://docs.python.org/2/library/difflib.html#difflib.SequenceMatcher sounds like it might be kinda complicated, so perhaps this is already the best that's reasonably possible? shrug
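One possible explanation, offered as a sketch rather than a confirmed diagnosis: as far as I can tell, ndiff only emits the ? hint lines when the two lines are "similar enough" according to SequenceMatcher.ratio(), which is 2 * matched characters / total characters. The 0.75 cutoff below is my assumption about CPython's implementation (it is not a documented parameter), and similarity and has_hints are hypothetical helper names made up for this example.

```python
import difflib

# Assumption: ndiff only adds '?' hint lines when the pair of lines scores
# at least about 0.75; this constant is an implementation detail of
# CPython's difflib, not a documented setting, and may vary by version.
ASSUMED_CUTOFF = 0.75


def similarity(old, new):
    """Return SequenceMatcher's ratio: 2 * matched chars / total chars."""
    return difflib.SequenceMatcher(None, old, new).ratio()


def has_hints(old, new):
    """Report whether ndiff produced any '? ' intraline hint lines."""
    return any(line.startswith('? ') for line in difflib.ndiff([old], [new]))


# Print the score next to the observed behaviour for a few of the pairs above.
pairs = [('aaa', 'baa'), ('aaaa', 'baaa'), ('aaaa', 'abaa'),
         ('aaaa', 'aaba'), ('aaaaaa', 'abaaaa')]
for old, new in pairs:
    score = similarity(old, new)
    print(old, new, round(score, 3), score >= ASSUMED_CUTOFF, has_hints(old, new))
```

If this reading is right, the position sensitivity comes from SequenceMatcher picking the longest matching block first and then only looking to the left and right of it. For 'aaaa' vs 'abaa' the longest block pairs the first two characters of 'aaaa' with the trailing 'aa' of 'abaa', which leaves nothing to pair with the leading 'a', so only 2 of 4 characters count as matched and the score drops to 0.5.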
