docdiff's Introduction

DocDiff

(C) 2000 Hisashi MORITA

Todo

  • Incorporate ignore space patch.
  • Better auto-recognition of encodings and EOLs.
  • Make CSS and tty escape sequence customizable in config files.
  • Better multilingualization using Ruby 1.9 features.
  • Write "DocPatch".

Description

Compares two text files by word, by character, or by line

Summary

DocDiff compares two text files and shows the differences. It can compare files word by word, character by character, or line by line. It supports several output formats, such as HTML, tty, Manued, and user-defined markup.
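To make the three resolutions concrete, here is a minimal sketch that assumes a plain LCS (longest common subsequence) diff over tokens. It is not DocDiff's own diff library, which came from Ruby/CVS; tokenize and lcs_diff are hypothetical names used only for illustration.

```ruby
# Minimal LCS-based diff over tokens (illustrative sketch only; DocDiff's
# real diff library is different). tokenize and lcs_diff are hypothetical.

# Split text into tokens at the requested resolution.
def tokenize(text, resolution)
  case resolution
  when :line then text.lines
  when :word then text.scan(/\S+|\s+/) # words and whitespace runs
  when :char then text.chars
  else raise ArgumentError, "unknown resolution: #{resolution}"
  end
end

# Returns [op, token] pairs where op is :common, :del or :add.
def lcs_diff(a, b)
  # dp[i][j] holds the LCS length of a[i..-1] and b[j..-1].
  dp = Array.new(a.size + 1) { Array.new(b.size + 1, 0) }
  (a.size - 1).downto(0) do |i|
    (b.size - 1).downto(0) do |j|
      dp[i][j] = if a[i] == b[j]
                   dp[i + 1][j + 1] + 1
                 else
                   [dp[i + 1][j], dp[i][j + 1]].max
                 end
    end
  end

  # Walk the table from the start, emitting one operation per token.
  result = []
  i = j = 0
  while i < a.size && j < b.size
    if a[i] == b[j]
      result << [:common, a[i]]
      i += 1
      j += 1
    elsif dp[i + 1][j] >= dp[i][j + 1]
      result << [:del, a[i]]
      i += 1
    else
      result << [:add, b[j]]
      j += 1
    end
  end
  a[i..-1].each { |t| result << [:del, t] }
  b[j..-1].each { |t| result << [:add, t] }
  result
end

diff = lcs_diff(tokenize("the quick brown fox", :word),
                tokenize("the slow brown fox", :word))
diff.each { |op, token| puts "#{op}\t#{token.inspect}" }
```

Here the word-level comparison reports [:del, "quick"] and [:add, "slow"] with every other token common. The table costs O(n·m) time and memory, which is fine for a sketch but not for very large inputs.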

It supports several encodings and end-of-line characters, including ASCII (and other single byte encodings such as ISO-8859-*), UTF-8, EUC-JP, Shift_JIS, CR, LF, and CRLF.
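For end-of-line characters, the following minimal sketch guesses the dominant convention by counting CRLF, lone CR, and lone LF. guess_eol_sketch is a hypothetical helper, not DocDiff's guess_eol; it scans a binary copy of the text (String#b) so that bytes that are invalid in the string's declared encoding cannot raise.

```ruby
# Guess the dominant end-of-line convention (hypothetical sketch).
# Returns "CRLF", "CR", "LF", or nil when the text has no line breaks.
def guess_eol_sketch(text)
  bytes = text.b # ASCII-8BIT copy: never "invalid byte sequence"
  counts = {
    "CRLF" => bytes.scan(/\r\n/).size,
    "CR"   => bytes.scan(/\r(?!\n)/).size,  # CR not followed by LF
    "LF"   => bytes.scan(/(?<!\r)\n/).size  # LF not preceded by CR
  }
  best, n = counts.max_by { |_, c| c }
  n.zero? ? nil : best
end
```

On a tie, max_by keeps the first entry, so mixed input with equal counts is reported as CRLF; a real implementation might choose a different tie-break.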

Requirement

  • Ruby (http://www.ruby-lang.org). Note that you may need additional Ruby libraries, such as iconv, if your OS's Ruby package does not include them.

Installation

Note that you need appropriate permissions for installation (you may need root/administrator privileges).

  • Copy the docdiff/ directory and its contents into the Ruby library directory so that the Ruby interpreter can load them.
# cp -r docdiff /usr/lib/ruby/1.9.1
  • Place docdiff.rb in a command binary directory.
# cp docdiff.rb /usr/bin/
  • (Optional) You may want to rename it to docdiff.
# mv /usr/bin/docdiff.rb /usr/bin/docdiff
  • (Optional) When invoked as chardiff or worddiff, docdiff runs with resolution set to char or word, respectively.
# ln -s /usr/bin/docdiff.rb /usr/bin/chardiff.rb
# ln -s /usr/bin/docdiff.rb /usr/bin/worddiff.rb
  • Set the appropriate permissions (make the script executable).
# chmod +x /usr/bin/docdiff.rb
  • (Optional) If you want a site-wide configuration file, copy docdiff.conf.example to /etc/docdiff/docdiff.conf and edit it.
# mkdir -p /etc/docdiff
# cp docdiff.conf.example /etc/docdiff/docdiff.conf
# $EDITOR /etc/docdiff/docdiff.conf
  • (Optional) If you want a per-user configuration file, copy docdiff.conf.example to ~/etc/docdiff/docdiff.conf and edit it.
% mkdir -p ~/etc/docdiff
% cp docdiff.conf.example ~/etc/docdiff/docdiff.conf
% $EDITOR ~/etc/docdiff/docdiff.conf
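The chardiff/worddiff behaviour described in the installation steps can be pictured as a dispatch on the program name. This is a minimal sketch, assuming the name is compared with its extension stripped (so chardiff.rb and chardiff behave the same); default_resolution is a hypothetical helper, not DocDiff's actual code.

```ruby
# Hypothetical sketch: choose a default resolution from the invocation name.
# File.basename(path, ".*") strips any extension, so "chardiff.rb" and
# "chardiff" are treated alike.
def default_resolution(program_name)
  case File.basename(program_name, ".*")
  when "chardiff" then "char"
  when "worddiff" then "word"
  else nil # leave the configured or built-in default untouched
  end
end

resolution = default_resolution($PROGRAM_NAME)
```

Returning nil for any other name keeps the decision about docdiff's own default out of this helper.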

Usage

Synopsis

% docdiff [options] oldfile newfile

e.g.

% docdiff old.txt new.txt > diff.html

See the help message for details (docdiff --help).

License

This software is distributed under the so-called modified BSD-style license (http://www.opensource.org/licenses/bsd-license.php), without the advertising clause. By contributing to this software, you agree that your contribution may be incorporated under the same license.

Copyright and conditions of use for the main portion of the source:

Copyright (C) Hisashi MORITA.  All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:
1. Redistributions of source code must retain the above copyright
   notice, this list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright
   notice, this list of conditions and the following disclaimer in the
   documentation and/or other materials provided with the distribution.
3. Neither the name of the University nor the names of its contributors
   may be used to endorse or promote products derived from this software
   without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
ARE DISCLAIMED.  IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
SUCH DAMAGE.

The diff library (docdiff/diff.rb and docdiff/diff/*) was originally part of Ruby/CVS by Akira TANAKA. Ruby/CVS is licensed under a modified BSD-style license. See the following for details.

Credits

  • Hisashi MORITA (primary author)

Acknowledgments

  • Akira TANAKA (diff library author)
  • Shin'ichiro HARA (initial idea and algorithm suggestion)
  • Masatoshi SEKI (patch)
  • Akira YAMADA (patch, Debian package)
  • Kenshi MUTO (testing, bug report, Debian package)
  • Kazuhiro NISHIYAMA (bug report)
  • Hiroshi OHKUBO (bug report)
  • Shugo MAEDA (bug report)
  • Kazuhiko (patch)
  • Shintaro Kakutani (patches)
  • Masayoshi Takahashi (patches)
  • Masakazu Takahashi (patch)
  • Hibariya (bug report)
  • Hiroshi SHIBATA (patch)

Please excuse us: this list is far from complete and fails to acknowledge many others who have helped us in one way or another. We really appreciate their help.

Resources

Format

Similar Software

There are several other programs that can compare text word by word and/or character by character.

docdiff's People

Contributors

emasaka, hisashim, hsbt, kakutani, takahashim

docdiff's Issues

UTF8 treated as ASCII

OS: FreeBSD 10.3
docdiff Version: 0.4.0

The following command seems to ignore the --utf8 option, as I get the same error with or without it:
/usr/local/bin/docdiff --utf8 --format=html file1 file2

Output:
/usr/local/lib/ruby/site_ruby/2.2/docdiff/charstring.rb:75:in 'scan': invalid byte sequence in US-ASCII (ArgumentError)
from /usr/local/lib/ruby/site_ruby/2.2/docdiff/charstring.rb:75:in 'guess_eol'
from /usr/local/bin/docdiff:332:in '<main>'
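The traceback is consistent with the input bytes being tagged as US-ASCII, which Ruby does when Encoding.default_external is US-ASCII, for example under a C/POSIX locale (an assumption about the reporter's environment). The sketch below reproduces the same ArgumentError with String#scan and shows that re-tagging the bytes as UTF-8 avoids it; it does not show how docdiff's --utf8 option is implemented.

```ruby
# Reproduce "invalid byte sequence in US-ASCII" (assumption: the file's
# UTF-8 bytes were tagged US-ASCII when read).
utf8_bytes = "L\xC3\xA4ndern\n".b
as_ascii = utf8_bytes.dup.force_encoding(Encoding::US_ASCII)

begin
  as_ascii.scan(/\r\n|\r|\n/)
rescue ArgumentError => e
  puts "fails: #{e.message}" # fails: invalid byte sequence in US-ASCII
end

# Tagging the same bytes as UTF-8 makes them valid, so scan succeeds.
as_utf8 = utf8_bytes.dup.force_encoding(Encoding::UTF_8)
puts as_utf8.valid_encoding? # true
p as_utf8.scan(/\r\n|\r|\n/) # ["\n"]
```

Outside docdiff, reading the file with File.read(path, encoding: "UTF-8") or running under a UTF-8 locale (e.g. LANG=en_US.UTF-8) should have the same effect.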

Large changes interrupted by single words

When using word resolution (possibly character resolution too; I haven't tested it), I often find large changed paragraphs split up by single unchanged words, which makes the changes harder to discern. My idea is to merge the changes surrounding such a single word, repeating that word in both the deletion and the addition, possibly depending on how large the surrounding changes are relative to the isolated word.

For instance, a single word with a one-word replacement on each side would not be merged into the changes, but a single word with five-word changes on either side would be, yielding one change rather than two changes divided by an unchanged word.
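The proposal can be sketched as a post-processing pass over the diff. Everything here is a hypothetical illustration: hunks are arrays of the form [:common, words] or [:change, deleted_words, added_words], a change's size is the larger of its two sides, and the thresholds max_common and min_change (defaults 1 and 5, taken from the example above) are assumptions.

```ruby
# Hypothetical hunk format:
#   [:common, tokens]           unchanged run
#   [:change, deleted, added]   replaced run
# A common run of at most max_common words lying between two changes of at
# least min_change words each is folded into one larger change.
def change_size(hunk)
  [hunk[1].size, hunk[2].size].max
end

def merge_isolated_commons(hunks, max_common: 1, min_change: 5)
  out = []
  i = 0
  while i < hunks.size
    cur = hunks[i]
    prev = out.last
    nxt = hunks[i + 1]
    if cur[0] == :common && prev && nxt &&
       prev[0] == :change && nxt[0] == :change &&
       cur[1].size <= max_common &&
       change_size(prev) >= min_change && change_size(nxt) >= min_change
      # Repeat the common words on both sides so neither text loses them.
      out[-1] = [:change, prev[1] + cur[1] + nxt[1], prev[2] + cur[1] + nxt[2]]
      i += 2 # the following change has been absorbed
    else
      out << cur
      i += 1
    end
  end
  out
end

five_old = %w[a b c d e]
five_new = %w[v w x y z]
hunks = [[:change, five_old, five_new], [:common, ["the"]], [:change, five_old, five_new]]
merged = merge_isolated_commons(hunks)
```

Because the merged change becomes the new out.last, a chain of such islands collapses into a single change in one pass.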

Release new version

Hi,

Could you release this software as a new version?

There have been many changes since 2011 (0.5.0).

If you do, I can update the Debian package too.

best regards,

JIS0208.TXT is considered as non-free?

Debian Bug#801497 claims that devutil/JIS0208.TXT appears to be non-free.

I read the copyright notice of this file and was concerned about the following sentence:

Unicode, Inc. specifically excludes the right to re-distribute this file directly to third parties or other organizations whether for profit or not.

unify differences

I don't understand the difference between del/before-change and add/after-change. I would rather use only two styles instead of four: del and add, red and green. That seems more logical to me.

Proposal: Either unify del/before-change and add/after-change, or provide a command-line option to enable this.

better paragraph support

The HTML output uses <br> instead of <p>. That's logical, because the diff is line-by-line. But it makes copying and pasting the output into WordPress hard, since WordPress adds <p> tags and closes <span>s at the end of each paragraph.

Is there any chance of adding an option to use <p> instead of two <br>s?

While we're at it, closing all open tags at the end of each paragraph seems sensible, to avoid mistakes in CMSs.
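The requested paragraph option could work roughly as follows. This is a minimal sketch under stated assumptions: each input line is already a complete HTML fragment with balanced tags, paragraphs are separated by blank lines, and to_paragraphs is a hypothetical helper. It does not close tags that remain open across lines.

```ruby
# Hypothetical sketch: wrap blank-line-separated groups of lines in <p>,
# joining the lines inside one paragraph with <br />.
def to_paragraphs(lines)
  lines.map(&:chomp)
       .chunk_while { |a, b| !a.strip.empty? && !b.strip.empty? } # group non-blank runs
       .reject { |group| group.all? { |l| l.strip.empty? } }     # drop blank separators
       .map { |group| "<p>#{group.join("<br />\n")}</p>" }
       .join("\n")
end

html = to_paragraphs(["first line\n", "second line\n", "\n", "next paragraph\n"])
```

Each paragraph becomes one <p> element, so a CMS that inserts its own paragraph markup has nothing to split.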

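docdiff arbitrarily deletes some characters

https://twitter.com/mametter/statuses/54587287954141184
"Surprisingly, I couldn't find a decent tool that gives a colored word-by-word diff. colordiff+wdiff doesn't colorize changes that span multiple lines, docdiff arbitrarily deletes some characters, and vimdiff doesn't wrap lines. In the end I used kdiff3."

I'm very sorry this comes so late. To do: ask which characters disappeared, and fix it.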

Update with Ruby 2

Todo:

  • Remove irrelevant files, such as Makefile, etc.
  • Drop Ruby 1.8 support (doh!)
  • Preliminary fix for encoding problems in lib/docdiff/document.rb
  • Switch to Ruby 2's m17n features eventually.

TypeError occurred when using RSpec

When I use DocDiff and RSpec in the same context, a TypeError occurs.

Code to reproduce:

require 'rspec'
require 'docdiff/difference' # TypeError: Diff is not a class

I think the exception occurs because diff-lcs defines Diff as a module, and RSpec depends on diff-lcs.

I don't know how to fix this problem.
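The suspected cause can be reproduced in plain Ruby without RSpec or DocDiff: once a constant is defined as a module (diff-lcs defines a top-level Diff module as the namespace for Diff::LCS), a later class definition with the same name raises TypeError. The DocDiffSketch namespace is a hypothetical illustration of one way around the clash, not a change DocDiff has made.

```ruby
# Stand-in for diff-lcs: a top-level module named Diff.
module Diff; end

begin
  class Diff; end # reusing the same constant as a class
rescue TypeError => e
  puts e.message # starts with "Diff is not a class"
end

# Hypothetical fix: define the library's class inside its own namespace,
# so it never touches the top-level Diff constant.
module DocDiffSketch
  class Diff; end
end
```

Namespacing changes the constant path that callers use, so an actual fix would also have to update every reference to the old top-level name.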

improve word detection

resolution=word seems to ignore punctuation (and umlauts), which is not intuitive. Example:

<span class="common">anderen Lä</span><span class="before-change"><del>ndern.</del></span><span class="after-change"><ins>ndern (vgl. foo). </ins></span>

In this case, logically this would be more accurate:

<span class="common">anderen Ländern</span><span class="before-change"><del>.</del></span><span class="after-change"><ins> (vgl. foo).</ins></span>
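The expected split can be obtained with a tokenizer built on Unicode character properties. This is a minimal sketch with a hypothetical WORD_TOKEN pattern and words helper, not DocDiff's own splitter: runs of letters, combining marks, and digits in any script form one word, whitespace runs are kept, and every other character (punctuation) is its own token.

```ruby
# Unicode-aware word tokenizer (hypothetical sketch, not DocDiff's splitter).
# - runs of letters (\p{L}), combining marks (\p{M}) and digits (\p{N}) form a word
# - runs of whitespace are kept as their own tokens
# - any other single character (punctuation, symbols) is its own token
WORD_TOKEN = /[\p{L}\p{M}\p{N}]+|\s+|[^\p{L}\p{M}\p{N}\s]/

def words(text)
  text.scan(WORD_TOKEN)
end

old_tokens = words("anderen Ländern.")
new_tokens = words("anderen Ländern (vgl. foo).")
```

words("anderen Ländern.") returns ["anderen", " ", "Ländern", "."], so the period is diffed separately from the word. Note that \p{L} also matches CJK characters, so unspaced Japanese text would become one long token; it would need separate handling.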
