
damerau-levenshtein's People

Contributors

azhi, dimus, ixti, jozr, lazylester, nakilon, scarroll32, ybiquitous

damerau-levenshtein's Issues

Weighted Levenshtein

Most existing Levenshtein libraries are not very flexible: all edit operations have cost 1.

However, sometimes not all edits are created equal. For instance, if you are doing OCR correction, maybe substituting '0' for 'O' should have a smaller cost than substituting 'X' for 'O'. If you are doing human typo correction, maybe substituting 'X' for 'Z' should have a smaller cost, since they are located next to each other on a QWERTY keyboard.

There is an implementation in Python here. It would be great to support a way to add weights.

Is it something you would consider?
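
To make the request concrete, here is a minimal pure-Ruby sketch of a weighted Levenshtein distance. It is not part of the gem: the method name `weighted_levenshtein`, its keyword arguments, and the `OCR_CONFUSIONS` table are all hypothetical, chosen only to illustrate how per-character substitution costs could be passed in.

```ruby
# Hypothetical helper, not part of the damerau-levenshtein gem.
# sub_cost is any callable taking two characters and returning a cost.
def weighted_levenshtein(a, b, sub_cost: ->(_x, _y) { 1.0 }, ins_cost: 1.0, del_cost: 1.0)
  a = a.chars
  b = b.chars
  # prev[j] is the cost of turning the already processed prefix of a
  # into the first j characters of b.
  prev = (0..b.length).map { |j| j * ins_cost }
  a.each_with_index do |ca, i|
    curr = [(i + 1) * del_cost]
    b.each_with_index do |cb, j|
      sub = ca == cb ? 0.0 : sub_cost.call(ca, cb)
      curr << [prev[j + 1] + del_cost, curr[j] + ins_cost, prev[j] + sub].min
    end
    prev = curr
  end
  prev.last
end

# Assumed OCR-style weights: confusing "0" with "O" is cheap,
# every other substitution keeps the default cost of 1.
OCR_CONFUSIONS = { %w[0 O] => 0.25, %w[O 0] => 0.25 }.freeze
ocr_cost = ->(x, y) { OCR_CONFUSIONS.fetch([x, y], 1.0) }

cheap  = weighted_levenshtein("B0B", "BOB", sub_cost: ocr_cost)
costly = weighted_levenshtein("BXB", "BOB", sub_cost: ocr_cost)
puts "0 for O: #{cheap}, X for O: #{costly}"
```

With the default arguments this reduces to the plain Levenshtein distance, so existing callers would not need to change.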

damerau-levenshtein-1.3.2/lib/damerau-levenshtein.rb:17: [BUG] Segmentation fault at 0x00007ffee4716000

Crash when I try to get the Edit Distance.

$ ruby test.rb
/Users/puritysb/.rbenv/versions/2.6.0/lib/ruby/gems/2.6.0/gems/damerau-levenshtein-1.3.2/lib/damerau-levenshtein.rb:17: [BUG] Segmentation fault at 0x00007ffee4716000
ruby 2.6.0p0 (2018-12-25 revision 66547) [x86_64-darwin18]

-- Crash Report log information --------------------------------------------
   See Crash Report log file under the one of following:                    
     * ~/Library/Logs/DiagnosticReports                                     
     * /Library/Logs/DiagnosticReports                                      
   for more details.                                                        
Don't forget to include the above Crash Report log file in bug reports.     

-- Control frame information -----------------------------------------------
c:0004 p:---- s:0026 e:000025 CFUNC  :internal_distance
c:0003 p:0029 s:0018 e:000017 METHOD /Users/puritysb/.rbenv/versions/2.6.0/lib/ruby/gems/2.6.0/gems/damerau-levenshtein-1.3.2/lib/damerau-levenshtein.rb:17
c:0002 p:0033 s:0010 E:000810 EVAL   test.rb:5 [FINISH]
c:0001 p:0000 s:0003 E:000800 (none) [FINISH]

-- Ruby level backtrace information ----------------------------------------
test.rb:5:in `<main>'
/Users/puritysb/.rbenv/versions/2.6.0/lib/ruby/gems/2.6.0/gems/damerau-levenshtein-1.3.2/lib/damerau-levenshtein.rb:17:in `distance'
/Users/puritysb/.rbenv/versions/2.6.0/lib/ruby/gems/2.6.0/gems/damerau-levenshtein-1.3.2/lib/damerau-levenshtein.rb:17:in `internal_distance'

-- Machine register context ------------------------------------------------
 rax: 0x000000000000022d rbx: 0x0000000000000049 rcx: 0x00007ffee4713a08
 rdx: 0x0000000000000002 rdi: 0x00000000ffffffff rsi: 0x000000000000022e
 rbp: 0x00007ffee4714fa0 rsp: 0x00007ffee4713a00  r8: 0x0000000000000049
  r9: 0x000000000000022d r10: 0x0000000000000000 r11: 0x000000000000022f
 r12: 0x00007ffee4713a00 r13: 0x00007ffee4714e98 r14: 0x0000000000000001
 r15: 0x00007ffee4714e90 rip: 0x000000010db5cd14 rfl: 0x0000000000010246

-- C level backtrace information -------------------------------------------
/Users/puritysb/.rbenv/versions/2.6.0/bin/ruby(rb_vm_bugreport+0x82) [0x10b725532]
/Users/puritysb/.rbenv/versions/2.6.0/bin/ruby(rb_bug_context+0x1d6) [0x10b575d56]
/Users/puritysb/.rbenv/versions/2.6.0/bin/ruby(sigsegv+0x51) [0x10b68aa81]
/usr/lib/system/libsystem_platform.dylib(_sigtramp+0x1d) [0x7fff66b0eb1d]
/Users/puritysb/.rbenv/versions/2.6.0/lib/ruby/gems/2.6.0/gems/damerau-levenshtein-1.3.2/lib/damerau-levenshtein/damerau_levenshtein.bundle(method_internal_distance+0x4b4) [0x10db5cd14]
/Users/puritysb/.rbenv/versions/2.6.0/bin/ruby(vm_call_cfunc+0x156) [0x10b717ec6]
/Users/puritysb/.rbenv/versions/2.6.0/bin/ruby(vm_exec_core+0x31ba) [0x10b6feada]
/Users/puritysb/.rbenv/versions/2.6.0/bin/ruby(rb_vm_exec+0xac4) [0x10b712914]
/Users/puritysb/.rbenv/versions/2.6.0/bin/ruby(ruby_exec_internal+0xe6) [0x10b580d36]
/Users/puritysb/.rbenv/versions/2.6.0/bin/ruby(ruby_run_node+0x49) [0x10b580ba9]
/Users/puritysb/.rbenv/versions/2.6.0/bin/ruby(main+0x5d) [0x10b4eb2ed]

-- Other runtime information -----------------------------------------------

* Loaded script: test.rb

* Loaded features:

    0 enumerator.so
    1 thread.rb
    2 rational.so
    3 complex.so
    4 /Users/puritysb/.rbenv/versions/2.6.0/lib/ruby/2.6.0/x86_64-darwin18/enc/encdb.bundle
    5 /Users/puritysb/.rbenv/versions/2.6.0/lib/ruby/2.6.0/x86_64-darwin18/enc/trans/transdb.bundle
    6 /Users/puritysb/.rbenv/versions/2.6.0/lib/ruby/2.6.0/x86_64-darwin18/rbconfig.rb
    7 /Users/puritysb/.rbenv/versions/2.6.0/lib/ruby/2.6.0/rubygems/compatibility.rb
    8 /Users/puritysb/.rbenv/versions/2.6.0/lib/ruby/2.6.0/rubygems/defaults.rb
    9 /Users/puritysb/.rbenv/versions/2.6.0/lib/ruby/2.6.0/rubygems/deprecate.rb
   10 /Users/puritysb/.rbenv/versions/2.6.0/lib/ruby/2.6.0/rubygems/errors.rb
   11 /Users/puritysb/.rbenv/versions/2.6.0/lib/ruby/2.6.0/rubygems/version.rb
   12 /Users/puritysb/.rbenv/versions/2.6.0/lib/ruby/2.6.0/rubygems/requirement.rb
   13 /Users/puritysb/.rbenv/versions/2.6.0/lib/ruby/2.6.0/rubygems/platform.rb
   14 /Users/puritysb/.rbenv/versions/2.6.0/lib/ruby/2.6.0/rubygems/basic_specification.rb
   15 /Users/puritysb/.rbenv/versions/2.6.0/lib/ruby/2.6.0/rubygems/stub_specification.rb
   16 /Users/puritysb/.rbenv/versions/2.6.0/lib/ruby/2.6.0/delegate.rb
   17 /Users/puritysb/.rbenv/versions/2.6.0/lib/ruby/2.6.0/uri/rfc2396_parser.rb
   18 /Users/puritysb/.rbenv/versions/2.6.0/lib/ruby/2.6.0/uri/rfc3986_parser.rb
   19 /Users/puritysb/.rbenv/versions/2.6.0/lib/ruby/2.6.0/uri/common.rb
   20 /Users/puritysb/.rbenv/versions/2.6.0/lib/ruby/2.6.0/uri/generic.rb
   21 /Users/puritysb/.rbenv/versions/2.6.0/lib/ruby/2.6.0/uri/file.rb
   22 /Users/puritysb/.rbenv/versions/2.6.0/lib/ruby/2.6.0/uri/ftp.rb
   23 /Users/puritysb/.rbenv/versions/2.6.0/lib/ruby/2.6.0/uri/http.rb
   24 /Users/puritysb/.rbenv/versions/2.6.0/lib/ruby/2.6.0/uri/https.rb
   25 /Users/puritysb/.rbenv/versions/2.6.0/lib/ruby/2.6.0/uri/ldap.rb
   26 /Users/puritysb/.rbenv/versions/2.6.0/lib/ruby/2.6.0/uri/ldaps.rb
   27 /Users/puritysb/.rbenv/versions/2.6.0/lib/ruby/2.6.0/uri/mailto.rb
   28 /Users/puritysb/.rbenv/versions/2.6.0/lib/ruby/2.6.0/uri.rb
   29 /Users/puritysb/.rbenv/versions/2.6.0/lib/ruby/2.6.0/rubygems/specification_policy.rb
   30 /Users/puritysb/.rbenv/versions/2.6.0/lib/ruby/2.6.0/rubygems/util/list.rb
   31 /Users/puritysb/.rbenv/versions/2.6.0/lib/ruby/2.6.0/x86_64-darwin18/stringio.bundle
   32 /Users/puritysb/.rbenv/versions/2.6.0/lib/ruby/2.6.0/rubygems/specification.rb
   33 /Users/puritysb/.rbenv/versions/2.6.0/lib/ruby/2.6.0/rubygems/exceptions.rb
   34 /Users/puritysb/.rbenv/versions/2.6.0/lib/ruby/2.6.0/rubygems/util.rb
   35 /Users/puritysb/.rbenv/versions/2.6.0/lib/ruby/2.6.0/rubygems/bundler_version_finder.rb
   36 /Users/puritysb/.rbenv/versions/2.6.0/lib/ruby/2.6.0/rubygems/dependency.rb
   37 /Users/puritysb/.rbenv/versions/2.6.0/lib/ruby/2.6.0/rubygems/core_ext/kernel_gem.rb
   38 /Users/puritysb/.rbenv/versions/2.6.0/lib/ruby/2.6.0/monitor.rb
   39 /Users/puritysb/.rbenv/versions/2.6.0/lib/ruby/2.6.0/rubygems/core_ext/kernel_require.rb
   40 /Users/puritysb/.rbenv/versions/2.6.0/lib/ruby/2.6.0/rubygems/core_ext/kernel_warn.rb
   41 /Users/puritysb/.rbenv/versions/2.6.0/lib/ruby/2.6.0/rubygems.rb
   42 /Users/puritysb/.rbenv/versions/2.6.0/lib/ruby/2.6.0/rubygems/path_support.rb
   43 /Users/puritysb/.rbenv/versions/2.6.0/lib/ruby/gems/2.6.0/gems/did_you_mean-1.4.0/lib/did_you_mean/version.rb
   44 /Users/puritysb/.rbenv/versions/2.6.0/lib/ruby/gems/2.6.0/gems/did_you_mean-1.4.0/lib/did_you_mean/core_ext/name_error.rb
   45 /Users/puritysb/.rbenv/versions/2.6.0/lib/ruby/gems/2.6.0/gems/did_you_mean-1.4.0/lib/did_you_mean/levenshtein.rb
   46 /Users/puritysb/.rbenv/versions/2.6.0/lib/ruby/gems/2.6.0/gems/did_you_mean-1.4.0/lib/did_you_mean/jaro_winkler.rb
   47 /Users/puritysb/.rbenv/versions/2.6.0/lib/ruby/gems/2.6.0/gems/did_you_mean-1.4.0/lib/did_you_mean/spell_checker.rb
   48 /Users/puritysb/.rbenv/versions/2.6.0/lib/ruby/gems/2.6.0/gems/did_you_mean-1.4.0/lib/did_you_mean/spell_checkers/name_error_checkers/class_name_checker.rb
   49 /Users/puritysb/.rbenv/versions/2.6.0/lib/ruby/gems/2.6.0/gems/did_you_mean-1.4.0/lib/did_you_mean/spell_checkers/name_error_checkers/variable_name_checker.rb
   50 /Users/puritysb/.rbenv/versions/2.6.0/lib/ruby/gems/2.6.0/gems/did_you_mean-1.4.0/lib/did_you_mean/spell_checkers/name_error_checkers.rb
   51 /Users/puritysb/.rbenv/versions/2.6.0/lib/ruby/gems/2.6.0/gems/did_you_mean-1.4.0/lib/did_you_mean/spell_checkers/method_name_checker.rb
   52 /Users/puritysb/.rbenv/versions/2.6.0/lib/ruby/gems/2.6.0/gems/did_you_mean-1.4.0/lib/did_you_mean/spell_checkers/key_error_checker.rb
   53 /Users/puritysb/.rbenv/versions/2.6.0/lib/ruby/gems/2.6.0/gems/did_you_mean-1.4.0/lib/did_you_mean/spell_checkers/null_checker.rb
   54 /Users/puritysb/.rbenv/versions/2.6.0/lib/ruby/gems/2.6.0/gems/did_you_mean-1.4.0/lib/did_you_mean/formatters/plain_formatter.rb
   55 /Users/puritysb/.rbenv/versions/2.6.0/lib/ruby/gems/2.6.0/gems/did_you_mean-1.4.0/lib/did_you_mean/tree_spell_checker.rb
   56 /Users/puritysb/.rbenv/versions/2.6.0/lib/ruby/gems/2.6.0/gems/did_you_mean-1.4.0/lib/did_you_mean.rb
   57 /Users/puritysb/.rbenv/versions/2.6.0/lib/ruby/gems/2.6.0/gems/damerau-levenshtein-1.3.2/lib/damerau-levenshtein/version.rb
   58 /Users/puritysb/.rbenv/versions/2.6.0/lib/ruby/gems/2.6.0/gems/damerau-levenshtein-1.3.2/lib/damerau-levenshtein/damerau_levenshtein.bundle
   59 /Users/puritysb/.rbenv/versions/2.6.0/lib/ruby/gems/2.6.0/gems/damerau-levenshtein-1.3.2/lib/damerau-levenshtein/formatter.rb
   60 /Users/puritysb/.rbenv/versions/2.6.0/lib/ruby/gems/2.6.0/gems/damerau-levenshtein-1.3.2/lib/damerau-levenshtein/differ.rb
   61 /Users/puritysb/.rbenv/versions/2.6.0/lib/ruby/gems/2.6.0/gems/damerau-levenshtein-1.3.2/lib/damerau-levenshtein.rb

[NOTE]
You may have encountered a bug in the Ruby interpreter or extension libraries.
Bug reports are welcome.
For details: https://www.ruby-lang.org/bugreport.html

[IMPORTANT]
Don't forget to include the Crash Report log file under
DiagnosticReports directory in bug reports.

Here is the test code to reproduce the crash:

require 'damerau-levenshtein'
dl = DamerauLevenshtein
text1 = "I am QA"
text2 = "IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII"
edit_distance = dl.distance(text1, text2)

Crash on extra long strings

Hi, I'm not sure if this is still actively maintained, but I figured I would put a report in.

I'm using this gem to help me deduplicate close matches between long strings of text. I enjoy the performance of the library, which is why I've been using it.

Recently, I noticed that when I compare two close but not identical strings that are quite long, the process either crashes or gets killed.

The crash was regularly reproducible when comparing two strings of over 50,000 characters in length. At shorter lengths (~30,000 characters) the process was killed earlier rather than crashing.

The stack trace points at line 52 in the C code when it crashes:

 d[i*sl] = i;

... wondering if the strings need a more constrained buffer approach somewhere?

I can give more details if you need.

Thanks,
David
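
On the question of a more constrained buffer: one general technique, sketched here as an assumption rather than a diagnosis of the gem's C extension, is to keep only the last three rows of the distance matrix instead of the whole table. The method name `osa_distance` is hypothetical. It computes the optimal string alignment variant (adjacent transpositions count as one edit) and its memory use grows with the shorter string only. Running time is still quadratic, and pure Ruby will be far slower than the C code.

```ruby
# Hypothetical pure-Ruby fallback; not the gem's implementation.
def osa_distance(a, b)
  a = a.codepoints
  b = b.codepoints
  a, b = b, a if b.length > a.length # rows are as long as the shorter string
  return a.length if b.empty?

  prev2 = nil                  # row i-2, needed for transpositions
  prev = (0..b.length).to_a    # row i-1
  a.each_with_index do |ca, i|
    curr = [i + 1]
    b.each_with_index do |cb, j|
      cost = ca == cb ? 0 : 1
      best = [prev[j + 1] + 1, curr[j] + 1, prev[j] + cost].min
      # Swap of two adjacent characters counts as a single edit.
      if i > 0 && j > 0 && ca == b[j - 1] && a[i - 1] == cb
        best = [best, prev2[j - 1] + 1].min
      end
      curr << best
    end
    prev2 = prev
    prev = curr
  end
  prev.last
end

puts osa_distance("Something", "Smoething")
```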

Weighted version for QWERTY keyboards

Hey guys,

I have a feature request: would it be possible to weight edits based on how far apart the keys are on a QWERTY keyboard? This would be super useful for detecting typosquatting.
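
As an illustration of what such a weight could look like, the sketch below maps letter keys to approximate positions on a QWERTY layout and derives a substitution cost from the distance between them. Everything here is an assumption made for the example: the names `QWERTY_ROWS`, `ROW_OFFSETS`, `KEY_POS`, and `qwerty_cost`, the row stagger values, and the choice to halve the distance and cap it at 1.0. It is not an API of the gem.

```ruby
# Hypothetical layout table for the three letter rows.
QWERTY_ROWS = %w[qwertyuiop asdfghjkl zxcvbnm].freeze
ROW_OFFSETS = [0.0, 0.25, 0.75].freeze # assumed horizontal stagger per row

# Map each letter to [row, column] coordinates.
KEY_POS = {}
QWERTY_ROWS.each_with_index do |row, r|
  row.each_char.with_index do |ch, c|
    KEY_POS[ch] = [r.to_f, c + ROW_OFFSETS[r]]
  end
end
KEY_POS.freeze

# Substitution cost between two characters: neighbouring keys are cheap,
# distant or unknown keys cost max_cost.
def qwerty_cost(x, y, max_cost: 1.0)
  return 0.0 if x == y

  px = KEY_POS[x.downcase]
  py = KEY_POS[y.downcase]
  return max_cost unless px && py # digits, punctuation, etc.

  dist = Math.hypot(px[0] - py[0], px[1] - py[1])
  [dist / 2.0, max_cost].min
end

puts qwerty_cost("x", "z") # neighbouring keys
puts qwerty_cost("x", "p") # far apart
```

A function like this could serve as the substitution cost in any weighted edit-distance routine, with insertions and deletions left at cost 1.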

damerau_levenshtein.so LoadError

I have the message:

~/.gem/ruby/2.7.0/gems/damerau-levenshtein-1.3.3/lib/damerau-levenshtein.rb:4:in `require_relative': libruby.so.2.5: cannot open shared object file: No such file or directory - ~/.gem/ruby/2.7.0/gems/damerau-levenshtein-1.3.3/lib/damerau-levenshtein/damerau_levenshtein.so (LoadError)

System: Arch Linux
Ruby: 2.7.0
The file is present on my system, though:
~/.gem/ruby/2.7.0/gems/damerau-levenshtein-1.3.3/lib/damerau-levenshtein/damerau_levenshtein.so

Error calculating damerau-levenshtein distance for large strings

I was just looking at this module and I noticed that for larger strings there appears to be an issue calculating the dl distance. For example:

DamerauLevenshtein.distance("aaliyahmarie", "rvolnwkxcqefqbthwxfrskwchnmqoibgbvosrdyyuswdyqtiez") => 11

This is incorrect, as shown at:
http://fuzzy-string.com/Compare/Transform.aspx?r=aaliyahmarie&q=rvolnwkxcqefqbthwxfrskwchnmqoibgbvosrdyyuswdyqtiez

It's supposed to be 44.

Here are some other examples:
DamerauLevenshtein.distance("aaliyahmarie", "mfhdfvbfykinhnhkhcubdjjqgjdcaazjvuijpdodsulveprmmh") => 11 (should be 45)
DamerauLevenshtein.distance("aaliyahmarie", "hxswuwpeahyulhwnjxsdtopawjqbcytjkrszhfonapovxyyazr") => 11 (should be 45)

Here is an example where "aaliyahmarie" doesn't break at long strings:
DamerauLevenshtein.distance("aaliyahmarie", "ukcmvhgxewuliefeinkftyjbqgvymzzzqmlzeuwjjquxvrtjic") => 44 (Correct)

Another example with a different first string:
DamerauLevenshtein.distance("constantine", "qzmwsgylrorhimtpsmbesmzerbkiteyhityhkmimtvysdhuize") => 11 (Should be 44)

I'm not sure why all of the wrong answers are 11; maybe it is a coincidence of the string size I picked for the second string.

I was able to replicate this bug across multiple machines (both Macs).

Join forces

I found this excellent gem! 💎

It seems like there are a few Ruby gems for string matching:

Would it make sense to join forces? In other words, have several maintainers of one project.

String distance is a great functionality, but the API can be pretty static over time. So what's important is mostly to have several maintainers that can help each other with CI upgrades and similar.

ping @kiyoka @flori @dimus @tonytonyjan

I've opened issues in all three repos quoted above, with the same message.

damerau_levenshtein.so (LoadError)

Description

I can't use the gem after installing it on Amazon Linux 2023 with Ruby 3.2.2.

To Reproduce

I am actually using the Ruby 3.2.2 platform provided by AWS Elastic Beanstalk,
but I managed to reproduce the problem with the Amazon Linux Docker image.

Here are the steps:

# run the container
docker run -it amazonlinux:2023 bash
# install build dependencies
yum groupinstall -y "Development Tools" && dnf install -y ruby ruby-devel
# install gem
gem install damerau-levenshtein
# run a test 
ruby -rdamerau-levenshtein -e 'puts DamerauLevenshtein.distance("one", "onne")'

/usr/local/share/ruby3.2-gems/gems/damerau-levenshtein-1.3.3/lib/damerau-levenshtein.rb:4:in `require_relative': libruby.so.2.5: cannot open shared object file: No such file or directory - /usr/local/share/ruby3.2-gems/gems/damerau-levenshtein-1.3.3/lib/damerau-levenshtein/damerau_levenshtein.so (LoadError)
	from /usr/local/share/ruby3.2-gems/gems/damerau-levenshtein-1.3.3/lib/damerau-levenshtein.rb:4:in `<top (required)>'
	from <internal:/usr/share/ruby3.2-rubygems/rubygems/core_ext/kernel_require.rb>:159:in `require'
	from <internal:/usr/share/ruby3.2-rubygems/rubygems/core_ext/kernel_require.rb>:159:in `rescue in require'
	from <internal:/usr/share/ruby3.2-rubygems/rubygems/core_ext/kernel_require.rb>:39:in `require'
<internal:/usr/share/ruby3.2-rubygems/rubygems/core_ext/kernel_require.rb>:85:in `require': cannot load such file -- damerau-levenshtein (LoadError)
	from <internal:/usr/share/ruby3.2-rubygems/rubygems/core_ext/kernel_require.rb>:85:in `require'

Expected behavior

I should get the result 1.
The included .so file seems to reference the old libruby.so.2.5.
It looks like the .so file is not linked correctly after the extension compile step,
because when I go to the ext/ folder, run make, and copy the generated .so file to the src/ folder, everything works.

Thank you !

Does not handle case where one of the strings is one character long

irb(main):018:0> DamerauLevenshtein.distance('a', 'ab')
=> 0
irb(main):019:0> DamerauLevenshtein.distance('a', 'a')
=> 0
irb(main):020:0> DamerauLevenshtein.distance('a', 'bc')
=> 0
irb(main):021:0> DamerauLevenshtein.distance('bc', 'a')
=> 0
irb(main):022:0> DamerauLevenshtein.distance('bct', 'a')
