isherep / delete-only-levenshtein

Modified Levenshtein distance algorithms that match strings by deletion and capitalization correction only, and do not allow other replacements or insertions of characters.

Java 100.00%
levenshtein-distance levenshtein-algorithm edit-distance-algorithm levenshtein-deletion-only string-matching modified-levenshtein weighted-levenshtein duplicates-removed character-duplication delete-only-levenshtein

delete-only-levenshtein's Introduction

Modified-Levenshtein-Distance-Delete-Only

After working on a spellchecking application, I learned that Levenshtein distance does not return the intended word in many cases. For example, "applllleee" returns "appelle" instead of "apple". If a word contains several invalid runs of repeated characters, the Levenshtein distance algorithm returns the closest matching word, which may not be the word we are looking for: it can return a word reached by replacing or inserting characters instead of just removing the duplicated ones. The same happens when a word has one miscapitalized letter: if replacing it with a completely different character, or deleting it, is the cheapest option, the algorithm does that instead of correcting the case.
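To make the "applllleee" example concrete, here is a minimal sketch of the standard, unit-cost Levenshtein distance. The class name `PlainLevenshteinDemo` and the method `distance` are illustrative and are not taken from the repository.

```java
class PlainLevenshteinDemo {

    // Classic dynamic-programming Levenshtein distance where deletion,
    // insertion and substitution all cost 1.
    static int distance(String source, String target) {
        int n = source.length();
        int m = target.length();
        int[][] dp = new int[n + 1][m + 1];

        // Turning a prefix into the empty string takes only deletions,
        // building a prefix from the empty string takes only insertions.
        for (int i = 0; i <= n; i++) dp[i][0] = i;
        for (int j = 0; j <= m; j++) dp[0][j] = j;

        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                boolean same = source.charAt(i - 1) == target.charAt(j - 1);
                int substitute = dp[i - 1][j - 1] + (same ? 0 : 1);
                int delete = dp[i - 1][j] + 1;
                int insert = dp[i][j - 1] + 1;
                dp[i][j] = Math.min(substitute, Math.min(delete, insert));
            }
        }
        return dp[n][m];
    }

    public static void main(String[] args) {
        System.out.println(distance("applllleee", "apple"));   // prints 5
        System.out.println(distance("applllleee", "appelle")); // prints 4
    }
}
```

"apple" needs five deletions, while "appelle" needs only three deletions plus one substitution, so a plain nearest-word lookup prefers "appelle".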

For delete-only Levenshtein, use the method deleteOnlyLevenshtein.

For delete-and-replace-only-wrong-casing Levenshtein, use the method deleteAndReplaceWrongCasing.

I modified the Levenshtein algorithm to handle these problems.

The idea is the following: the edit distance algorithm uses the smallest edit distance between two strings to perform correction. If we want to allow only deletion, we have to make sure deletion is always the minimum of all the choices, so we assign high values to the other costs so that they are never selected.
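The effect of raising the other costs can be sketched with a weighted edit distance. This is an illustration only: the class `WeightedEditDistanceSketch`, its parameters, and the value 100 for the "high" cost are assumptions, not the repository's code.

```java
class WeightedEditDistanceSketch {

    // Edit distance with a separate, caller-supplied cost per operation.
    static int distance(String source, String target,
                        int deleteCost, int insertCost, int substituteCost) {
        int n = source.length();
        int m = target.length();
        int[][] dp = new int[n + 1][m + 1];

        for (int i = 1; i <= n; i++) dp[i][0] = dp[i - 1][0] + deleteCost;
        for (int j = 1; j <= m; j++) dp[0][j] = dp[0][j - 1] + insertCost;

        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                boolean same = source.charAt(i - 1) == target.charAt(j - 1);
                int substitute = dp[i - 1][j - 1] + (same ? 0 : substituteCost);
                int delete = dp[i - 1][j] + deleteCost;
                int insert = dp[i][j - 1] + insertCost;
                dp[i][j] = Math.min(substitute, Math.min(delete, insert));
            }
        }
        return dp[n][m];
    }

    public static void main(String[] args) {
        int high = 100; // assumed "high" cost; any value larger than the word length works here

        // Deletion stays cheap, insertion and substitution become expensive.
        System.out.println(distance("applllleee", "apple", 1, high, high));   // prints 5
        System.out.println(distance("applllleee", "appelle", 1, high, high)); // prints 103
    }
}
```

With these weights, "apple" (reachable by deletions alone) is far cheaper than "appelle", which still needs one substitution.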

  1. Compare the two strings: characters count as matching if the only difference is their casing, in which case only the casing needs to be replaced.

  2. Compute the cost of each transformation, but instead of using the real costs of insertion and replacement, assign higher costs to insertion and substitution, unless the substitution replaces a miscapitalized character. If the characters differ only in case, keep the original replacement cost for correcting the casing.

  3. If the characters are not duplicated, assign a high value to deletion so that it is not selected.
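The three steps above can be sketched as a single cost function. This is a hedged reconstruction from the description, not the repository's deleteOnlyLevenshtein or deleteAndReplaceWrongCasing: the class `RestrictedEditDistanceSketch`, the constant `HIGH_COST`, and the rule that a "duplicated" character is one equal to the character immediately before it are all assumptions.

```java
class RestrictedEditDistanceSketch {

    // Assumed penalty for operations that should never be chosen.
    static final int HIGH_COST = 1_000;

    // Step 3: deleting the character at index i is cheap only if it repeats
    // the character right before it (assumed definition of "duplicated").
    static int deletionCost(String typed, int i) {
        boolean duplicate = i >= 1 && typed.charAt(i) == typed.charAt(i - 1);
        return duplicate ? 1 : HIGH_COST;
    }

    // Steps 1 and 2: identical characters cost nothing, a case-only mismatch
    // keeps the normal replacement cost, any other substitution is penalized.
    static int substitutionCost(char typed, char wanted) {
        if (typed == wanted) return 0;
        boolean caseOnly = Character.toLowerCase(typed) == Character.toLowerCase(wanted);
        return caseOnly ? 1 : HIGH_COST;
    }

    // Cost of turning the typed word into the candidate dictionary word.
    static int distance(String typed, String candidate) {
        int n = typed.length();
        int m = candidate.length();
        int[][] dp = new int[n + 1][m + 1];

        for (int i = 1; i <= n; i++) dp[i][0] = dp[i - 1][0] + deletionCost(typed, i - 1);
        for (int j = 1; j <= m; j++) dp[0][j] = dp[0][j - 1] + HIGH_COST; // insertions

        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                int substitute = dp[i - 1][j - 1]
                        + substitutionCost(typed.charAt(i - 1), candidate.charAt(j - 1));
                int delete = dp[i - 1][j] + deletionCost(typed, i - 1);
                int insert = dp[i][j - 1] + HIGH_COST;
                dp[i][j] = Math.min(substitute, Math.min(delete, insert));
            }
        }
        return dp[n][m];
    }

    public static void main(String[] args) {
        System.out.println(distance("applllleee", "apple")); // prints 5
        System.out.println(distance("aPple", "apple"));      // prints 1
        // "appelle" is not reachable by duplicate deletions alone,
        // so its cost is at least HIGH_COST.
        System.out.println(distance("applllleee", "appelle") >= HIGH_COST); // prints true
    }
}
```

A candidate whose cost stays below `HIGH_COST` can be reached using only duplicate deletions and case corrections, so a spellchecker built on this sketch could simply discard any candidate at or above that threshold.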

If you found this code helpful, please give a star ⭐
