ertyumpx / levenshtein-distance

String comparison and string search optimisation repertoire, written in C++, Python3 and Typescript

License: GNU General Public License v3.0

Makefile 1.77% C++ 6.41% Python 14.35% C 67.19% JavaScript 0.16% TypeScript 10.12%
cpp levenshtein-distance python python3 string-comparison string-search typescript fuzzy-search fuzzysearch search-algorithm

levenshtein-distance's Introduction

Levenshtein Distance

A repertoire of Levenshtein distance calculation functions for use in other projects.

The project is still a work in progress.
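
For readers new to the metric, here is a minimal sketch of the classic dynamic-programming Levenshtein distance. The function name `levenshtein` and the two-row layout are assumptions made for illustration; they are not taken from this repository's code.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b, keeping only two rows of the DP table."""
    if len(a) < len(b):
        a, b = b, a  # iterate over the longer string, keep rows as short as possible
    previous = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # turning a[:i] into the empty string takes i deletions
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute, or keep on a match
            ))
        previous = current
    return previous[-1]


print(levenshtein("kitten", "sitting"))
```

Keeping two rows instead of the full table brings memory use down to the length of the shorter string, at the cost of not being able to recover the edit script.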

License

This project is licensed under the GNU GPL-3.0 license.

Setup

There are no third-party dependencies.

Clone the project:

> git clone <repo-url>
> cd levenshtein-distance

Python

The project is written in Python 3.11.6, although it should work on any Python interpreter above 3.5.x.

The main library is python3/distance.py, where all the functions are defined.

To run the general tests:

> cd python/
> make test

C++

The project is currently compiled with GNU G++ 13.2.1.

GNU Make 4.4.1 is used for the compile and link rules.

Once these dependencies are installed, download or clone the project and use the Makefile to compile:

> cd cpp/
> make all
> make run

TypeScript

The project is written in TypeScript 5.3.3 with an ES2021 target.

Yarn 1.22.21 is used for package management.

The library itself has no dependencies, but Jest 27.4.7 is used for testing.

> cd typescript/
> yarn install

Assuming that you will use the code in another project rather than compile it here, there is no need for a build script.

Still, there are some tests that you can run with Jest:

> yarn test

Project

Will be written soon...

levenshtein-distance's People

Contributors

erthium

levenshtein-distance's Issues

Simplification of Characters That Do Not Have Accents

The current string simplification process is simple:

  • Normalise all characters with NFKD, which splits each accented character into two characters: the base letter and a separate combining accent.
  • Remove all characters that are not in the ASCII range.
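
The two steps above can be sketched with the standard library as follows. The function name `simplify` is an assumption for illustration and may not match the name used in the project.

```python
import unicodedata


def simplify(text: str) -> str:
    """Strip accents by NFKD decomposition, then drop every non-ASCII character."""
    # Step 1: "é" becomes "e" followed by the combining acute accent U+0301.
    decomposed = unicodedata.normalize("NFKD", text)
    # Step 2: the combining marks are outside ASCII, so they are discarded here,
    # together with any other character that has no ASCII form.
    return decomposed.encode("ascii", "ignore").decode("ascii")


print(simplify("café"))  # accent removed, base letter kept
print(simplify("ışık"))  # the dotless ı has no decomposition and is dropped
```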

This process works fine for almost all cases, but it fails in some situations. For example, the letter ı is used a lot in Turkish and clearly corresponds to the ASCII letter i, but it carries no accent. NFKD therefore leaves it unchanged, and the ASCII filter then removes it entirely.

We need to find a way to support such characters.
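
One possible approach, sketched here as an assumption rather than a decided design, is to apply a small hand-maintained fallback table before normalisation. The table contents and the names `FALLBACKS` and `simplify_with_fallbacks` are hypothetical.

```python
import unicodedata

# Hypothetical fallback table for letters that NFKD cannot decompose.
# The entries are illustrative; a real table would need to be curated.
FALLBACKS = str.maketrans({"ı": "i", "ø": "o", "ł": "l", "ß": "ss"})


def simplify_with_fallbacks(text: str) -> str:
    """Map known accent-less letters to ASCII first, then strip accents as before."""
    mapped = text.translate(FALLBACKS)
    decomposed = unicodedata.normalize("NFKD", mapped)
    return decomposed.encode("ascii", "ignore").decode("ascii")


print(simplify_with_fallbacks("ışık"))
```

The trade-off is that the table has to be extended by hand for every script or language the project wants to support.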

Adding the Boyer-Moore String Search Algorithm to the Repertoire

Since we have started working on string search in general, we might as well add the famous Boyer-Moore string-search algorithm for pattern matching.

Admittedly, this algorithm is not really suited to fuzzy search, since it matches patterns exactly and gets faster as the pattern size increases. Still, in keeping with the original purpose of this project, it is worth a look.
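
As a starting point for discussion, here is a minimal sketch of the Horspool simplification of Boyer-Moore, which uses only the bad-character rule and omits the good-suffix rule of the full algorithm. The function name `horspool_search` is an assumption for illustration.

```python
def horspool_search(text: str, pattern: str) -> int:
    """Return the index of the first occurrence of pattern in text, or -1."""
    m, n = len(pattern), len(text)
    if m == 0:
        return 0
    if m > n:
        return -1
    # Bad-character table: for every pattern character except the last one,
    # the distance from its rightmost occurrence to the end of the pattern.
    shift = {ch: m - 1 - i for i, ch in enumerate(pattern[:-1])}
    pos = 0
    while pos <= n - m:
        # Compare right to left, as Boyer-Moore style algorithms do.
        j = m - 1
        while j >= 0 and text[pos + j] == pattern[j]:
            j -= 1
        if j < 0:
            return pos
        # Shift by the table entry for the text character under the pattern's
        # last position; characters absent from the table allow a full jump of m.
        pos += shift.get(text[pos + m - 1], m)
    return -1


print(horspool_search("here is a simple example", "example"))
```

Because a mismatch can skip up to the full pattern length, longer patterns allow larger jumps, which is why the search tends to get faster as the pattern grows.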

Unit Tests for C++

Currently there are many unit tests covering all the source functions for Python and TypeScript, yet there are none for C++.

We gotta add some.
