rahmed31 / spellchecker

This repository implements a brute-force spellchecker utilizing the Damerau-Levenshtein edit distance.

License: MIT License

Python 100.00%
damerau-levenshtein-distance spellchecker python3 natural-language-processing

spellchecker's Introduction

About spellchecker.py

spellchecker.py implements a brute-force spellchecker. It uses dynamic programming to compute the Damerau-Levenshtein string metric, which measures the edit distance between two sequences of characters.
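To make the idea of a dynamically programmed edit distance concrete, here is a minimal sketch. It assumes the restricted variant of Damerau-Levenshtein (often called optimal string alignment), which counts insertions, deletions, substitutions, and swaps of adjacent characters. The function name osa_distance is hypothetical and is not taken from spellchecker.py, which may use a different variant.

```python
def osa_distance(a: str, b: str) -> int:
    """Restricted Damerau-Levenshtein (optimal string alignment) distance.

    Illustrative sketch only; not the repository's own implementation.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # dist[i][j] holds the number of edits needed to turn a[:i] into b[:j].
    dist = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        dist[i][0] = i  # delete all i characters of a[:i]
    for j in range(cols):
        dist[0][j] = j  # insert all j characters of b[:j]

    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
                dist[i - 1][j - 1] + cost,  # substitution or match
            )
            # Transposition of two adjacent characters.
            if (
                i > 1
                and j > 1
                and a[i - 1] == b[j - 2]
                and a[i - 2] == b[j - 1]
            ):
                dist[i][j] = min(dist[i][j], dist[i - 2][j - 2] + 1)
    return dist[-1][-1]
```

With this sketch, a swapped pair such as "teh" versus "the" costs one edit rather than the two that plain Levenshtein distance would charge.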

How to Write Your Own Test Cases

In the test folder, you will see two text files called candidate_words.txt and incorrect_words.txt:

  • The candidate_words.txt file can contain any number of CORRECTLY spelled words, one per line.
  • The incorrect_words.txt file can contain any number of INCORRECTLY spelled words, one per line. However, each incorrectly spelled word in this list MUST have its correctly spelled counterpart somewhere in candidate_words.txt. Its position does not matter, because the candidate list is randomly shuffled anyway.

In the test folder, you will also see a text file called target_words.txt:

  • The target_words.txt file contains the CORRECT spelling of each word in incorrect_words.txt, one per line, in exactly the same order as the incorrectly spelled counterparts appear in incorrect_words.txt. The two files must be in the same order so that the accuracy of the spellchecker can be calculated.
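The file rules above can be checked before running a test case. The following is a minimal sketch under stated assumptions: read_words and validate_test_case are hypothetical helper names that do not come from spellchecker.py, and blank lines are assumed to be ignorable.

```python
from pathlib import Path


def read_words(path):
    """Read one word per line, skipping blank lines (an assumption)."""
    text = Path(path).read_text(encoding="utf-8")
    return [line.strip() for line in text.splitlines() if line.strip()]


def validate_test_case(candidates, incorrect, targets):
    """Raise ValueError if the three word lists break the rules above."""
    # incorrect_words.txt and target_words.txt must line up one to one.
    if len(incorrect) != len(targets):
        raise ValueError(
            f"{len(incorrect)} incorrect words but {len(targets)} target words"
        )
    # Every correct spelling must appear somewhere in the candidate list.
    known = set(candidates)
    missing = [word for word in targets if word not in known]
    if missing:
        raise ValueError(f"targets missing from candidates: {missing}")
```

A call such as validate_test_case(read_words("candidate_words.txt"), read_words("incorrect_words.txt"), read_words("target_words.txt")) would then fail early on a malformed test case; adjust the paths to wherever your test files live.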

For an example of how to create your own test cases, take a look at the files provided in either folder.

How to Run the Program

Clone the repository, then enter the repository's directory in your terminal. Finally, run python3 spellchecker.py.

  • The only files you need to modify are those in the test folders, and only if you want to try the program with your own test cases. The program itself does not need to be changed, unless you'd like to adjust the global variable THRESHOLD, which is the threshold used to find an incorrectly spelled word's closest approximation.
  • Each incorrectly spelled word in incorrect_words.txt is run through the program to find its closest lexical match in candidate_words.txt using the Damerau-Levenshtein algorithm.
  • The spellchecked words are then cross-checked, in order, against their intended counterparts in target_words.txt to calculate the overall accuracy of the spellchecking algorithm.

The results of the program will then be printed to your terminal.
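The matching and scoring steps described above can be sketched as follows. This is an illustration under assumptions, not the code in spellchecker.py: THRESHOLD is assumed to mean the largest edit distance still accepted as a match, its value of 2 is arbitrary, and the names osa, closest_match, and accuracy are hypothetical. The distance helper is a compact memoized form of the restricted Damerau-Levenshtein recurrence.

```python
from functools import lru_cache

THRESHOLD = 2  # assumed meaning: maximum edit distance accepted as a match


def osa(a, b):
    """Restricted Damerau-Levenshtein distance via memoized recursion."""

    @lru_cache(maxsize=None)
    def d(i, j):
        if i == 0 or j == 0:
            return i + j  # one prefix is empty: insert or delete the rest
        best = min(
            d(i - 1, j) + 1,                           # deletion
            d(i, j - 1) + 1,                           # insertion
            d(i - 1, j - 1) + (a[i - 1] != b[j - 1]),  # substitution or match
        )
        if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
            best = min(best, d(i - 2, j - 2) + 1)      # adjacent transposition
        return best

    return d(len(a), len(b))


def closest_match(word, candidates, threshold=THRESHOLD):
    """Brute force: compare word with every candidate, keep the nearest.

    Returns None when no candidate is within the threshold.
    """
    best_word, best_dist = None, threshold + 1
    for cand in candidates:
        dist = osa(word, cand)
        if dist < best_dist:
            best_word, best_dist = cand, dist
    return best_word


def accuracy(incorrect, candidates, targets):
    """Fraction of corrected words that equal the target at the same index."""
    corrected = [closest_match(word, candidates) for word in incorrect]
    hits = sum(c == t for c, t in zip(corrected, targets))
    return hits / len(targets) if targets else 0.0
```

Because every misspelled word is compared against every candidate, the running time grows with the product of the two list sizes, which is the cost of the brute-force approach.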

Dependencies

The program uses difflib, which is part of the Python 3 standard library, so no separate installation is required.

Final Words

Feel free to use or modify this program for your intended purposes!

Contributors

rahmed31