
autocorrect's People

Contributors

foobarmus, negm, ryanfreckleton

autocorrect's Issues

Numbers are corrected by mistake

Hello,
I found a problem when using spellcheck: numbers are treated as errors and the library tries to correct them. For example, instead of 50 it shows of....
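
A possible workaround, sketched under the assumption that `spell` is any callable that takes one word and returns its correction: split the text into tokens and only pass purely alphabetic tokens to the corrector, so numbers pass through unchanged. `correct_text` is a hypothetical helper name, not part of autocorrect.

```python
import re

# Runs of word characters alternate with runs of everything else,
# so joining the tokens reproduces the original spacing and punctuation.
TOKEN_RE = re.compile(r"\w+|\W+")


def correct_text(text, spell):
    """Apply `spell` to purely alphabetic tokens; leave everything else as-is.

    `spell` is assumed to be a callable mapping one word to its correction.
    Tokens containing digits (e.g. "50" or "abc50") are never passed to it.
    """
    out = []
    for token in TOKEN_RE.findall(text):
        out.append(spell(token) if token.isalpha() else token)
    return "".join(out)
```

With this wrapper the corrector never sees "50", so it cannot rewrite it.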

Multiple suggestions and estimations of confidence

The spell() function returns only a single value, the corrected word. Could you please add another function that returns two values (word, confidence), where confidence is a score from 0.0 to 1.0 indicating how confident the system is that the provided word was misspelled? It might also be good to provide multiple suggestions with a confidence level for each, for example:

> spell_scores('pinrt')
[('print', 0.9), ('pint', 0.7)]

Currently autocorrect corrects the typo "pinrt" incorrectly.
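
A minimal sketch of what such a function could look like, not autocorrect's actual implementation. It uses the well-known edit-distance candidate generation, a toy frequency table (`WORD_COUNTS`), and an arbitrary penalty for candidates two edits away (`distance2_weight`); all three are assumptions made for illustration. Scores are each candidate's share of the total weight, so they sum to 1.0 rather than being independent confidences as in the example above.

```python
import string

# Toy frequency table for illustration; a real corrector would use corpus counts.
WORD_COUNTS = {"print": 90, "pint": 10, "hello": 50}


def edits1(word):
    """All strings one edit (delete, transpose, replace, insert) away."""
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in letters]
    inserts = [a + c + b for a, b in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)


def spell_scores(word, counts=WORD_COUNTS, distance2_weight=0.5):
    """Return [(candidate, score)] sorted by descending score."""
    if word in counts:
        return [(word, 1.0)]
    one_away = edits1(word)
    # Known words one edit away keep their full count as weight.
    weights = {w: float(counts[w]) for w in one_away if w in counts}
    # Known words two edits away get a reduced weight.
    for near in one_away:
        for candidate in edits1(near):
            if candidate in counts and candidate not in weights:
                weights[candidate] = counts[candidate] * distance2_weight
    total = sum(weights.values())
    scored = [(w, weight / total) for w, weight in weights.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Note that "print" is two edits away from "pinrt" (delete the "r", then re-insert it), while "pint" is only one edit away, which may explain why a corrector that prefers the closest candidate picks "pint". With the toy table above, `spell_scores('pinrt')` ranks "print" first (about 0.82) and "pint" second (about 0.18).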

Spell doesn't load in Windows

I traced the issue to utils.py -> with closing(t.extractfile(tar_path)) as f:
It seems the tarfile library doesn't behave as intended on Windows. I extracted the file manually and changed the code to open it using os.open for now, but I intend to make a pull request with a more general solution soon.
Thanks for your effort.

download other languages through a proxy

Hello,

I tried using your library on my work machine, which connects through a proxy. When downloading a dictionary for the Russian language, the error "Tunnel connection failed: 407 Proxy Authentication Required" appears. I know how to deal with such errors when downloading dictionaries for Numpy or when cloning libraries from github, but I could not find how to do this for your library.

I can probably just download the data for the Russian language separately, but it’s still not clear where to put the file to make it work.
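
The issue does not say how the library performs the download, so the following is only a sketch of fetching a file manually through an authenticating proxy with Python's standard `urllib`. The helper names `proxy_url` and `build_proxy_opener`, the host, the port, and the credentials are all placeholders. Separately, `urllib` also honours the `HTTP_PROXY` and `HTTPS_PROXY` environment variables, which may be enough if the library downloads through `urllib`.

```python
import urllib.request
from urllib.parse import quote


def proxy_url(host, port, user, password):
    """Build http://user:password@host:port, percent-encoding the credentials."""
    return "http://{}:{}@{}:{}".format(
        quote(user, safe=""), quote(password, safe=""), host, port
    )


def build_proxy_opener(url):
    """Return an opener that routes HTTP and HTTPS requests through the proxy."""
    handler = urllib.request.ProxyHandler({"http": url, "https": url})
    return urllib.request.build_opener(handler)


# Usage (placeholders; substitute your proxy details and the file's URL):
# opener = build_proxy_opener(proxy_url("proxy.example.com", 8080, "user", "secret"))
# with opener.open(DICTIONARY_URL) as response:
#     data = response.read()
```

This does not answer where the downloaded file has to be placed; that depends on the library's own data directory.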

How to extend the library

Hi,
I want to add some other words and suggestions to this library. Could you please tell me how I can do this?
Also, where in the source code are the reference files such as 'en_US_GB_CA_lower.txt' or 'en_US_GB_CA_mixed.txt'?

Freeze (memory/CPU chewed up) when trying to spell long string

spell('ç§�ã�Ÿã�¡ã�¯ãƒ‰ãƒ¼ãƒ“ルã�«2009å¹´7月ã�«4泊ã�—ã�¾ã�—ã�Ÿã€‚ 地下鉄ã�‹ã‚‰2分ã€�ブãƒ\xadードウェイã�¾ã�§æ\xad©ã�„ã�¦ã‚‚ã€�ã��ã‚“ã�ªã�«æ°—ã�«ã�ªã‚Šã�¾ã�›ã‚“ã�§ã�—ã�Ÿã�‹ã‚‰ã€�立地ã�§ã‚‚ã�™ã�¦ã��ã�ªãƒ›ãƒ†ãƒ«ã�§ã�™ã€‚経営者ã�‹ã�¨æ€�ã‚�れるè€�夫婦ã�¨å¨˜ã�•ã‚“ã€�ã�‚ã�¨2人ã�®ã‚¹ã‚¿ãƒƒãƒ•ã�«å‡ºä¼šã�„ã�¾ã�—ã�Ÿã€‚ ゴスペルã�®æ‰€åœ¨åœ°ã‚’å°‹ã�\xadã�Ÿã‚‰ã€�ãƒ�ットã�§åœ°å›³ã‚’プリントã�—ã�¦ã��ã‚Œã�¦ã€�親切ã�«èª¬æ˜Žã�—ã�¦ã��ã‚Œã�¾ã�—ã�Ÿã€‚ æ\xad´å�²ã‚’ä¿�ã�¨ã�†ã�¨ã�¨ã�—ã�¦ã�„るニューヨークをæ\xad©ã��æ‹\xa0点ã�¨ã�—ã�¦ã€�最é�©ã�ªãƒ›ãƒ†ãƒ«ã�§ã�™ã€‚ スターãƒ�ックスã€�マクドナルドã€�ã‚\xadングãƒ�ーガも近ã��ã�«ã�‚ã‚Šã€�エンパイアステートビル迄10分弱ã�§ã�™ã�Œã€�コリアã�®çµŒå–¶ã�™ã‚‹ã‚³ãƒ³ãƒ“ニ兼飲食店も数軒有りã€�ホテルã�®è£�手ã�®é€šã‚Šã�«ã�¯æ¶ˆè²»ç¨Žç„¡æ–™ã�®ã‚³ãƒ³ãƒ“ニもã�‚ã�£ã�¦ä¾¿åˆ©ã�§ã�—ã�Ÿã€‚ 100å¹´ã�®æ\xad´å�²ã�¨ã�„ã�£ã�¦ã‚‚改装ã�•ã‚Œã�¦ã�„ã�¦æ¸…æ½”ã�ªãƒ›ãƒ†ãƒ«ã�§ã�™ã€‚手動ã�®ã‚¨ãƒ¬ãƒ™ãƒ¼ã‚¿ã‚‚å�°è±¡ã�«æ®‹ã‚Šã�¾ã�—ã�Ÿã€‚ 冷蔵庫ã�Œã�ªã�„点ã�¨ã€�ウインドウタイプã�®ã‚¨ã‚¢ã‚³ãƒ³ã�Œã�¡ã‚‡ã�£ã�¨ä¸�便ã�ªç‚¹ä»¥å¤–ã�Šå‹§ã‚�ã�§ã�™ã€‚')

This causes memory usage to climb to 6GB+ in a matter of seconds.

Took me all day to figure this out!

It would be good to include some sort of blacklist of weird characters to prevent the mysterious memory hog, e.g. throw an error (and provide some API to check whether a string is spellable).
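
A sketch of the kind of guard suggested here, written as an allowlist rather than a blacklist because listing every problematic character is impractical. `is_spellable` and `safe_spell` are hypothetical names, `spell` is assumed to be any callable that corrects one word, and both the allowed character set (English letters, apostrophe, hyphen) and the length cap are arbitrary choices for illustration.

```python
import string

ALLOWED = set(string.ascii_letters + "'-")
MAX_WORD_LENGTH = 30  # arbitrary cap, chosen for illustration


def is_spellable(word, max_length=MAX_WORD_LENGTH):
    """True if the word is non-empty, short enough, and uses only allowed characters."""
    return 0 < len(word) <= max_length and all(ch in ALLOWED for ch in word)


def safe_spell(word, spell):
    """Call `spell` only on spellable input; otherwise raise ValueError."""
    if not is_spellable(word):
        raise ValueError("input is not spellable: {!r}".format(word[:20]))
    return spell(word)
```

A long mis-encoded string like the one above is rejected immediately, both because of its length and because it contains characters outside the allowlist.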

Speller() cannot be created since the tarf.extractfile

I installed the latest version, i.e. 2.3.0, on Ubuntu. When I run the code:
spell = Speller()
It reports the error as follows:

self.nlp_data = load_from_tar(lang) if nlp_data is None else nlp_data

File "/home/dgl/virtual_env/textR/lib/python3.5/site-packages/autocorrect/__init__.py", line 78, in load_from_tar
return json.load(file)
File "/usr/lib/python3.5/json/__init__.py", line 268, in load
parse_constant=parse_constant, object_pairs_hook=object_pairs_hook, **kw)
File "/usr/lib/python3.5/json/__init__.py", line 312, in loads
s.__class__.__name__))
TypeError: the JSON object must be str, not 'bytes'
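
On Python 3.5, `json.loads` accepts only `str`, while `tarfile`'s `extractfile` returns a binary file object, which matches the error above. A minimal sketch of a fix is to decode the bytes explicitly before parsing. `load_json_from_tar` is a hypothetical helper, not the library's `load_from_tar`, and the UTF-8 encoding is an assumption.

```python
import json
import tarfile


def load_json_from_tar(archive_path, member_name):
    """Read one JSON member from a tar archive, decoding its bytes as UTF-8."""
    with tarfile.open(archive_path) as archive:
        member = archive.extractfile(member_name)
        if member is None:
            # extractfile returns None for members that are not regular files.
            raise ValueError("{} is not a regular file".format(member_name))
        with member:
            # Decode explicitly so json.loads always receives str.
            return json.loads(member.read().decode("utf-8"))
```

Decoding explicitly behaves the same on Python versions where `json.loads` also accepts bytes.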

Autocorrect for other languages

Hello,

I've used this Python library in my project, but it seems to work only with the English dictionary. Does it work for other languages?
I would like to use it to autocorrect words in Portuguese.

Thanks,
Rita

Too slow to be useful

It takes 0.25 seconds (!) to correct a single word (“paiin”). If this stems from the code/algorithm design, I suggest describing the package as a proof-of-concept or toy, not a spelling corrector.

Time out for function

Given a list of words, I'm looping through them to correct any misspelled words. For now I've been stuck on word 5738 for more than five minutes, with memory usage up to 12GB of RAM and disk usage of 120MB/s.
It would be nice to have a timeout parameter to abort if the search is taking too long. It is probably possible to optimize the memory usage as well.
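
Until such a parameter exists, a caller-side timeout can be approximated with the standard library. This is a sketch only: `spell_with_timeout` is a hypothetical wrapper and `spell` is assumed to be any callable that corrects one word.

```python
import concurrent.futures


def spell_with_timeout(word, spell, timeout=1.0):
    """Return spell(word), or the word unchanged if it takes longer than `timeout` seconds.

    Caveat: the worker thread is not killed on timeout. It keeps running
    in the background until `spell` returns.
    """
    executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = executor.submit(spell, word)
    try:
        return future.result(timeout=timeout)
    except concurrent.futures.TimeoutError:
        return word
    finally:
        # Do not block here waiting for a possibly runaway call.
        executor.shutdown(wait=False)
```

Because the abandoned thread keeps running, this only unblocks the loop. It does not reclaim the memory a runaway search is using; that would need a separate process that can be terminated.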

Too slow..

It takes 6+ seconds to give me the corrected word.

print(spell('wednsday'));
wednesday
[Finished in 6.5s]

print(spell('hello'));
hello
[Finished in 6.6s]

I would love to contribute to this project. Please tell me your preferred method of contact. Till then, I will go through the code.
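
Both calls above take about the same time even though 'hello' needs no correction, which may point to a fixed startup or loading cost rather than the search itself; that is a guess from the timings, not something confirmed here. Independently of the cause, repeated words can be served from a cache. A minimal sketch, assuming `spell` is any callable that corrects one word (`make_cached_speller` is a hypothetical name):

```python
from functools import lru_cache


def make_cached_speller(spell, maxsize=10000):
    """Wrap `spell` so each distinct word is corrected at most once while cached."""

    @lru_cache(maxsize=maxsize)
    def cached(word):
        return spell(word)

    return cached
```

This helps only when the same words recur, as they usually do in running text. It does nothing for the first lookup of a word.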
